Some issues when running SFT on Kimi K2.5 #19

@onehaitao

Hello, I want to run SFT on Kimi K2.5, but I found that some components and scripts are missing.

  1. Training logic is missing in the K2.5 MoE block.
    DeepseekV3MoE and MoEGate contain only inference logic.
    https://huggingface.co/moonshotai/Kimi-K2.5/blob/main/modeling_deepseek.py#L456
    https://huggingface.co/moonshotai/Kimi-K2.5/blob/main/modeling_deepseek.py#L536

  2. No BF16 checkpoint is available for training.
    The Kimi-K2.5 base model is released in INT4 format, and no script is provided for converting INT4 to BF16.
    https://huggingface.co/moonshotai/Kimi-K2.5/tree/main
    https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/SFT_Installation_Guide_KimiK2.5.md#12-convert-int4--bf16
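
To make item 2 concrete, here is a minimal sketch of the arithmetic an INT4 to BF16 conversion would need. Everything about the layout is an assumption for illustration: eight 4-bit values per 32-bit word, low nibble first, offset-binary encoding, and one scale per group of 32 values. The helper names `unpack_int4` and `dequantize_row` are hypothetical, and the actual packing used by the released checkpoint would have to be confirmed from its quantization config.

```python
def unpack_int4(packed_words, count):
    """Unpack `count` signed 4-bit values from 32-bit words, low nibble first."""
    values = []
    for word in packed_words:
        for shift in range(0, 32, 4):
            nibble = (word >> shift) & 0xF
            # Assumed offset-binary encoding: stored 0..15 maps to -8..7.
            values.append(nibble - 8)
    return values[:count]


def dequantize_row(packed_words, scales, count, group_size=32):
    """Dequantize one weight row; each group of `group_size` values shares a scale."""
    ints = unpack_int4(packed_words, count)
    return [q * scales[i // group_size] for i, q in enumerate(ints)]
```

A real conversion script would apply this per weight tensor, cast the result to BF16, and write a new checkpoint; this sketch only shows the unpack-and-scale step.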

Could you help me solve the problems mentioned above? Will official training scripts be open-sourced in the future?
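
For item 1, the kind of training path I am looking for is roughly the following. This is only a sketch on a toy layer, assuming sigmoid gate scores and plain top-k routing; `ToyMoE` and its sizes are hypothetical, and it does not reproduce the grouped routing or shared experts of the real model. The point is that every step (top-k weights, per-expert dispatch, `index_add` combine) stays differentiable, so gradients reach both the gate and the experts.

```python
import torch
import torch.nn as nn


class ToyMoE(nn.Module):
    """Toy top-k MoE layer with a differentiable forward pass (illustrative only)."""

    def __init__(self, hidden=16, inter=32, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden, n_experts, bias=False)
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(hidden, inter), nn.SiLU(), nn.Linear(inter, hidden))
                for _ in range(n_experts)
            ]
        )

    def forward(self, x):
        bsz, seq, hidden = x.shape
        tokens = x.reshape(-1, hidden)

        # Assumed scoring: sigmoid scores, top-k selection, weights renormalized to sum to 1.
        scores = torch.sigmoid(self.gate(tokens))
        topk_weight, topk_idx = torch.topk(scores, self.top_k, dim=-1)
        topk_weight = topk_weight / (topk_weight.sum(dim=-1, keepdim=True) + 1e-20)

        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            # Rows of `tokens` routed to expert i, and which top-k slot chose it.
            token_pos, slot = torch.where(topk_idx == i)
            if token_pos.numel() == 0:
                continue
            weighted = expert(tokens[token_pos]) * topk_weight[token_pos, slot].unsqueeze(-1)
            # Out-of-place index_add keeps the autograd graph intact.
            out = out.index_add(0, token_pos, weighted)
        return out.reshape(bsz, seq, hidden)
```

The per-expert loop is slow compared with a fused kernel, but it is simple to check for correctness, and a similar branch guarded by `self.training` would be enough to get started with SFT.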
