Skip to content

[docs] support deepseek_v4 readme#9430

Merged
Jintao-Huang merged 7 commits into
modelscope:mainfrom
Jintao-Huang:support_deepseek_v4_readme
May 27, 2026
Merged

[docs] support deepseek_v4 readme#9430
Jintao-Huang merged 7 commits into
modelscope:mainfrom
Jintao-Huang:support_deepseek_v4_readme

Conversation

@Jintao-Huang
Copy link
Copy Markdown
Collaborator

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds the --loss_scale ignore_empty_think parameter to several Megatron SFT training scripts and introduces a new README documentation file for DeepSeek-V4 fine-tuning support. The reviewer identified several issues in the newly added README, including grammatical errors, typos (such as 'paddind_free'), incorrect markdown code block closures, and multiple empty placeholder sections that need to be completed.

Comment thread examples/models/deepseek_v4/README.md Outdated
Comment thread docs/source/BestPractices/deepseek-v4.md
Comment thread examples/models/deepseek_v4/README.md Outdated
Comment thread examples/models/deepseek_v4/README.md Outdated
Comment thread docs/source/BestPractices/deepseek-v4.md
Comment thread examples/models/deepseek_v4/README.md Outdated
@Jintao-Huang
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces comprehensive training and fine-tuning support for DeepSeek-V4 in Megatron-SWIFT, including documentation in both Chinese and English, and updates various example scripts to use the --loss_scale ignore_empty_think flag. However, there are a couple of critical issues in the implementation: first, the configuration class reassignment in vllm_engine.py was accidentally nested under a model-specific check, which will break configuration loading for all other models; second, the layer-filtering logic in the documentation scripts only checks for layers. instead of model.layers., which could fail to filter layers properly and lead to out-of-memory errors when loading official weights.

Comment thread swift/infer_engine/vllm_engine.py
Comment on lines +63 to +66
if k.startswith('layers.'):
idx = int(k[len('layers.'):].split('.', 1)[0])
if idx >= 4:
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The official HuggingFace DeepSeek-V4 model keys typically start with model.layers. rather than layers.. If the keys are prefixed with model., the condition k.startswith('layers.') will evaluate to False, and the script will fail to filter out layers 4 and above. This will cause the script to load the entire model, potentially leading to out-of-memory (OOM) errors. Please update the condition to support both model.layers. and layers. prefixes.

Suggested change
if k.startswith('layers.'):
idx = int(k[len('layers.'):].split('.', 1)[0])
if idx >= 4:
continue
if k.startswith('model.layers.'):
idx = int(k[len('model.layers.'):].split('.', 1)[0])
if idx >= 4:
continue
elif k.startswith('layers.'):
idx = int(k[len('layers.'):].split('.', 1)[0])
if idx >= 4:
continue

Comment on lines +63 to +66
if k.startswith('layers.'):
idx = int(k[len('layers.'):].split('.', 1)[0])
if idx >= 4:
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The official HuggingFace DeepSeek-V4 model keys typically start with model.layers. rather than layers.. If the keys are prefixed with model., the condition k.startswith('layers.') will evaluate to False, and the script will fail to filter out layers 4 and above. This will cause the script to load the entire model, potentially leading to out-of-memory (OOM) errors. Please update the condition to support both model.layers. and layers. prefixes.

Suggested change
if k.startswith('layers.'):
idx = int(k[len('layers.'):].split('.', 1)[0])
if idx >= 4:
continue
if k.startswith('model.layers.'):
idx = int(k[len('model.layers.'):].split('.', 1)[0])
if idx >= 4:
continue
elif k.startswith('layers.'):
idx = int(k[len('layers.'):].split('.', 1)[0])
if idx >= 4:
continue

@Jintao-Huang Jintao-Huang merged commit 62a5c05 into modelscope:main May 27, 2026
1 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant