[docs] support deepseek_v4 readme by Jintao-Huang · Pull Request #9430 · modelscope/ms-swift

Jintao-Huang · 2026-05-27T07:33:51Z

No description provided.

gemini-code-assist

Code Review

This pull request adds the --loss_scale ignore_empty_think parameter to several Megatron SFT training scripts and introduces a new README documentation file for DeepSeek-V4 fine-tuning support. The reviewer identified several issues in the newly added README, including grammatical errors, typos (such as 'paddind_free'), incorrect markdown code block closures, and multiple empty placeholder sections that need to be completed.

Jintao-Huang · 2026-05-27T10:34:18Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces comprehensive training and fine-tuning support for DeepSeek-V4 in Megatron-SWIFT, including documentation in both Chinese and English, and updates various example scripts to use the `--loss_scale ignore_empty_think` flag. However, there are two critical issues in the implementation. First, the configuration class reassignment in `vllm_engine.py` was accidentally nested under a model-specific check, which will break configuration loading for all other models. Second, the layer-filtering logic in the documentation scripts only checks for the `layers.` prefix instead of `model.layers.`, so it may fail to filter out layers and lead to out-of-memory errors when loading the official weights.

gemini-code-assist · 2026-05-27T10:35:36Z

+    if k.startswith('layers.'):
+        idx = int(k[len('layers.'):].split('.', 1)[0])
+        if idx >= 4:
+            continue


The official HuggingFace DeepSeek-V4 model keys typically start with `model.layers.` rather than `layers.`. If the keys are prefixed with `model.`, the condition `k.startswith('layers.')` evaluates to False and the script fails to filter out layers 4 and above. The script then loads the entire model, which can lead to out-of-memory (OOM) errors. Please update the condition to support both the `model.layers.` and `layers.` prefixes.

Suggested change

-    if k.startswith('layers.'):
-        idx = int(k[len('layers.'):].split('.', 1)[0])
-        if idx >= 4:
-            continue
+    if k.startswith('model.layers.'):
+        idx = int(k[len('model.layers.'):].split('.', 1)[0])
+        if idx >= 4:
+            continue
+    elif k.startswith('layers.'):
+        idx = int(k[len('layers.'):].split('.', 1)[0])
+        if idx >= 4:
+            continue
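A prefix-agnostic variant of the same filter can also be written as one small helper. This is a minimal sketch, not the code merged in this PR: the names `keep_key` and `filter_state_dict`, the `max_layers` parameter, and the regular expression are hypothetical, and the sample key names are made up for illustration rather than taken from the DeepSeek-V4 checkpoint. It accepts an optional `model.` prefix before `layers.<idx>.`, so both key layouts described in the comment are handled by a single check.

```python
import re

# Hypothetical pattern: matches "layers.<idx>." with an optional "model." prefix
# and captures the layer index.
_LAYER_KEY = re.compile(r'^(?:model\.)?layers\.(\d+)\.')


def keep_key(k: str, max_layers: int = 4) -> bool:
    """Return False for per-layer keys whose layer index is >= max_layers.

    Keys that do not match the per-layer pattern (e.g. embeddings or the
    final norm in this illustration) are always kept.
    """
    m = _LAYER_KEY.match(k)
    if m is None:
        return True
    return int(m.group(1)) < max_layers


def filter_state_dict(state_dict: dict, max_layers: int = 4) -> dict:
    # Keep the first `max_layers` layers plus every non-layer entry.
    return {k: v for k, v in state_dict.items() if keep_key(k, max_layers)}


if __name__ == '__main__':
    # Illustrative keys only; real checkpoint key names may differ.
    keys = [
        'embed.weight',
        'layers.0.attn.wq.weight',
        'layers.4.attn.wq.weight',
        'model.layers.3.mlp.gate.weight',
        'model.layers.10.mlp.gate.weight',
        'norm.weight',
    ]
    kept = filter_state_dict({k: None for k in keys})
    print(sorted(kept))
```

Anchoring the pattern with `^` means only keys that begin with the (optionally prefixed) layer path are treated as per-layer weights; anything else falls through and is kept, which mirrors the behavior of the `startswith` checks in the suggestion.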


support deepseek_v4 readme · 3efbd31

gemini-code-assist Bot reviewed May 27, 2026


Jintao-Huang added 5 commits May 27, 2026 18:09

update · d113aad
update · 73b94c8
fix · b4cb77a
fix · 03e9d48
update · 04e84a6

gemini-code-assist Bot reviewed May 27, 2026


fix · c5d9a6b

Jintao-Huang merged commit 62a5c05 into modelscope:main May 27, 2026
1 of 3 checks passed

Jintao-Huang merged 7 commits into modelscope:main from Jintao-Huang:support_deepseek_v4_readme
