v0.4.0 (Latest)

Released 16 Jun 08:57 by tpx818 (commit f4cdfd5)

Highlights

Initial DeepSeek V4 support, covering DeepSeek V4 Flash training with FSDP2 + EP and DeepSeek V4 tool-call parsing and cleanup in #190 and #218
Expand Qwen3.5 training with padding-free / packed-sequence support and Qwen3.5 MoE GatedDeltaNet sequence-parallel support in #186 and #222
Add Gemma 4 multimodal training support #199
Strengthen LoRA training with rsLoRA for Multi-LoRA, FSDP2 support for Multi-LoRA SFT, and Expert Parallelism LoRA SFT examples for DeepSeek V4 and Qwen3.5 MoE in #187, #155, and #198
Improve NPU acceleration and stability with fused operators, Qwen3.5 FLA patches, Group MatMul EP scoping, and sequence-parallel compatibility fixes in #194, #204, #205, #206, and #208
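rsLoRA (#187) changes only how the low-rank update is scaled. Standard LoRA multiplies B·A by alpha / r, while rank-stabilized LoRA uses alpha / sqrt(r), so the update does not shrink as the rank grows. The following is a minimal plain-Python sketch of that scaling rule. `lora_scaling` and `lora_delta` are hypothetical helpers for illustration, not Twinkle's Multi-LoRA code.

```python
import math
from typing import List


def lora_scaling(alpha: float, rank: int, use_rslora: bool = False) -> float:
    """Scale factor applied to the LoRA update B @ A (hypothetical helper)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    # Standard LoRA divides alpha by r; rsLoRA divides alpha by sqrt(r).
    return alpha / math.sqrt(rank) if use_rslora else alpha / rank


def lora_delta(x: List[float], A: List[List[float]], B: List[List[float]],
               scaling: float) -> List[float]:
    """Return scaling * B @ (A @ x) for one input vector x.

    A has shape (r, d_in) and B has shape (d_out, r); plain lists keep the
    sketch dependency-free.
    """
    ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return [scaling * sum(b * axj for b, axj in zip(row, ax)) for row in B]


for r in (8, 64):
    print(f"r={r}: LoRA scale={lora_scaling(16, r):.3f}, "
          f"rsLoRA scale={lora_scaling(16, r, use_rslora=True):.3f}")
```

With alpha = 16, raising r from 8 to 64 shrinks the standard LoRA scale from 2.0 to 0.25, while the rsLoRA scale only drops from about 5.66 to 2.0.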

New Features

Add padding-free and packed-sequence support for Qwen3.5 by @meichangsu1 in #186
Add rsLoRA support to Multi-LoRA by @xichengpro in #187
Add FSDP2 support for Multi-LoRA SFT by @kevssim in #155
Add DeepSeek V4 Flash FSDP2 + EP training support by @meichangsu1 in #190
Add NPU fused operators: RMSNorm, RoPE, SwiGLU, and SDPA by @ys2025-AI in #194
Add multi-turn rollout support by @tastelikefeet in #193
Add support for client-specified server-side checkpoint saving paths by @vx120 in #196
Add LoRA SFT support for Expert Parallelism, with DeepSeek V4 and Qwen3.5 MoE examples by @kevssim in #198
Add Qwen3.5 NPU FLA and fused-operator patches by @ys2025-AI in #204
Add LoRA capacity query support by @kevssim in #201
Optimize Native FSDP memory_efficient_init weight loading for multi-node EP/FSDP jobs and add multi-node scripts by @meichangsu1 in #207
Add Gemma 4 support by @EvineR666 in #199
Add DeepSeek V4 tool-call parsing and cleanup support by @meichangsu1 in #218
Add Gemma 4 12B cookbook by @EvineR666 in #219
Add automatic device detection by @vx120 in #220
Add Qwen3.5 MoE GatedDeltaNet sequence-parallel support by @meichangsu1 in #222
Refactor server configuration and observability by @Yunnglin in #210
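The NPU fused operators in #194 compute the same math as the unfused layers; the benefit typically comes from running each operator as a single kernel. As a reference for what two of them compute, here is a plain-Python sketch of RMSNorm and SwiGLU using their standard definitions. It is illustrative only: it is not the NPU kernel code, the function names are hypothetical, and RoPE and SDPA are not shown.

```python
import math
from typing import List


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: divide by the root-mean-square of x, then scale by weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v: float) -> float:
    """SiLU (swish): v * sigmoid(v), using the tanh form to avoid exp overflow."""
    return v * 0.5 * (1.0 + math.tanh(0.5 * v))


def swiglu(gate: List[float], up: List[float]) -> List[float]:
    """SwiGLU activation: silu(gate) * up, elementwise.

    In a LLaMA-style MLP, gate and up are two separate linear projections of
    the same hidden state; a fused kernel computes this in one pass.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


print(rms_norm([3.0, 4.0], [1.0, 1.0]))
print(swiglu([0.0, 1.0], [5.0, 2.0]))
```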

Bug Fixes

Fix cache reset behavior for multimodal models by @hjh0119 in #189
Fix Qwen3.5 GatedDeltaNet compatibility with padding-free training, and fix create_causal_mask compatibility with transformers >5.3.0, which removed cache_positions, by @meichangsu1 in #202
Fix transformers 5.9 AttentionMask wrapper compatibility in sequence parallel by @ys2025-AI in #206
Fix the sequence-parallel (SP) path overriding the NPU-patched chunk_gated_delta_rule by @ys2025-AI in #208
Fix NPU Group MatMul patch scope so it only applies in EP scenarios by @0hujun in #205
Fix adapter saving to use the MultiLora state dict by @meichangsu1 in #215

New Contributors

@tpx818 made their first contribution in #65
@wangxingjun778 made their first contribution in #68
@hzher made their first contribution in #92
@xichengpro made their first contribution in #123
@vx120 made their first contribution in #118
@0hujun made their first contribution in #183
@a550580874 made their first contribution in #176
@ys2025-AI made their first contribution in #194
@EvineR666 made their first contribution in #199

Full Changelog: https://github.com/modelscope/twinkle/commits/v0.4.0

Contributors

wangxingjun778, meichangsu1, and 12 other contributors

v0.3.0

Released 07 May 02:46 by tastelikefeet (commit 92185a4)

New Features

Full support for the padding_free parameter across training types such as SFT, DPO, and GRPO; enable it by passing padding_free=True when constructing InputProcessor.
Support for resume-from-checkpoint; see here for details.
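Padding-free training typically concatenates the samples of a batch into one row instead of padding each sample to the longest length, and records the sample boundaries so that attention does not cross them. The sketch below shows that bookkeeping in plain Python: position ids that restart for each sample, and cumulative sequence lengths (`cu_seqlens`) of the kind used by variable-length attention kernels. It illustrates the general technique only; `pack_sequences` is a hypothetical name and is not part of InputProcessor.

```python
from typing import Dict, List


def pack_sequences(batch: List[List[int]]) -> Dict[str, List[int]]:
    """Pack token-id lists into a single padding-free row.

    Returns:
      input_ids:    all tokens concatenated, with no pad tokens
      position_ids: positions restart at 0 for every sample
      cu_seqlens:   cumulative boundaries [0, len0, len0 + len1, ...]
    """
    input_ids: List[int] = []
    position_ids: List[int] = []
    cu_seqlens: List[int] = [0]
    for seq in batch:
        if not seq:
            continue  # skip empty samples instead of emitting zero-length segments
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return {"input_ids": input_ids, "position_ids": position_ids, "cu_seqlens": cu_seqlens}


packed = pack_sequences([[11, 12, 13], [21, 22], [31, 32, 33, 34]])
# input_ids    -> [11, 12, 13, 21, 22, 31, 32, 33, 34]
# position_ids -> [0, 1, 2, 0, 1, 0, 1, 2, 3]
# cu_seqlens   -> [0, 3, 5, 9]
```

Padding these three samples to the longest length would take 12 token slots; the packed row uses 9.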

Bug Fixes

Fixed a training issue caused by mismatched LoRA dtype and model dtype.
Fixed support for the NPU GEMM operator.
Fixed an error where Megatron gather failed when FSDP was enabled on NPU.

What's Changed

Update docker file by @tastelikefeet in #180
fix: model dtype is not same as lora dtype in FSDP train by @0hujun in #183
fix: when setting fsdp size unuse megatron for gather in npu by @0hujun in #185
npu gemm patch by @a550580874 in #176
Support dpo/grpo/gkd/sft padding_free by @tastelikefeet in #181
[feat] Resume from ckpt by @kevssim in #135

New Contributors

@0hujun made their first contribution in #183
@a550580874 made their first contribution in #176

Full Changelog: v0.2.1...v0.3.0

Contributors

kevssim, tastelikefeet, and 2 other contributors

v0.2.1

Released 22 Apr 13:45 by tastelikefeet (commit 593e567)

New Features

The official ModelScope service now supports Qwen/Qwen3.6-27B. For details, see: https://www.modelscope.cn/organization/twinkle-kit

Bug Fixes

Fixed an issue with incorrect expert weight synchronization.
Fixed a training collapse issue with GRPO MoE in multi-LoRA scenarios.
Fixed a sequence splitting issue for multimodal inputs.
Fixed abnormal server behavior when pp > 1 and tp > 1.
Fixed multiple incorrect remote_function implementations.
Fixed a server-side blocking issue caused by model uploads and model training sharing the same pipeline.
Fixed several bugs in the Sampler module.
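Sequence parallelism splits each sequence along the token dimension so that every rank holds one shard; the multimodal sequence-splitting fix in this release concerns that step. A minimal sketch of contiguous, even splitting follows, padding the sequence up to a multiple of the SP world size. It is an illustration under simple assumptions: `split_for_sp` and `pad_id` are hypothetical, and Twinkle's actual splitting (for example, how it handles token_type_ids for multimodal inputs, or load-balanced layouts used with context parallelism) may differ.

```python
from typing import List


def split_for_sp(tokens: List[int], sp_size: int, pad_id: int = 0) -> List[List[int]]:
    """Split one sequence into sp_size contiguous, equal-length shards.

    The sequence is right-padded with pad_id so its length is divisible by
    sp_size. Every per-token tensor (labels, position ids, token type ids)
    must be padded and split the same way so the shards stay aligned.
    """
    if sp_size <= 0:
        raise ValueError("sp_size must be positive")
    remainder = len(tokens) % sp_size
    if remainder:
        tokens = tokens + [pad_id] * (sp_size - remainder)
    shard = len(tokens) // sp_size
    return [tokens[r * shard:(r + 1) * shard] for r in range(sp_size)]


shards = split_for_sp([1, 2, 3, 4, 5, 6, 7], sp_size=2)
# -> [[1, 2, 3, 4], [5, 6, 7, 0]]
```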

What's Changed

add base_layer suffix for expert weights by @hjh0119 in #159
update cookbook and doc 0415 by @Yunnglin in #157
Docs support Q3.6 by @tastelikefeet in #158
Fix multi lora device by @tastelikefeet in #160
Fix MoE multi-lora training by @tastelikefeet in #161
Fix model id and upload to hub by @Yunnglin in #162
Add notebooks by @tastelikefeet in #164
Npu adapt megatron by @addsubmuldiv in #153
Fix save by @tastelikefeet in #165
A small refactor by @tastelikefeet in #166
A small refactor, move 4d mask to processor by @tastelikefeet in #167
Fix some potential bugs by @tastelikefeet in #168
Fix some bugs by @tastelikefeet in #169
fix mm tokentypeids splitting by @tastelikefeet in #170
Fix model pp > 1 and tp > 1 errors by @Yunnglin in #171
Fix moe weight sync by @tastelikefeet in #172
update notebooks by @Yunnglin in #174
Modify remote_function decorators in multi_lora_transformers by @xichengpro in #173
support cp ,fix qwen3.5 gdn sp by @meichangsu1 in #138
support qwen3.6 grpo & in-place add lora by @hjh0119 in #163
Fix multi lora by @tastelikefeet in #177
support q3.6-27b by @tastelikefeet in #178
Fix sampler and grpo by @Yunnglin in #179

Full Changelog: v0.2.0...v0.2.1

Contributors

meichangsu1, addsubmuldiv, and 4 other contributors

v0.2.0

Released 14 Apr 15:48 by tastelikefeet (commit 0990ada)

New Features

Refactored the service layer; the multi-tenant service now supports both the tinker and twinkle client syntaxes.
Added support for GKD and on-policy distillation; see the cookbook.
The Megatron backend is now built on the mcore_bridge library, which also adds support for the corresponding multimodal training.
Added support for the DPO algorithm; see the cookbook.
Added support for multimodal task training on the Qwen3.5 series.
Added a server-side Dockerfile.
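For reference, DPO minimizes, for each preference pair, the loss -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))]), where y_w is the chosen response, y_l the rejected one, and pi_ref a frozen reference model. The sketch below computes this from summed response log-probabilities. It shows the standard formula only; it is not Twinkle's DPO loss implementation, and the function names are hypothetical.

```python
import math


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model.
    """
    chosen_margin = policy_chosen - ref_chosen        # how much the policy raised y_w
    rejected_margin = policy_rejected - ref_rejected  # how much the policy raised y_l
    return neg_log_sigmoid(beta * (chosen_margin - rejected_margin))


# Policy identical to the reference gives log(2); favoring y_w lowers the loss.
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```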

Bug Fixes

Version 0.2.0 includes a large number of bug fixes; see the list below.

What's Changed

Fix tinker loss device mismatch by @addsubmuldiv in #115
Refact server by @Yunnglin in #111
[Fix] EP+FSDP checkpoint save for MoE expert parameters by @kevssim in #116
Support GKD and on-policy distillation by @tastelikefeet in #112
Support patcher on samplers by @tastelikefeet in #119
fix vllmsampler client by @tastelikefeet in #122
[Fix] Prevent client from importing ray via twinkle.server.common.serialize by @xichengpro in #123
update Qwen3.5 grpo demo by @hjh0119 in #124
Fix mm server by @tastelikefeet in #125
fix tp get logps by @hjh0119 in #126
fix serve_multiplexed_model_id and mm data process by @Yunnglin in #120
[feat] fsdp2 memory_efficient_init by @kevssim in #117
fix megatron weights sync by @hjh0119 in #128
Support DPO by @tastelikefeet in #130
Fix npu qwen3moe grpo by @vx120 in #118
fix dpo with lazy_dataset by @tastelikefeet in #136
support transformers multi-modal grpo by @hjh0119 in #131
fix import by @kevssim in #137
Refactor megatron to mcore_bridge by @tastelikefeet in #134
Fix bugs by @tastelikefeet in #139
Change online model to qwen3.5-27b by @tastelikefeet in #140
Merge to main by @tastelikefeet in #141
fix megatron multi-lora converter by @hjh0119 in #144
Remerge release/0.2 to main by @tastelikefeet in #146
support rl vit lora with vLLM by @hjh0119 in #147
Add server metrics monitor and DPO client by @Yunnglin in #132
Fix multi lora saving by @tastelikefeet in #148
fix transformers model loading by @tastelikefeet in #150
fix short math grpo cookbook by @Yunnglin in #149
fix docker file by @Yunnglin in #151
fix cookbook by @tastelikefeet in #152
fix tensor collect by @Yunnglin in #154
fix multi lora training by @tastelikefeet in #156

New Contributors

@xichengpro made their first contribution in #123
@vx120 made their first contribution in #118

Full Changelog: v0.1.3...v0.2.0

Contributors

addsubmuldiv, kevssim, and 5 other contributors

v0.1.3

Released 13 Mar 05:03 by tastelikefeet (commit 4640ce7)

New Features

Add a convenience shell installation script for client mode, and improve the documentation
Support EP + FSDP sharding for the transformers backend
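With expert parallelism (EP), each MoE layer's experts are divided across the EP ranks, while FSDP typically shards the remaining parameters. The sketch below shows the simplest contiguous expert placement and how top-1 routed tokens are grouped by destination rank before the all-to-all exchange. It assumes the expert count is divisible by `ep_size` and is a generic illustration, not Twinkle's EP + FSDP sharding code; all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def expert_owner(expert_id: int, num_experts: int, ep_size: int) -> int:
    """Rank that holds expert_id under contiguous placement."""
    if num_experts % ep_size:
        raise ValueError("num_experts must be divisible by ep_size")
    return expert_id // (num_experts // ep_size)


def local_experts(rank: int, num_experts: int, ep_size: int) -> List[int]:
    """Expert ids stored on a given EP rank."""
    per_rank = num_experts // ep_size
    return list(range(rank * per_rank, (rank + 1) * per_rank))


def group_tokens_by_rank(routing: List[int], num_experts: int,
                         ep_size: int) -> Dict[int, List[int]]:
    """Map destination rank -> indices of tokens routed to experts on that rank.

    routing[i] is the expert chosen for token i (top-1 routing for brevity).
    In a real EP layer this grouping feeds an all-to-all exchange.
    """
    buckets: Dict[int, List[int]] = defaultdict(list)
    for token_idx, expert_id in enumerate(routing):
        buckets[expert_owner(expert_id, num_experts, ep_size)].append(token_idx)
    return dict(buckets)


# 8 experts over 4 EP ranks: each rank holds 2 experts.
print(local_experts(1, num_experts=8, ep_size=4))
print(group_tokens_by_rank([0, 5, 3, 7, 2], num_experts=8, ep_size=4))
```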

Bug Fixes

Fix an error when loading local datasets
Fix the http_options argument being incorrectly passed to the model in server mode

What's Changed

Fix loading local datasets by @tastelikefeet in #108
[fix] http_options leaking to model init & NPU tensor serialization failure over HTTP by @kevssim in #109
Fix docs and add new start scripts by @tastelikefeet in #113
[feat]support ep_fsdp by @kevssim in #71

Full Changelog: v0.1.2...v0.1.3

Contributors

kevssim and tastelikefeet

v0.1.2

Released 05 Mar 15:42 by tastelikefeet (commit ec3a344)

New Features

Support multimodal training (images and videos) for the Qwen3.5 series on the transformers backend
Support batched=True in dataset preprocessing for faster processing
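With batched=True, the preprocessing function typically receives a chunk of rows at once (commonly as a dict of column lists) rather than one row per call, which amortizes per-call overhead and lets a tokenizer encode many texts in one call. The sketch below contrasts the two call patterns in plain Python. `map_rows`, `map_batched`, and the dict-of-lists layout are assumptions modeled on common dataset libraries, not Twinkle's dataset API.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Batch = Dict[str, List[object]]


def map_rows(rows: List[Row], fn: Callable[[Row], Row]) -> List[Row]:
    """batched=False style: fn is called once per row."""
    return [fn(row) for row in rows]


def map_batched(rows: List[Row], fn: Callable[[Batch], Batch],
                batch_size: int = 1000) -> List[Row]:
    """batched=True style: fn is called once per chunk of rows.

    Each chunk is converted to a dict of column lists, transformed, and
    converted back to rows; fn may return a different number of rows.
    """
    out: List[Row] = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        columns: Batch = {key: [row[key] for row in chunk] for key in chunk[0]}
        result = fn(columns)
        n = len(next(iter(result.values()))) if result else 0
        out.extend({key: values[i] for key, values in result.items()} for i in range(n))
    return out


def count_words_row(row: Row) -> Row:
    return {"text": row["text"], "n_words": len(str(row["text"]).split())}


def count_words_batch(batch: Batch) -> Batch:
    # One call handles the whole chunk; a tokenizer could encode batch["text"] at once.
    return {"text": batch["text"], "n_words": [len(str(t).split()) for t in batch["text"]]}


rows = [{"text": "a b"}, {"text": "c"}, {"text": "d e f"}]
per_row = map_rows(rows, count_words_row)
batched = map_batched(rows, count_words_batch, batch_size=2)
print(per_row == batched)
```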

Bug Fixes

Fix a hang during weight synchronization on NPU

What's Changed

Update cookbok to qwen35 by @tastelikefeet in #98
Support Qwen3.5 mm by @tastelikefeet in #100
Support batched preprocessing by @tastelikefeet in #101
fix video mm by @tastelikefeet in #105
Fix GRPO weight-sync hangs and HCCL resource exhaustion on NPU by @addsubmuldiv in #102
add new cookbook with qwen3.5 by @tastelikefeet in #106
fix cookbook by @tastelikefeet in #107

Full Changelog: v0.1.1...v0.1.2

Contributors

addsubmuldiv and tastelikefeet

v0.1.1

Released 03 Mar 03:32 by tastelikefeet (commit c30d544)

Twinkle 0.1.1 Release

Support for Qwen3.5 dense models, from Qwen3.5-2B to Qwen3.5-9B

Full Changelog: v0.1...v0.1.1

v0.1

Released 02 Mar 14:26 by tastelikefeet (commit 4e3a6db)

Twinkle Framework Version 0.1 Released!

New Features

🎉 Full support for components including Dataset, DataLoader, Loss, Transformers and Megatron models, Advantage, Sampler, and more
🎉 Support for multiple training stages such as PT, SFT, and RL, with training modes including single-GPU, multi-node multi-GPU, Ray, and Client-Server
🎉 The first version of multi-tenant shared training is now supported, and the server-side implementation is fully open source. Multi-replica, scalable deployment is implemented with Ray Serve and supports sticky routing
🎉 An online service is now available on the ModelScope official website, where users can train Qwen/Qwen3-30B-A3B-Instruct-2507 for free and push models to ModelHub
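Sticky routing sends every request from the same session to the same replica, so per-session state (for example, a tenant's loaded LoRA adapter) stays in one place while replicas scale out. The sketch below shows the idea with an in-memory router that pins a new session to the least-loaded replica and reuses that choice afterwards. It is conceptual only: `StickyRouter` is hypothetical and is neither Ray Serve's API nor Twinkle's router.

```python
from typing import Dict, List


class StickyRouter:
    """Pin each session id to one replica for the session's lifetime."""

    def __init__(self, replicas: List[str]) -> None:
        if not replicas:
            raise ValueError("need at least one replica")
        self.replicas = list(replicas)
        self.assignment: Dict[str, str] = {}                        # session id -> replica
        self.load: Dict[str, int] = {r: 0 for r in self.replicas}  # pinned sessions per replica

    def route(self, session_id: str) -> str:
        """Return the replica for session_id, assigning one on first use."""
        replica = self.assignment.get(session_id)
        if replica is None:
            # First request: pick the replica with the fewest pinned sessions
            # (ties go to the earliest replica in the list), then remember it.
            replica = min(self.replicas, key=lambda r: self.load[r])
            self.assignment[session_id] = replica
            self.load[replica] += 1
        return replica

    def release(self, session_id: str) -> None:
        """Forget a finished session so its replica can take new ones."""
        replica = self.assignment.pop(session_id, None)
        if replica is not None:
            self.load[replica] -= 1


router = StickyRouter(["replica-0", "replica-1"])
print(router.route("tenant-a"), router.route("tenant-b"), router.route("tenant-a"))
```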

What's Changed

Squash to main by @tastelikefeet in #46
rename cmb by @tpx818 in #65
docs: update README and remove ulysses_size from ep_fsdp_qwen3_moe.py by @meichangsu1 in #64
add contrbutors by @yingdachen in #66
fix lora fetch by @tastelikefeet in #67
Update documentation links in README.md by @wangxingjun778 in #68
Fix router by @tastelikefeet in #69
Fix doc links and add tests by @tastelikefeet in #70
Refactor code by @tastelikefeet in #72
Fix compat tinker and update doc by @Yunnglin in #73
[compat] gpt_bridge compat transformers_5 by @Jintao-Huang in #75
Fix server state adapter limit by @Yunnglin in #74
Fix some bugs by @tastelikefeet in #77
[model] support Qwen3.5 series models by @hjh0119 in #76
fix single gpu bug by @tastelikefeet in #78
[bugfix] fix dense model get layer spec by @hjh0119 in #80
fix grad norm bug by @tastelikefeet in #81
Update readme by @yingdachen in #83
Add custom route for sticky session by @Yunnglin in #82
[bugfix] fix 4d attention mask device by @hjh0119 in #85
add more comment for node resouces by @tastelikefeet in #79
Update doc and fix bugs by @tastelikefeet in #84
Fix logps by @tastelikefeet in #86
recover cp sequence before loss by @hjh0119 in #88
[bugfix] fix logps with PP by @hjh0119 in #89
Fix megatron loss by @tastelikefeet in #90
Dev feature by @hzher in #92
Fix proxy by @Yunnglin in #87
fix TEGroupedLinear by @tastelikefeet in #94
[bugfix] fix grpo loss by @hjh0119 in #93
fix numpy version by @tastelikefeet in #95
[bugfix] fix contiguous by @hjh0119 in #96
Add a sample script by @tastelikefeet in #97

New Contributors

@tastelikefeet made their first contribution in #46
@tpx818 made their first contribution in #65
@wangxingjun778 made their first contribution in #68
@hzher made their first contribution in #92

Full Changelog: https://github.com/modelscope/twinkle/commits/v0.1

Contributors

yingdachen, wangxingjun778, and 7 other contributors
