Releases: modelscope/twinkle

v0.4.0

16 Jun 08:57

Highlights

  • Initial DeepSeek V4 support, covering Flash FSDP2 + EP training and DeepSeek V4 tool-call parsing and cleanup in #190 and #218
  • Expand Qwen3.5 training with padding-free / packed-sequence support and Qwen3.5 MoE GatedDeltaNet sequence-parallel support in #186 and #222
  • Add Gemma 4 multimodal training support in #199
  • Strengthen LoRA training with rsLoRA for Multi-LoRA, FSDP2 support for Multi-LoRA SFT, and Expert Parallelism LoRA SFT examples for DeepSeek V4 and Qwen3.5 MoE in #187, #155, and #198
  • Improve NPU acceleration and stability with fused operators, Qwen3.5 FLA patches, Group MatMul EP scoping, and sequence-parallel compatibility fixes in #194, #204, #205, #206, and #208

New Features

  • Add padding-free and packed-sequence support for Qwen3.5 by @meichangsu1 in #186
  • Add rsLoRA support to Multi-LoRA by @xichengpro in #187
  • Add FSDP2 support for Multi-LoRA SFT by @kevssim in #155
  • Add DeepSeek V4 Flash FSDP2 + EP training support by @meichangsu1 in #190
  • Add NPU fused operators: RMSNorm, RoPE, SwiGLU, and SDPA by @ys2025-AI in #194
  • Add multi-turn rollout support by @tastelikefeet in #193
  • Add support for client-specified checkpoint saving paths by @vx120 in #196
  • Add LoRA SFT support for Expert Parallelism, with DeepSeek V4 and Qwen3.5 MoE examples by @kevssim in #198
  • Add Qwen3.5 NPU FLA and fused-operator patches by @ys2025-AI in #204
  • Add LoRA capacity query support by @kevssim in #201
  • Optimize Native FSDP memory_efficient_init weight loading for multi-node EP/FSDP jobs and add multi-node scripts by @meichangsu1 in #207
  • Add Gemma 4 support by @EvineR666 in #199
  • Add DeepSeek V4 tool-call parsing and cleanup support by @meichangsu1 in #218
  • Add Gemma 4 12B cookbook by @EvineR666 in #219
  • Add automatic device detection by @vx120 in #220
  • Add Qwen3.5 MoE GatedDeltaNet sequence-parallel support by @meichangsu1 in #222
  • Refactor server configuration and observability by @Yunnglin in #210
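
As background for the rsLoRA item (#187): rank-stabilized LoRA scales the adapter update by alpha / sqrt(r) instead of the standard alpha / r, so the update does not shrink as quickly when the rank grows. The sketch below only compares the two scaling rules; the function names are illustrative assumptions and are not Twinkle's API.

```python
import math


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA scaling factor: alpha / r."""
    return alpha / rank


def rslora_scaling(alpha: float, rank: int) -> float:
    """Rank-stabilized LoRA scaling factor: alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)


# Compare both rules at a few ranks with a fixed alpha (values chosen for illustration).
for rank in (8, 64, 256):
    print(rank, lora_scaling(16, rank), rslora_scaling(16, rank))
```

With alpha fixed, the standard factor falls off linearly in the rank while the rsLoRA factor falls off with its square root, which is why the choice matters most at higher ranks.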

Bug Fixes

  • Fix cache reset behavior for multimodal models by @hjh0119 in #189
  • Fix Qwen3.5 GatedDeltaNet padding-free compatibility and create_causal_mask compatibility after cache_positions removal in transformers >5.3.0 by @meichangsu1 in #202
  • Fix transformers 5.9 AttentionMask wrapper compatibility in sequence parallel by @ys2025-AI in #206
  • Fix SP path overriding the NPU-patched chunk_gated_delta_rule by @ys2025-AI in #208
  • Fix NPU Group MatMul patch scope so it only applies in EP scenarios by @0hujun in #205
  • Fix adapter saving to use the MultiLora state dict by @meichangsu1 in #215

Full Changelog: https://github.com/modelscope/twinkle/commits/v0.4.0

v0.3.0

07 May 02:46

English Version

New Features

  1. Full support for the padding_free parameter across training types such as SFT, DPO, and GRPO. Pass padding_free=True when constructing InputProcessor to enable it.
  2. Support for resume-from-checkpoint. Refer to here.
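
As background for the padding_free option: padding-free (packed-sequence) training generally concatenates variable-length samples into one row and tracks sample boundaries with cumulative sequence lengths, rather than padding every sample to the longest one. The sketch below illustrates that idea in plain Python. `pack_sequences` is a hypothetical helper written for this note, not Twinkle's InputProcessor implementation.

```python
from itertools import accumulate


def pack_sequences(sequences):
    """Concatenate token lists; return tokens, boundaries, and per-sample positions."""
    # Flatten all samples into a single row with no padding tokens.
    input_ids = [tok for seq in sequences for tok in seq]
    lengths = [len(seq) for seq in sequences]
    # Cumulative lengths mark where each sample starts and ends in the packed row.
    cu_seqlens = [0, *accumulate(lengths)]
    # Position ids restart at 0 for every sample so positions stay per-sample.
    position_ids = [pos for n in lengths for pos in range(n)]
    return input_ids, cu_seqlens, position_ids


batch = [[11, 12, 13], [21], [31, 32]]
input_ids, cu_seqlens, position_ids = pack_sequences(batch)
print(input_ids)     # [11, 12, 13, 21, 31, 32]
print(cu_seqlens)    # [0, 3, 4, 6]
print(position_ids)  # [0, 1, 2, 0, 0, 1]
```

Here the packed row holds 6 tokens, whereas padding the same three samples to the longest length would occupy 9 slots.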

Bug Fixes

  1. Fixed a training issue caused by mismatched LoRA dtype and model dtype.
  2. Fixed support for the NPU GEMM operator.
  3. Fixed an error where Megatron gather failed when FSDP was enabled on NPU.

Full Changelog: v0.2.1...v0.3.0

v0.2.1

22 Apr 13:45

English Version

New Features

  1. Added official ModelScope service support for Qwen/Qwen3.6-27B. For details, see: https://www.modelscope.cn/organization/twinkle-kit

Bug Fixes

  1. Fixed an issue with incorrect expert weight synchronization.
  2. Fixed a training collapse issue with GRPO MoE in multi-LoRA scenarios.
  3. Fixed a sequence splitting issue for multimodal inputs.
  4. Fixed abnormal server behavior when pp > 1 and tp > 1.
  5. Fixed multiple incorrect remote_function implementations.
  6. Fixed a blocking issue on the server caused by model upload and model training sharing the same pipeline.
  7. Fixed several bugs in the Sampler module.

Full Changelog: v0.2.0...v0.2.1

v0.2.0

14 Apr 15:48

English

New Features

  1. Refactored the service layer; the multi-tenant service now supports both tinker and twinkle client syntax.
  2. Added support for GKD and on-policy distillation; see the cookbook.
  3. Replaced the underlying Megatron backend with the mcore_bridge library, including support for the corresponding multimodal training.
  4. Added support for the DPO algorithm; see the cookbook.
  5. Added support for multimodal task training on the Qwen3.5 series.
  6. Added a server-side Dockerfile.

Bug Fixes

  1. Many bugs were fixed in 0.2.0; see the fix list below.

What's Changed

Full Changelog: v0.1.3...v0.2.0

v0.1.3

13 Mar 05:03

English Version

New Features

  1. Added a convenience shell installation script for client mode and improved the documentation.
  2. Added support for EP + FSDP sharding on the transformers backend.

Bug Fixes

  1. Fixed a failure when loading local datasets.
  2. Fixed http_options being incorrectly passed to the model when starting in server mode.

Full Changelog: v0.1.2...v0.1.3

v0.1.2

05 Mar 15:42

English

New Features

  1. Added multimodal training (images and videos) for Qwen3.5 series transformers models.
  2. Added support for batched=True in dataset preprocessing, which speeds up processing.

Bug Fixes

  1. Fixed a hang during weight synchronization on NPU.
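
To illustrate what batched preprocessing usually means: instead of calling the preprocessing function once per row, the function receives a batch as a mapping from column name to a list of values, which amortizes per-call overhead. The following is a minimal plain-Python sketch of that calling convention. `map_batched` and `preprocess_batch` are hypothetical names for this note and do not reflect Twinkle's dataset API.

```python
def preprocess_batch(batch):
    """Batched function: receives column -> list of values, returns the same shape."""
    return {"text": [t.strip().lower() for t in batch["text"]]}


def map_batched(rows, fn, batch_size=2):
    """Apply fn to rows in chunks, converting rows to columns and back."""
    out = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # Convert a list of row dicts into a dict of column lists.
        columns = {key: [row[key] for row in chunk] for key in chunk[0]}
        result = fn(columns)
        # Convert the returned columns back into row dicts.
        n = len(next(iter(result.values())))
        out.extend({key: result[key][i] for key in result} for i in range(n))
    return out


rows = [{"text": " Hello "}, {"text": "WORLD"}, {"text": " Twinkle"}]
processed = map_batched(rows, preprocess_batch)
print(processed)  # [{'text': 'hello'}, {'text': 'world'}, {'text': 'twinkle'}]
```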

Full Changelog: v0.1.1...v0.1.2

v0.1.1

03 Mar 03:32

Twinkle 0.1.1 Release

English

  • Added support for dense models from Qwen3.5-2B to Qwen3.5-9B

Full Changelog: v0.1...v0.11

v0.1

02 Mar 14:26

English

Twinkle Framework Version 0.1 Released!

New Features

  1. 🎉 Full support for components including Dataset, DataLoader, Loss, Transformers and Megatron models, Advantage, Sampler, and more
  2. 🎉 Support for multiple training stages such as PT, SFT, and RL, with various training modes including single-GPU, multi-node multi-GPU, Ray, and Client-Server
  3. 🎉 First version of multi-tenant shared training is now supported, with the server-side implementation fully open-sourced. Multi-replica scalable deployment is implemented using Ray Serve, with support for sticky routing
  4. 🎉 An online service is now available on the ModelScope official website, where users can train Qwen/Qwen3-30B-A3B-Instruct-2507 for free and push models to ModelHub
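
For readers unfamiliar with sticky routing: the general idea is that requests carrying the same session or tenant key are always sent to the same replica, so state held on that replica (for example a loaded adapter) can be reused. The sketch below shows one simple way to get that property with a stable hash. It is an illustration under assumed names (`pick_replica`, the replica labels) and does not describe how Twinkle or Ray Serve implement routing.

```python
import hashlib


def pick_replica(session_id: str, replicas: list[str]) -> str:
    """Map a session id to a replica deterministically using a stable hash."""
    # hashlib is used instead of hash() because hash() is randomized per process.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["replica-0", "replica-1", "replica-2"]  # hypothetical replica names
first = pick_replica("tenant-42", replicas)
second = pick_replica("tenant-42", replicas)
print(first == second)  # True: the same session always lands on the same replica
```

Note that simple modulo hashing remaps many sessions when the replica count changes; schemes such as consistent hashing reduce that churn.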

Full Changelog: https://github.com/modelscope/twinkle/commits/v0.1