[WIP] feat: add functional MiniMax-M2.5 baseline #1064
Draft
QwertyJack wants to merge 1 commit into jd-opensource:main from
Conversation
Contributor
Code Review
This PR introduces a functional baseline for the MiniMax-M2.5 model, with extensive code additions and changes to support the model on xLLM. Overall the change is comprehensive, covering the build system, environment setup, the core distributed runtime, and the model-layer implementation. In particular, it contains many fixes and optimizations for multi-card stability and correctness on NPU, such as an improved process-group creation path, a fix for a crash caused by copying ParallelArgs, and adjustments to the ACL graph capture logic, all of which reflect careful attention to detail.

My review focused on maintainability and security, with the following suggestions:

- In CMakeLists.txt and env.py, avoid hardcoding the Ascend driver paths; use environment variables instead to improve portability.
- In the new minimax_compare_modules.py tool, the use of torch.load is a security risk; enable the weights_only=True option.

These changes will improve the robustness and security of the code. Overall, this is a high-quality and substantial submission.
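The path-portability suggestion can be sketched as follows. This is a hypothetical illustration, not the actual env.py code; the variable name ASCEND_HOME_PATH is an assumption, not necessarily what xLLM uses:

```python
import os

# Derive the Ascend driver paths from an environment variable instead of
# hardcoding /usr/local/Ascend, falling back to the conventional default.
ascend_home = os.environ.get("ASCEND_HOME_PATH", "/usr/local/Ascend")
driver_lib = os.path.join(ascend_home, "driver", "lib64", "driver")
common_lib = os.path.join(ascend_home, "driver", "lib64", "common")
ld_library_path = ":".join([driver_lib, common_lib])
```

The same idea applies on the CMake side by reading `$ENV{ASCEND_HOME_PATH}` rather than a literal `/usr/local/Ascend` prefix.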
Comment on lines +347 to +355

        /usr/local/Ascend/driver/lib64/driver
        /usr/local/Ascend/driver/lib64/common
        $ENV{NPU_HOME_PATH}/opp/vendors/xllm/op_api/lib/
        $ENV{XLLM_KERNELS_PATH}/lib/
    )
    add_link_options(
        -Wl,-rpath-link,$ENV{PYTORCH_INSTALL_PATH}/../torch.libs
        -Wl,-rpath-link,/usr/local/Ascend/driver/lib64/driver
        -Wl,-rpath-link,/usr/local/Ascend/driver/lib64/common
Contributor
Comment on lines +105 to +106

    "/usr/local/Ascend/driver/lib64/driver" + ":" + \
    "/usr/local/Ascend/driver/lib64/common" + ":" + \
Contributor
tools/minimax_compare_modules.py
Outdated
    )

    if suffix in {".pt", ".pth", ".bin"}:
        obj = torch.load(str(path), map_location="cpu", weights_only=False)
Contributor
Using torch.load with weights_only=False is a security risk because it can execute arbitrary code. Since this tool may process files from different sources, this is a potential vulnerability. Strongly consider setting weights_only=True to mitigate the risk. If the files contain more than just weights, consider using a safer format such as safetensors to save and load all the relevant files.
Suggested change

    -        obj = torch.load(str(path), map_location="cpu", weights_only=False)
    +        obj = torch.load(str(path), map_location="cpu", weights_only=True)
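To illustrate why weights_only=False is dangerous: torch.load in that mode deserializes with pickle, and a pickle payload can run arbitrary code during loading. A minimal, torch-free demonstration of the mechanism:

```python
import pickle

class Evil:
    # Any class can request code execution at unpickling time via __reduce__;
    # torch.load(weights_only=False) uses pickle underneath, so a malicious
    # checkpoint file can do exactly this with no model access needed.
    def __reduce__(self):
        return (print, ("arbitrary code ran during unpickling",))

payload = pickle.dumps(Evil())
result = pickle.loads(payload)  # the print call executes here, during loading
```

With weights_only=True, torch restricts unpickling to a safe allowlist of tensor-related types, which blocks this class of payload.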
Progress of the MiniMax-M2.5 CANN 8.5 baseline on xLLM
I. Current Conclusions

This PR now centers on a reproducible, launchable, servable MiniMax-M2.5 baseline on CANN 8.5 / torch-npu 2.7.1, rather than being just a record of early bring-up work.

The currently verified baseline is:

- model: MiniMax-M2.5/models/MiniMax-M2.5-bf16
- parallelism: dp=1, ep=1, attn_tp=16, moe_tp=16
- XLLM_MINIMAX_NATIVE_DECODE_ATTN=1
- XLLM_MINIMAX_NATIVE_DECODE_MOE=1
- XLLM_MINIMAX_EP_MOE_REFERENCE=0
- --enable_graph=true
- --enable_chunked_prefill=true
- --enable_prefix_cache=true
- 1024 tokens, with decode buckets 1/2/4/8/16 warmed up

Under this configuration, the service starts stably, completes ACL graph warmup, and serves an OpenAI-compatible API at http://127.0.0.1:18994/v1.

II. Core Work Completed in This PR
1. MiniMax-specific execution path
- rotary_dim=64 is applied only to the real rotary slice
- routing uses sigmoid(router_logits); e_score_correction_bias is used only for expert choice
- --reasoning_parser auto recognizes <think>...</think>

2. CANN 8.5 / torch-npu 2.7.1 baseline completion
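The routing rule above can be sketched in plain Python. This is an illustration of the described behavior only, not the xLLM implementation; the function name, top_k parameter, and list-based shapes are assumptions:

```python
import math

def select_experts(router_logits, e_score_correction_bias, top_k):
    # Gate scores come from sigmoid(router_logits), not softmax.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in router_logits]
    # e_score_correction_bias influences only WHICH experts are chosen...
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + e_score_correction_bias[i],
                    reverse=True)
    chosen = ranked[:top_k]
    # ...while the gate weights themselves use the uncorrected scores.
    return chosen, [scores[i] for i in chosen]
```

With zero bias this reduces to plain sigmoid top-k; a nonzero bias can change which experts are selected without altering their gate weights.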
- fp8 -> bf16 conversion tool: tools/dequant_minimax_fp8.py
- run/mm.sh: runs build/xllm/core/server/xllm
- The early "fresh build crashes on startup" issue was not a bad binary: background workers must be launched from a persistent shell, otherwise the test harness may reap the background processes before their logs are flushed to disk, producing a false crash signal.
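The fp8 -> bf16 conversion can be pictured as a scaled dequantization. The following is a hypothetical pure-Python sketch of the idea only, not the actual tools/dequant_minimax_fp8.py implementation (which operates on torch tensors and real fp8/bf16 dtypes):

```python
def dequantize(quantized_values, scale):
    # fp8 checkpoints store low-precision values plus a scale factor;
    # dequantization multiplies each stored value by its scale to recover
    # a higher-precision weight (Python floats stand in for bf16 here).
    return [v * scale for v in quantized_values]
```

Doing this offline once, and serving the resulting bf16 checkpoint, keeps the runtime path free of fp8-specific loader logic while that path is re-validated.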
3. Baseline validation results
In the latest CANN 8.5 container, with a fresh-build server binary, the bf16 checkpoint, and the tp=16 baseline, the following has been verified:

- decode buckets 1/2/4/8/16 are all pre-captured
- the service starts normally on port 18994

Single-request metrics from the most recent test log (logs/persist_test/node_0.log):

- ttft ~= 5.2-5.9 s
- avg tpot ~= 18.0-18.5 ms
- generation speed ~= 54.7-55.9 tok/s

This shows that the PR now provides a reproducible, measurable MiniMax CANN 8.5 baseline that can be optimized further.
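As a quick consistency check, steady-state generation speed should be roughly the reciprocal of the average time per output token; the inputs below are the measured tpot range from the log above:

```python
# Decode throughput implied by the measured avg tpot of ~18.0-18.5 ms/token.
tpot_low_s = 0.0180
tpot_high_s = 0.0185
implied_high = 1.0 / tpot_low_s   # fastest implied rate, ~55.6 tok/s
implied_low = 1.0 / tpot_high_s   # slowest implied rate, ~54.1 tok/s
# This bracket is consistent with the measured 54.7-55.9 tok/s range.
```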
III. Known Remaining Issues
1. Output still contains the reasoning trace
The API path is now stable, but for a reasoning model the response may still contain the raw <think> content, and with a short token budget the budget is easily exhausted in the thinking phase, producing finish_reason=length.

This does not block startup or operator correctness for the current baseline, but the response path still needs further cleanup.
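A minimal sketch of the kind of response-path cleanup meant here, assuming the reasoning trace is delimited by literal <think>...</think> tags (this helper is hypothetical, not part of this PR):

```python
import re

# Drop an inline reasoning trace from generated text before it is
# returned to the API client; DOTALL lets the trace span newlines.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(text: str) -> str:
    return THINK_RE.sub("", text)
```

A streaming implementation is harder, since the closing tag may arrive in a later chunk; that is part of why this cleanup is listed as remaining work.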
2. dp=2, ep=2 is not the current baseline

The current stable CANN 8.5 baseline is pure TP16. The dp=2, ep=2 grouped EP MoE path still needs to be re-debugged on the new software stack. Conclusions from the grouped-path fixes in the old container cannot be carried over directly to the current CANN 8.5 / torch-npu 2.7.1 environment.

3. The direct fp8 run path still needs re-confirmation
The recommended way to run is still the offline-converted bf16 checkpoint. Running the original fp8 checkpoint directly on the new stack still needs to be re-validated separately, so that loader / dequant / runtime-path issues are not conflated with the already-verified bf16 baseline.
IV. Next Steps

The next phase of work builds on this new baseline:

- Re-debug the dp=2, ep=2 grouped EP MoE path.
- Continue decode performance optimization on the stable tp=16 baseline.

V. PR Positioning
Therefore, this PR is currently positioned as:

- re-converging the dp/ep path