[Cherry-Pick][RL] cherry-pick #7218 support moe-topk use topk_reduce_func by zoooo0820 · Pull Request #7217 · PaddlePaddle/FastDeploy

zoooo0820 · 2026-04-07T09:49:59Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-04-07T09:50:08Z

Thanks for your contribution!

codecov-commenter · 2026-04-07T11:43:54Z

Codecov Report

❌ Patch coverage is 80.00000% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.5@c735f76). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/model_executor/layers/moe/moe.py	76.92%	1 Missing and 2 partials ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             release/2.5    #7217   +/-   ##
==============================================
  Coverage               ?   68.48%           
==============================================
  Files                  ?      390           
  Lines                  ?    54372           
  Branches               ?     8574           
==============================================
  Hits                   ?    37236           
  Misses                 ?    14451           
  Partials               ?     2685

Flag	Coverage Δ
GPU	`68.48% <80.00%> (?)`


fastdeploy-bot

🤖 AI Code Review | 2026-04-08

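📋 Review Summary

PR overview: Cherry-Pick PR of #7218 that lets MoE topk normalize with a custom topk_reduce_func.

Scope of changes: fastdeploy/model_executor/layers/moe/, fastdeploy/model_executor/models/glm4_moe.py

Impact tags: [RL] [OP]

📝 PR convention check

The PR title follows the Cherry-Pick convention: it contains the [Cherry-Pick] label, the [RL] tag, and the original PR ID #7218.

Issues

No blocking issues found.

Overall assessment

The implementation is correct, and the Cherry-Pick stays consistent across 5 files. The new topk_reduce_func parameter lets a model (such as GLM4-Moe) customize how MoE topk weights are normalized; it only takes effect when FD_USE_PHI_MOE_TOPK=True. Every call path (ep.py, fused_moe_cutlass_backend.py, fused_moe_deepgemm_backend.py) passes the parameter through correctly and uses getattr(layer, "topk_reduce_func", None) to stay backward compatible.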
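
The `getattr(..., None)` fallback mentioned above can be illustrated with a minimal sketch. The two layer classes and the `resolve_reduce_func` helper are hypothetical stand-ins, not FastDeploy code; only the `getattr(layer, "topk_reduce_func", None)` expression comes from the review.

```python
class LegacyLayer:
    """Hypothetical layer defined before topk_reduce_func existed."""


class NewLayer:
    """Hypothetical layer that stores the reduce function as an attribute."""

    def __init__(self, topk_reduce_func=None):
        self.topk_reduce_func = topk_reduce_func


def resolve_reduce_func(layer):
    # Falls back to None when the attribute is missing, so older layer
    # objects keep working without raising AttributeError.
    return getattr(layer, "topk_reduce_func", None)


custom = lambda values: max(values)
print(resolve_reduce_func(LegacyLayer()) is None)      # attribute absent
print(resolve_reduce_func(NewLayer(custom)) is custom)  # attribute present
```

With this pattern a backend can pass the resolved value on unchanged, leaving the callee to decide what `None` means.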

fastdeploy-bot

🤖 AI Code Review | 2026-04-08

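📋 Review Summary

PR overview: Cherry Pick PR that picks the changes from #7218 onto the release/2.5 branch, allowing moe-topk to normalize with the topk_reduce_func parameter.

Scope of changes: model_executor/layers/moe/, model_executor/models/glm4_moe.py, tests/operators/test_noaux_tc_redundant.py

Impact tags: [RL] [OP] [Models]

📝 PR convention check

The PR description is missing the following; please complete it:

Modifications section: describe the specific code changes in detail
Checklist: tick the following items as applicable:
- Format your code, run pre-commit before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.

Issues

Severity	File	Summary
🟡 Suggestion	`tests/operators/test_noaux_tc_redundant.py:150`	Modifying `os.environ` directly may affect concurrently running tests

Overall assessment

The change logic is correct, and topk_reduce_func is passed and used along a complete call chain. In FD_USE_PHI_MOE_TOPK mode, normalization is done externally through topk_reduce_func, which avoids complex computation inside the CUDA kernel; the design is reasonable.

Removing the moe_topk_select function and using ep_runner.moe_select everywhere simplifies the code structure and improves maintainability. The test cases verify the new feature under a range of parameter configurations.

There is only one minor code-quality issue: the test code modifies a global environment variable directly; using unittest.mock.patch is recommended instead.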
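
A minimal sketch of the `unittest.mock.patch` suggestion, using `mock.patch.dict` so the variable is restored automatically when the block exits. `read_flag` is a hypothetical stand-in for whatever code reads `FD_USE_PHI_MOE_TOPK`; how FastDeploy's own `envs` module reads or caches the variable is not shown in this thread.

```python
import os
from unittest import mock


def read_flag() -> bool:
    # Hypothetical reader; assumes the flag is looked up at call time.
    return os.environ.get("FD_USE_PHI_MOE_TOPK", "0") == "1"


def flag_under_patch() -> bool:
    # patch.dict applies the override only inside the with-block and
    # restores the previous state of os.environ on exit, even on error.
    with mock.patch.dict(os.environ, {"FD_USE_PHI_MOE_TOPK": "1"}):
        return read_flag()


before = os.environ.get("FD_USE_PHI_MOE_TOPK")
inside = flag_under_patch()
after = os.environ.get("FD_USE_PHI_MOE_TOPK")
print(inside, before == after)
```

The same call also works as a decorator on a test method, which keeps setup and teardown out of the test body.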

tests/operators/test_noaux_tc_redundant.py

fastdeploy-bot

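📋 Review Summary

PR overview: Adds a topk_reduce_func parameter to the MoE layer so that the normalization of topk values can be customized when FD_USE_PHI_MOE_TOPK=1, for training alignment.

Scope of changes: fastdeploy/model_executor/layers/moe/, models/glm4_moe.py, tests/operators/

Impact tags: [RL] [OP]

PR convention check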

✅ Title follows the [Cherry-Pick][Tag](#id) format
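❌ The description does not fill in required sections such as Motivation/Modifications/Usage

Suggested title (the current title already conforms):

[Cherry-Pick][RL] cherry-pick #7218 support moe-topk use topk_reduce_func

Description template (suggested addition):

## Motivation
Add a `topk_reduce_func` parameter to support customizing how topk values are normalized, mainly for training-alignment scenarios. When `FD_USE_PHI_MOE_TOPK=1`, this parameter can supply a custom reduction function (such as sum, mean, or max).

## Modifications
1. Add the `topk_reduce_func` parameter to the `get_moe_scores` function
2. Add the `topk_reduce_func` parameter to `FusedMoE.__init__` and store it as an instance attribute
3. Pass the parameter through in the EP/Cutlass/DeepGemm backends
4. Remove the duplicate `moe_topk_select` function from `fused_moe_deepgemm_backend.py`
5. Pass the default topk_reduce_func in the GLM4-MoE model
6. Update the test file to verify the correctness of the PHI MoE topk implementation

Issues

Severity	File	Summary
🟡 Suggestion	`tests/operators/test_noaux_tc_redundant.py:161`	The test restores the environment variable in an unsafe way
🟡 Suggestion	Overall change	Test coverage is incomplete (no test for the default `FD_USE_PHI_MOE_TOPK=0` path)

Overall assessment

The implementation is correct, and introducing the topk_reduce_func parameter provides the flexibility that training alignment needs. The core logic is clear, and each backend passes the new parameter correctly. Improving how the tests manage environment variables is recommended, to avoid affecting other test cases.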

tests/operators/test_noaux_tc_redundant.py

fastdeploy-bot

🤖 AI Code Review | 2026-04-08

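📋 Review Summary

PR overview: Cherry-Pick PR that lets MoE topk use a custom topk_reduce_func normalization function
Scope of changes: model_executor/layers/moe/, model_executor/models/glm4_moe.py, tests/operators/
Impact tags: [RL] [OP]

📝 PR convention check

✅ Title contains valid tags: [RL] [Cherry-Pick]
✅ Cherry-Pick format is correct
⚠️ The description does not fill in the Motivation and Modifications sections (Checklist not completed)

Suggested title (can be copied as-is):

[RL] [Cherry-Pick] support moe-topk use topk_reduce_func(#7218)

Description template (can be copied as-is):

## Motivation
Support a custom topk reduction function so that the normalization of topk values can be controlled flexibly when FD_USE_PHI_MOE_TOPK is enabled.

## Modifications
1. Add the `topk_reduce_func` parameter to the `get_moe_scores` function
2. Add the `topk_reduce_func` parameter to `__init__` of the `FusedMoE` class
3. Update each backend (cutlass, deepgemm, ep) to pass the parameter
4. Pass the default `topk_reduce_func` in the GLM4MoE model
5. Update the test cases to support the new parameter

Issues

Severity	File	Summary
🟡 Suggestion	`moe.py:133`	Numerical-stability risk when the value returned by `topk_reduce_func` may be close to 0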

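Overall assessment

The changes are reasonable overall: adding the topk_reduce_func parameter allows custom normalization logic. The default behavior is unchanged (sum + 1e-20), and the function is only used when the FD_USE_PHI_MOE_TOPK environment variable is set. The duplicate moe_topk_select function was removed in favor of get_moe_scores, which makes the code more concise. The tests cover both the old and new modes, but one numerical-stability risk remains.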
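
The numerical-stability concern can be sketched in plain Python, with lists standing in for tensors. `renormalize`, `sum_with_eps`, and `divides_cleanly` are hypothetical helpers; only the `sum + 1e-20` form comes from the PR. Plain Python floats raise `ZeroDivisionError` on division by zero, whereas tensor libraries typically produce `inf` or `nan` instead, so the exception here only stands in for that failure mode.

```python
def renormalize(values, reduce_func):
    # Divide every topk value by a single reduced denominator.
    denom = reduce_func(values)
    return [v / denom for v in values]


def sum_with_eps(values, eps=1e-20):
    # Mirrors the default discussed in the review: sum plus a tiny epsilon.
    return sum(values) + eps


def divides_cleanly(values, reduce_func) -> bool:
    # True when renormalization completes without dividing by zero.
    try:
        renormalize(values, reduce_func)
        return True
    except ZeroDivisionError:
        return False


print(renormalize([1.0, 3.0], sum_with_eps))
print(divides_cleanly([0.0, 0.0], sum))           # bare sum hits zero
print(divides_cleanly([0.0, 0.0], sum_with_eps))  # epsilon guards it
```

A custom reduce function passed by a caller carries no such guard unless the caller adds one, which is the risk the suggestion points at.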

fastdeploy/model_executor/layers/moe/moe.py

fastdeploy-bot

🤖 AI Code Review | 2026-04-08 12:12 CST

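📋 Review Summary

PR overview: Cherry-pick of #7218 that lets MoE topk use a custom topk_reduce_func normalization function

Scope of changes: model_executor/layers/moe/, model_executor/models/glm4_moe.py, tests/operators/

Impact tags: [RL] [OP]

📝 PR convention check

The PR description is not filled in; the Motivation and Modifications fields are empty.

Description template (can be copied as-is):

## Motivation
Cherry-pick #7218 to the release/2.5 branch so that MoE topk can use a custom reduce function.

## Modifications
- Add the `topk_reduce_func` parameter to support a custom normalization function for topk values
- Remove the duplicate `moe_topk_select` function from `fused_moe_deepgemm_backend.py`
- Pass the `topk_reduce_func` parameter explicitly in `glm4_moe.py`

Change analysis

File	Change type	Description
`moe.py`	Feature enhancement	Adds the `topk_reduce_func` parameter to support a custom normalization function
`ep.py`	Adaptation	Passes the parameter via `getattr(layer, "topk_reduce_func", None)`
`fused_moe_cutlass_backend.py`	Adaptation	Passes the `topk_reduce_func` parameter
`fused_moe_deepgemm_backend.py`	Code cleanup	Removes the duplicate `moe_topk_select` function in favor of `get_moe_scores`
`glm4_moe.py`	Usage example	Passes the default normalization function explicitly
`test_noaux_tc_redundant.py`	New tests	Adds tests for the `FD_USE_PHI_MOE_TOPK=1` path

Issues

No blocking issues found.

Overall assessment

The change logic is correct; the topk_reduce_func parameter has a sensible default and is backward compatible. Removing the duplicate moe_topk_select code in favor of get_moe_scores is a reasonable cleanup. The tests cover the new code path. Filling in the PR description is recommended.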

fastdeploy-bot

🤖 AI Code Review | 2026-04-08 15:53 CST

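📋 Review Summary

PR overview: Cherry Pick of #7218 that lets MoE topk selection use a custom topk_reduce_func function

Scope of changes: model_executor/layers/moe/

Impact tags: [RL] [OP]

Issues

Severity	File	Summary
🔴 Bug	`fastdeploy/model_executor/layers/moe/moe.py:89`	The default value chosen for the `topk_reduce_func` parameter makes an internal branch unreachable

Overall assessment

The code logic is clear overall and unifies the topk selection implementation across backends. However, the default value of the topk_reduce_func parameter in get_moe_scores has a logic flaw that needs to be fixed.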

fastdeploy-bot · 2026-04-08T07:53:54Z

fastdeploy/model_executor/layers/moe/moe.py

    expert_in_rank_num_list: paddle.Tensor = None,
    tokens_per_expert_stats_list: paddle.Tensor = None,
    redundant_ep_rank_num_plus_one: int = 1,
+    topk_reduce_func: Callable = lambda x: x.sum(axis=-1, keepdim=True) + 1e-20,


🔴 Bug The default value chosen for the topk_reduce_func parameter has a logic problem.

Problem description:

The default value in the function signature is lambda x: x.sum(axis=-1, keepdim=True) + 1e-20 (line 89)

Inside the function, the check is if topk_reduce_func is not None (line 133)

Because the default is a lambda function rather than None, the condition is always True

This makes the else branch on lines 136-137 dead code

Code snippet:

    # Line 89: the default value is a lambda function
    topk_reduce_func: Callable = lambda x: x.sum(axis=-1, keepdim=True) + 1e-20,

    # Lines 133-137: the else branch can never be reached
    if topk_reduce_func is not None:
        topk_values = topk_values / topk_reduce_func(topk_values)
    else:
        # Use the default sum + epsilon  <-- this line never executes
        topk_values = topk_values / (topk_values.sum(axis=-1, keepdim=True) + 1e-20)

Suggested fix:
Change the default value to None and handle the default logic inside the function:

    def get_moe_scores(
        ...
        topk_reduce_func: Callable = None,  # changed to None
    ) -> paddle.Tensor:
        ...
        if envs.FD_USE_PHI_MOE_TOPK:
            if original_renormalize:
                if topk_reduce_func is not None:
                    topk_values = topk_values / topk_reduce_func(topk_values)
                else:
                    # Use the default sum + epsilon
                    topk_values = topk_values / (topk_values.sum(axis=-1, keepdim=True) + 1e-20)

This keeps the semantics clear and handles every calling scenario correctly.
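
The None-as-default pattern suggested above can be sketched without Paddle, with lists standing in for tensors. `renormalize_topk` and `default_reduce` are hypothetical names for illustration; this is not the actual `get_moe_scores` body.

```python
from typing import Callable, List, Optional


def default_reduce(values: List[float]) -> float:
    # Same shape as the default discussed in the review: sum plus epsilon.
    return sum(values) + 1e-20


def renormalize_topk(
    topk_values: List[float],
    topk_reduce_func: Optional[Callable[[List[float]], float]] = None,
) -> List[float]:
    # None means "use the built-in default", whether the caller omitted the
    # argument or passed None explicitly (for example via getattr(..., None)).
    reduce_func = topk_reduce_func if topk_reduce_func is not None else default_reduce
    denom = reduce_func(topk_values)
    return [v / denom for v in topk_values]


print(renormalize_topk([1.0, 3.0]))        # omitted argument
print(renormalize_topk([1.0, 3.0], None))  # explicit None
print(renormalize_topk([1.0, 3.0], max))   # custom reduce function
```

Resolving the fallback once at the top keeps a single division path, so the default and custom cases cannot drift apart.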

fastdeploy-bot

🤖 AI Code Review | 2026-04-09

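📋 Review Summary

PR overview: Cherry-Pick PR that adds a topk_reduce_func parameter to the MoE layer and unifies the topk selection logic across backends.

Scope of changes: model_executor/layers/moe/, model_executor/models/glm4_moe.py, tests/operators/

Impact tags: [RL] [OP]

📝 PR convention check

1. Title format

The title uses the [RL] tag, but given the PR content (a parameter change in the MoE layer), a more precise tag is suggested
The original PR #7218 used [RL] in its title, so keeping it consistent for a Cherry-Pick is understandable

2. Missing description

Motivation section not filled in
Modifications section not filled in
Usage section not filled in
Checklist not ticked (acceptable for a Cherry-Pick)

Suggested description:

Motivation:

The original PR #7218 introduced the `topk_reduce_func` parameter, which lets users customize the MoE topk normalization function.
This Cherry-Pick ports that change to the release/2.5 branch and unifies the implementation logic across backends.

Modifications:

1. Add the `topk_reduce_func` parameter to `FusedMoE.__init__` and `get_moe_scores`
2. Remove the duplicate `moe_topk_select` function from `fused_moe_deepgemm_backend.py`
3. Unify the calls to `get_moe_scores` in `ep.py`, `fused_moe_cutlass_backend.py`, and `fused_moe_deepgemm_backend.py`
4. Pass the `topk_reduce_func` parameter in `glm4_moe.py`
5. Update the test cases to use the new way of passing the parameter

Issues

Severity	File	Summary
🟡 Suggestion	-	The PR description is incomplete; adding Motivation and Modifications is recommended

Overall assessment

The implementation logic is correct. Introducing the topk_reduce_func parameter unifies the topk selection implementation across backends and removes duplicated code. The default value is sensible (lambda x: x.sum(axis=-1, keepdim=True) + 1e-20) and backward compatibility is good. The test cases have been updated and cover the new parameter. The main issue is the missing PR description, which should be filled in.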

support moe-topk use topk_reduce_func

42deaf7

zoooo0820 had a problem deploying to Metax_ci April 7, 2026 09:50 — with GitHub Actions Failure

zoooo0820 changed the title ~~support moe-topk use topk_reduce_func~~ [Cherry-Pick][RL] cherry-pick #7218 support moe-topk use topk_reduce_func Apr 7, 2026

fix ep error

c93ba72

zoooo0820 had a problem deploying to Metax_ci April 7, 2026 12:48 — with GitHub Actions Failure