
【FSDP】Add FSDP in dynamic #78577

Open
Xing-lil wants to merge 3 commits into PaddlePaddle:develop from Xing-lil:add_fsdp_dy

Conversation

Xing-lil (Contributor) commented Apr 3, 2026

PR Category

Distributed Strategy

PR Types

New features

Description

Add FSDP (fully sharded data parallel) in dynamic graph mode.

Does this introduce any precision change?
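For readers new to the feature, here is a minimal conceptual sketch of the FSDP idea this PR implements, using only public `paddle.distributed` collectives. It is illustrative, not the PR's implementation, and assumes a collective launch where the world size evenly divides the toy parameter length.

```python
# Conceptual FSDP sketch (not this PR's code): each rank stores only a
# shard of a parameter, all-gathers the full tensor right before use, and
# reduce-scatters gradients so each rank keeps just its own slice.
# Launch with e.g.: python -m paddle.distributed.launch --gpus=0,1 demo.py
import paddle
import paddle.distributed as dist

dist.init_parallel_env()
rank, nranks = dist.get_rank(), dist.get_world_size()

full = paddle.arange(8, dtype='float32')    # toy "parameter"
shard = paddle.split(full, nranks)[rank]    # this rank's persistent shard

# Forward: materialize the full parameter on demand.
pieces = []
dist.all_gather(pieces, shard)
gathered = paddle.concat(pieces)

# Backward: pretend `gathered` is the full gradient; each rank keeps only
# the summed slice that corresponds to its own shard.
grad_shard = paddle.empty_like(shard)
dist.reduce_scatter(grad_shard, paddle.split(gathered, nranks))
```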

paddle-bot commented Apr 3, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first; see the Paddle CI Manual for details.

```diff
 # Note: Only sharding stage 1 is considered in HybridParallelOptimizer.
 # The sharding stage2 and stage3 optimizers are invoked in other api.
-if hcg.get_sharding_parallel_world_size() > 1:
+if hcg.get_sharding_parallel_world_size() > 1 and False:
```
Contributor:

Why is `and False` used here?

Xing-lil (Author):

That was a temporary change; it has been fixed.
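A hedged sketch of the described fix, with the temporary `and False` guard removed; the helper name is hypothetical, and `hcg` is the hybrid communicate group from fleet, as in the hunk above:

```python
from paddle.distributed import fleet

def _stage1_sharding_enabled(hcg=None):
    # Only sharding stage 1 is handled in HybridParallelOptimizer; stages
    # 2 and 3 are invoked through other APIs, per the comment in the hunk.
    hcg = hcg or fleet.get_hybrid_communicate_group()
    return hcg.get_sharding_parallel_world_size() > 1
```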

```python
if isinstance(params_grads, list):
    if self._grad_clip is not None:
        params_grads = self._grad_clip(params_grads)
    # if self._grad_clip is not None:
```
Contributor:

Why is grad_clip commented out here?

Xing-lil (Author):

That was a temporary change; it has been fixed.
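A minimal sketch of the restored clipping path described in the reply, assuming `params_grads` is the usual list of `(param, grad)` pairs and `grad_clip` is a Paddle clip object such as `paddle.nn.ClipGradByGlobalNorm` (or `None` to disable clipping); the helper name is hypothetical:

```python
import paddle

def _apply_grad_clip(params_grads, grad_clip):
    # Re-enable clipping: apply the configured clip to the (param, grad)
    # pairs before the optimizer consumes them.
    if isinstance(params_grads, list) and grad_clip is not None:
        params_grads = grad_clip(params_grads)
    return params_grads

# Example clip object: paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0)
```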

```python
paddle.device.cuda.empty_cache()

curr_rank = paddle.distributed.get_rank()
world_size = paddle.distributed.get_world_size()
```
Contributor:

Should this fetch the global world_size or the sharding group size?

Xing-lil (Author):

It makes no difference when only FSDP is enabled, but this has now been changed to use the sharding group.
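A hedged sketch of the change described here, replacing the global rank and size with the sharding-group view; it assumes fleet was initialized with a hybrid strategy so `get_hybrid_communicate_group()` is available:

```python
from paddle.distributed import fleet

hcg = fleet.get_hybrid_communicate_group()
# Rank and size within the sharding group, not the global world, so FSDP
# still shards correctly when combined with other parallel dimensions.
sharding_group = hcg.get_sharding_parallel_group()
curr_rank = hcg.get_sharding_parallel_rank()
group_size = hcg.get_sharding_parallel_world_size()
```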

```python
        self, model, mesh=None, fsdp_unit_layers=None, moe_layers_name=None
    ):
        self.model = model
        self.mesh = None
```
Contributor:

self.mesh is forcibly set to None here. Is that intended?

Xing-lil (Author):

Fixed; mesh is not needed in dynamic graph mode.
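A sketch of what the constructor might look like after the fix, with the unused `mesh` argument dropped entirely rather than silently overwritten; the class name is a hypothetical stand-in for the one in the diff:

```python
class _FSDPWrapperSketch:  # hypothetical stand-in, not the PR's class name
    def __init__(self, model, fsdp_unit_layers=None, moe_layers_name=None):
        # mesh is gone: in dynamic graph mode the wrapper works directly
        # off the process groups, so no mesh handle is needed.
        self.model = model
        self.fsdp_unit_layers = fsdp_unit_layers or []
        self.moe_layers_name = moe_layers_name or []
```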

```python
        ctx.layer = layer
        ctx.comm_manager = comm_manager
        ctx.recursive = recursive
        return inputs
```
Contributor:

FusionBackwardHook.forward has `return inputs if len(inputs) > 1 else inputs[0]`; the inconsistency here may cause downstream layers to receive an unexpected tuple.

Xing-lil (Author):

Fixed, thanks!
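A hedged sketch of the consistency fix using `paddle.autograd.PyLayer`; the class name is a stand-in, and the single-element unwrapping rule follows the `FusionBackwardHook.forward` behavior the reviewer quoted:

```python
import paddle
from paddle.autograd import PyLayer

class _HookSketch(PyLayer):  # stand-in for the hook pair in this diff
    @staticmethod
    def forward(ctx, *inputs):
        # Unwrap single-element tuples so a one-input call returns a plain
        # Tensor, matching FusionBackwardHook.forward.
        return inputs if len(inputs) > 1 else inputs[0]

    @staticmethod
    def backward(ctx, *grads):
        # Identity backward; the real hook would schedule communication here.
        return grads if len(grads) > 1 else grads[0]
```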

```python
class FSDPBufferManager:
    def __init__(self, model, mesh, fsdp_unit_layers=None):
        self.model = model
        # self._fsdp_group = mesh.get_group("dp")
```
Contributor:

There is still some commented-out code, e.g. at L209, L216, and L240.

Xing-lil (Author) commented Apr 9, 2026:

Cleaned up, thanks!
