
【FSDP】Add FSDP in dynamic #78577

Open
Xing-lil wants to merge 3 commits into PaddlePaddle:develop from Xing-lil:add_fsdp_dy

Conversation

Xing-lil (Contributor) commented Apr 3, 2026

PR Category

Distributed Strategy

PR Types

New features

Description

Add FSDP (fully sharded data parallel) in dynamic graph mode.

Does this introduce any precision change?
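For readers new to the feature, here is a minimal conceptual sketch of the FSDP idea this PR implements, using only public `paddle.distributed` collectives. It is illustrative, not the PR's implementation, and assumes a collective launch where the world size evenly divides the toy parameter length.

```python
# Conceptual FSDP sketch (not this PR's code): each rank stores only a
# shard of a parameter, all-gathers the full tensor right before use, and
# reduce-scatters gradients so each rank keeps just its own slice.
# Launch with e.g.: python -m paddle.distributed.launch --gpus=0,1 demo.py
import paddle
import paddle.distributed as dist

dist.init_parallel_env()
rank, nranks = dist.get_rank(), dist.get_world_size()

full = paddle.arange(8, dtype='float32')    # toy "parameter"
shard = paddle.split(full, nranks)[rank]    # this rank's persistent shard

# Forward: materialize the full parameter on demand.
pieces = []
dist.all_gather(pieces, shard)
gathered = paddle.concat(pieces)

# Backward: pretend `gathered` is the full gradient; each rank keeps only
# the summed slice that corresponds to its own shard.
grad_shard = paddle.empty_like(shard)
dist.reduce_scatter(grad_shard, paddle.split(gathered, nranks))
```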

paddle-bot commented Apr 3, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first; see the Paddle CI Manual for details.

```diff
 # Note: Only sharding stage 1 is considered in HybridParallelOptimizer.
 # The sharding stage2 and stage3 optimizers are invoked in other api.
-if hcg.get_sharding_parallel_world_size() > 1:
+if hcg.get_sharding_parallel_world_size() > 1 and False:
```
Contributor:

Why is `and False` used here?

Xing-lil (Author):

That was a temporary change; it has been fixed.
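A hedged sketch of the described fix, with the temporary `and False` guard removed; the helper name is hypothetical, and `hcg` is the hybrid communicate group from fleet, as in the hunk above:

```python
from paddle.distributed import fleet

def _stage1_sharding_enabled(hcg=None):
    # Only sharding stage 1 is handled in HybridParallelOptimizer; stages
    # 2 and 3 are invoked through other APIs, per the comment in the hunk.
    hcg = hcg or fleet.get_hybrid_communicate_group()
    return hcg.get_sharding_parallel_world_size() > 1
```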

```python
if isinstance(params_grads, list):
    if self._grad_clip is not None:
        params_grads = self._grad_clip(params_grads)
    # if self._grad_clip is not None:
```
Contributor:

Why is grad_clip commented out here?

Xing-lil (Author):

That was a temporary change; it has been fixed.
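A minimal sketch of the restored clipping path described in the reply, assuming `params_grads` is the usual list of `(param, grad)` pairs and `grad_clip` is a Paddle clip object such as `paddle.nn.ClipGradByGlobalNorm` (or `None` to disable clipping); the helper name is hypothetical:

```python
import paddle

def _apply_grad_clip(params_grads, grad_clip):
    # Re-enable clipping: apply the configured clip to the (param, grad)
    # pairs before the optimizer consumes them.
    if isinstance(params_grads, list) and grad_clip is not None:
        params_grads = grad_clip(params_grads)
    return params_grads

# Example clip object: paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0)
```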

```python
paddle.device.cuda.empty_cache()

curr_rank = paddle.distributed.get_rank()
world_size = paddle.distributed.get_world_size()
```
Contributor:

Should this fetch the global world_size or the sharding group size?

Xing-lil (Author):

It makes no difference when only FSDP is enabled, but this has now been changed to use the sharding group.
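A hedged sketch of the change described here, replacing the global rank and size with the sharding-group view; it assumes fleet was initialized with a hybrid strategy so `get_hybrid_communicate_group()` is available:

```python
from paddle.distributed import fleet

hcg = fleet.get_hybrid_communicate_group()
# Rank and size within the sharding group, not the global world, so FSDP
# still shards correctly when combined with other parallel dimensions.
sharding_group = hcg.get_sharding_parallel_group()
curr_rank = hcg.get_sharding_parallel_rank()
group_size = hcg.get_sharding_parallel_world_size()
```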

```python
        self, model, mesh=None, fsdp_unit_layers=None, moe_layers_name=None
    ):
        self.model = model
        self.mesh = None
```
Contributor:

self.mesh is forcibly set to None here. Is that intended?

Xing-lil (Author):

Fixed; mesh is not needed in dynamic graph mode.
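A sketch of what the constructor might look like after the fix, with the unused `mesh` argument dropped entirely rather than silently overwritten; the class name is a hypothetical stand-in for the one in the diff:

```python
class _FSDPWrapperSketch:  # hypothetical stand-in, not the PR's class name
    def __init__(self, model, fsdp_unit_layers=None, moe_layers_name=None):
        # mesh is gone: in dynamic graph mode the wrapper works directly
        # off the process groups, so no mesh handle is needed.
        self.model = model
        self.fsdp_unit_layers = fsdp_unit_layers or []
        self.moe_layers_name = moe_layers_name or []
```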

```python
        ctx.layer = layer
        ctx.comm_manager = comm_manager
        ctx.recursive = recursive
        return inputs
```
Contributor:

FusionBackwardHook.forward has `return inputs if len(inputs) > 1 else inputs[0]`; the inconsistency here may cause downstream layers to receive an unexpected tuple.

Xing-lil (Author):

Fixed, thanks!
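A hedged sketch of the consistency fix using `paddle.autograd.PyLayer`; the class name is a stand-in, and the single-element unwrapping rule follows the `FusionBackwardHook.forward` behavior the reviewer quoted:

```python
import paddle
from paddle.autograd import PyLayer

class _HookSketch(PyLayer):  # stand-in for the hook pair in this diff
    @staticmethod
    def forward(ctx, *inputs):
        # Unwrap single-element tuples so a one-input call returns a plain
        # Tensor, matching FusionBackwardHook.forward.
        return inputs if len(inputs) > 1 else inputs[0]

    @staticmethod
    def backward(ctx, *grads):
        # Identity backward; the real hook would schedule communication here.
        return grads if len(grads) > 1 else grads[0]
```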

```python
class FSDPBufferManager:
    def __init__(self, model, mesh, fsdp_unit_layers=None):
        self.model = model
        # self._fsdp_group = mesh.get_group("dp")
```
Contributor:

There is still some commented-out code, e.g. at L209, L216, and L240.

Xing-lil (Author) commented Apr 9, 2026:

Cleaned up, thanks!
