Support for MXFP8 All gather #2160
avizon-aws started this conversation in General
Replies: 2 comments · 2 replies
@avizon-aws added this support in pytorch/ao#3435. Thanks for your work on this! Let me know if there are any additional questions or comments; otherwise I'll close this issue for now.
Hi,
I noticed that torchtitan currently does not support MXFP8 all-gather, although it does support FP8 all-gather. Is there a timeline for MXFP8 all-gather support? (MXFP8 all-gather is already supported in torchao.)
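For context on why an MXFP8 all-gather is attractive: MXFP8 (the OCP microscaling format) groups elements into blocks of 32 that share a power-of-two scale, so all-gathering the quantized payload moves roughly a quarter of the bytes of bf16. Below is a pure-Python sketch of the block-scaling step, illustrative only and not the torchao kernel; the scale-selection rule here (`2^ceil(log2(amax / E4M3_MAX))`) is one reasonable choice, not necessarily the exact one the spec or torchao uses:

```python
import math

BLOCK = 32         # MX formats share one scale across 32 elements
E4M3_MAX = 448.0   # largest finite magnitude representable in FP8 E4M3

def mx_block_scale(block):
    """Pick a power-of-two (E8M0-style) scale so the block max fits in E4M3 range."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0
    return 2.0 ** math.ceil(math.log2(amax / E4M3_MAX))

def quantize_block(block):
    """Return (scale, scaled elements); the scaled values would then be cast to E4M3."""
    s = mx_block_scale(block)
    return s, [x / s for x in block]

data = [float(i) for i in range(BLOCK)]
scale, q = quantize_block(data)
assert all(abs(v) <= E4M3_MAX for v in q)  # every element fits the E4M3 range
```

The key property for communication is that only the FP8 payload plus one scale byte per 32 elements needs to travel over the wire.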
I also looked into some implementation details for enabling MXFP8 all-gather. It seems we would need to update the desired_input_layouts for the sharding strategy of the model layers as shown here, i.e. change Replicate -> Shard, but I expect more changes to follow; this was just a high-level overview.
Edit: I took a deeper look, and there are more changes involved in getting MXFP8 all-gather working. I did a POC and got it to work, and have started a discussion with the torchao team to get the code merged.
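To make the `desired_input_layouts` point concrete, here is a toy sketch of the `Replicate -> Shard` change. The classes below are placeholders, not the real `torch.distributed.tensor` placements or the torchtitan parallelize plan; the intent is only to show the direction of the change, where keeping the input sharded lets the cast to MXFP8 happen locally so the all-gather moves the compact FP8 payload instead of bf16 activations:

```python
from dataclasses import dataclass

# Toy stand-ins for DTensor placements (the real ones live in
# torch.distributed.tensor); names are kept for illustration only.
@dataclass(frozen=True)
class Replicate:
    pass

@dataclass(frozen=True)
class Shard:
    dim: int

@dataclass
class ColwiseParallelPlan:
    """Hypothetical per-layer plan: the layout a layer wants its input in."""
    desired_input_layouts: tuple

# bf16 path: the input is replicated before the matmul, so the full-size
# bf16 activations are what get all-gathered.
bf16_plan = ColwiseParallelPlan(desired_input_layouts=(Replicate(),))

# MXFP8 path: keep the input sharded (e.g. on the sequence dim), cast each
# shard to MXFP8 locally, then all-gather the quantized payload.
mxfp8_plan = ColwiseParallelPlan(desired_input_layouts=(Shard(1),))

assert bf16_plan.desired_input_layouts != mxfp8_plan.desired_input_layouts
```

As the Edit above notes, this layout change is only the visible tip; the POC presumably also needed the quantized all-gather itself and scale handling, which is what the torchao discussion covers.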