Update AFMoE architecture to use v5-style MoE impl #44063
AutumnAurelium wants to merge 11 commits into huggingface:main
Conversation
ArthurZucker
left a comment
sounds good thanks for updating!
Any update on getting this merged? I've fixed the problems mentioned above.
run-slow: afmoe |
Confirmed that the model trains in axolotl and loads experts as expected.
ArthurZucker
left a comment
LGTM! Just let's leverage modular in that case; the MoE is standard and can be inherited!
Comment on:

    return final_hidden_states
    ...
    class AfmoeMoE(nn.Module):
pretty sure you can now inherit this from another class! can you try ? 🤗
@ArthurZucker Should this also be named AfmoeSparseMoeBlock for consistency?
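For illustration, the "inherit from another class" suggestion usually looks like the sketch below in transformers' modular files: the new model's MoE block subclasses an existing standard implementation and adds nothing. This is a hedged sketch, not the actual PR code; the base class name `MixtralSparseMoeBlock` is a stand-in here (a stub, so the snippet is self-contained) for whichever standard MoE block the modular file would actually import.

```python
# Stand-in for a standard MoE block from an existing model
# (in a real modular_afmoe.py this would be imported, not defined here).
class MixtralSparseMoeBlock:
    def forward(self, hidden_states):
        # Placeholder for the standard top-k routed expert forward pass.
        return hidden_states


# The modular pattern: inherit the standard implementation wholesale.
# An empty body means the whole forward is reused as-is; the code
# generator then expands this into a full class in modeling_afmoe.py.
class AfmoeSparseMoeBlock(MixtralSparseMoeBlock):
    pass
```

The naming question above matters because the modular expansion copies the inherited class under the new name, so `AfmoeSparseMoeBlock` would match the `*SparseMoeBlock` convention used by other MoE models.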
[For maintainers] Suggested jobs to run (before merge) run-slow: afmoe |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
What does this PR do?
This brings the Arcee AFMoE architecture in line with other MoE models' implementation patterns since v5. It also adds integration testing using Trinity Nano.
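To make "MoE implementation patterns since v5" concrete, the common pattern is: the router produces per-token logits over experts, the top-k probabilities are kept and renormalized, and each token's output is the weighted sum of its chosen experts' outputs. The NumPy sketch below is only an illustration of that routing scheme under those assumptions, not the AFMoE code from this PR (names like `moe_forward` and `gate_w` are hypothetical).

```python
import numpy as np

def moe_forward(hidden, gate_w, experts, top_k=2):
    """Route each token to its top_k experts and mix their outputs
    by the renormalized router probabilities."""
    logits = hidden @ gate_w                        # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)           # softmax over experts
    top = np.argsort(-probs, axis=-1)[:, :top_k]    # chosen expert indices
    out = np.zeros_like(hidden)
    for t in range(hidden.shape[0]):
        weights = probs[t, top[t]]
        weights = weights / weights.sum()           # renormalize over top_k
        for w, e in zip(weights, top[t]):
            out[t] += w * experts[e](hidden[t])     # weighted expert mix
    return out
```

Because the kept routing weights are renormalized to sum to 1, a set of identity experts returns the input unchanged, which is a handy sanity check when refactoring a MoE block.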
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@ArthurZucker @Cyrilvallez