
Refactor spec modification/introspection to make references to Submodules typed #2834

Closed

nschank wants to merge 8 commits into NVIDIA:main from nschank:submodules

Conversation

@nschank
Contributor

@nschank nschank commented Jan 6, 2026

What does this PR do ?

In order to safely refactor Submodules classes, I want to make sure I can easily find everywhere those classes are being referenced. This updates every instance I could find where ModuleSpec submodules are being inspected or modified, and either uses cast or uses a typed helper method to ensure that searching for references/usages of a field will consistently find them.
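For illustration, the two approaches can be sketched as follows. The class and helper names here (`SelfAttentionSubmodules` as the spec payload, `get_self_attention_submodules` as the typed accessor) are stand-ins, not Megatron-Core's actual definitions:

```python
# Hypothetical sketch of making submodule access typed so that
# "find references" tooling can track every use site.
from dataclasses import dataclass
from typing import cast


@dataclass
class SelfAttentionSubmodules:
    linear_qkv: type


@dataclass
class ModuleSpec:
    submodules: object  # untyped payload, as in a spec holding Any


# Option 1: cast at the use site -- a type-checker hint only, erased at runtime.
def inspect_with_cast(spec: ModuleSpec) -> type:
    attn = cast(SelfAttentionSubmodules, spec.submodules)
    return attn.linear_qkv


# Option 2: a typed helper method that also validates at runtime.
def get_self_attention_submodules(spec: ModuleSpec) -> SelfAttentionSubmodules:
    assert isinstance(spec.submodules, SelfAttentionSubmodules)
    return spec.submodules
```

Either way, a search for references to `SelfAttentionSubmodules` now finds every inspection/modification site.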

Relevant design doc: https://docs.google.com/document/d/1shyv0iKEzRdevLOlouF_NktbdJazvWifqxUwPXFigQE/edit?tab=t.0#heading=h.uwes2zo47yg6

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message or comment the @megatron-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either [email protected] or [email protected].

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

@nschank nschank requested review from a team as code owners January 6, 2026 17:38
@copy-pr-bot

copy-pr-bot bot commented Jan 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@nschank nschank requested a review from Skylion007 January 7, 2026 23:00
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Jan 11, 2026
@yashaswikarnati
Contributor

@chtruong814 @ko3n1g Mostly changes in tests - could you help take a look. thank you!

@chtruong814 chtruong814 removed the needs-follow-up Issue needs follow-up label Jan 12, 2026
@nschank
Contributor Author

nschank commented Jan 15, 2026

Updated to fix conflicts.

Anyone mind taking a look? This should be pretty uncontroversial (except maybe the format of the gpt_layer_specs changes), but merge conflicts are going to increasingly be a pain haha

@Phlip79 Phlip79 requested review from a team and removed request for Phlip79 and Skylion007 January 16, 2026 03:07
@Phlip79
Member

Phlip79 commented Jan 16, 2026

@NVIDIA/mcore-oncall

@nschank
Contributor Author

nschank commented Jan 23, 2026

Fixed merge conflicts

@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Jan 25, 2026
@nschank
Contributor Author

nschank commented Jan 29, 2026

I added a nice little helper inspired by @Skylion007 to make the gpt_layer_spec methods keep all their many arguments during type checking. It's optional tho, I can drop it if there are concerns about it!

@chtruong814 chtruong814 removed the needs-follow-up Issue needs follow-up label Jan 29, 2026
@nschank
Contributor Author

nschank commented Jan 30, 2026

Just noting the other places I found where copy_signature will come in handy! nschank#2

if self.sequence_parallel_lm or self.context_parallel_lm > 1:
    if not language_model_type.startswith('nemotron5-hybrid'):
        attn_module = language_transformer_layer_spec.submodules.self_attention
        assert isinstance(
Contributor

@trintamaki @parthmannan do you have any concerns with these asserts here? Can we expect the language model to always use mcore specs, or do you also use HF models directly, as for the vision encoder?

Contributor Author

If there are concerns, I'm happy to switch back to cast instead (which only affects type checking and won't do anything at runtime), just let me know.
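The runtime difference being weighed here can be shown with a generic sketch (class names below are illustrative, not from the PR): `typing.cast` is erased at runtime, while `assert isinstance` actually checks and can raise.

```python
from typing import cast


class BaseSubmodules: ...
class AttentionSubmodules(BaseSubmodules): ...


def via_cast(sub: BaseSubmodules) -> AttentionSubmodules:
    # Erased at runtime: no check happens even if `sub` is the wrong type,
    # so this only affects what the type checker believes.
    return cast(AttentionSubmodules, sub)


def via_assert(sub: BaseSubmodules) -> AttentionSubmodules:
    # Checked at runtime: raises AssertionError on a mismatched spec
    # (unless Python is run with -O, which strips asserts).
    assert isinstance(sub, AttentionSubmodules)
    return sub
```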

@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Feb 1, 2026
@yashaswikarnati yashaswikarnati added the Expert Review Apply this label to indicate that your PR is ready for expert review. label Feb 2, 2026
@nschank
Contributor Author

nschank commented Feb 4, 2026

This has gotten no traction, so I'm going to split this up to make it easier to review. #3255 is the most interesting subset, will follow with a few one liners and finally do the tests as a last step.


Labels

community-request, complexity: high, Expert Review


6 participants