
docs(readme): update convergence table, latest news, and outdated links#2638

Open
sbhavani wants to merge 7 commits into NVIDIA:main from sbhavani:fix/readme-updates

Conversation

sbhavani (Collaborator) commented Feb 1, 2026

Description

Updates the README to add missing format support documentation, refresh the news section, and fix broken and outdated links.

Type of change

  • Documentation change (a fix or new content affecting only the documentation)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Add MXFP8 and NVFP4 format support to highlights and description
  • Update FP8 convergence table with MXFP8 results from arxiv paper
  • Remove outdated JAX Toolbox links and "available on request" entries
  • Update Docker container versions to 26.01
  • Fix DeepSpeed and Lightning integration links
  • Add Nemotron 3 paper to Latest News
  • Add quickstart notebook link after PyTorch example

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

greptile-apps bot (Contributor) commented Feb 1, 2026

Greptile Summary

This documentation-only PR updates README.rst with MXFP8/NVFP4 format mentions in the description and highlights, adds two new convergence table rows for Blackwell results sourced from an arxiv paper, fixes stale integration links (DeepSpeed org rename, Lightning stable docs URL), bumps NGC Docker container versions to 26.01, and adds a Nemotron 3 news entry. Previously raised concerns (missing quickstart.ipynb target file, extra whitespace before "Megatron Core" in the convergence table) are already tracked in prior review threads.

Confidence Score: 5/5

Safe to merge; all blocking issues from prior threads should be resolved before the notebook link goes live.

Documentation-only PR with accurate, well-scoped changes. The only open issues (missing quickstart.ipynb file and table whitespace) are already tracked in existing review threads and are P2 in nature — they do not block the informational value of these updates.

No new files require special attention beyond what is already tracked in prior review threads.

Important Files Changed

Filename Overview
README.rst Documentation update adding MXFP8/NVFP4 mentions, new convergence rows from arxiv, fixed DeepSpeed/Lightning links, updated Docker container versions to 26.01, and Nemotron 3 news entry; quickstart.ipynb link target is still missing (tracked in prior threads)

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[README.rst PR Changes] --> B[Latest News\nAdd Nemotron 3 entry - 12/2025]
    A --> C[Overview & Highlights\nAdd MXFP8 / NVFP4 mentions]
    A --> D[Examples Section\nAdd quickstart.ipynb link]
    A --> E[Installation / Docker\nUpdate NGC containers to 26.01]
    A --> F[Convergence Table\nAdd LLM-8B and MoE-16B MXFP8 rows]
    A --> G[Integrations\nFix DeepSpeed org URL and Lightning docs link]
    D -.->|quickstart.ipynb does not exist| H[404 - tracked in prior threads]


Reviews (9): Last reviewed commit: "fix(readme): update convergence section,..."

greptile-apps bot (Contributor) left a comment

1 file reviewed, 2 comments

README.rst Outdated
loss = out.sum()
loss.backward()

For a tutorial with more details, see the `Quickstart Notebook <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/quickstart.ipynb>`_.

The referenced quickstart.ipynb file does not exist in docs/examples/. The actual notebooks in that directory are fp8_primer.ipynb, advanced_optimizations.ipynb, and te_jax_integration.ipynb. Consider using one of these existing notebooks or creating the quickstart notebook before merging.

greptile-apps bot (Contributor) left a comment

1 file reviewed, 1 comment

README.rst Outdated
Comment on lines +354 to +358
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| GPT-22B | NeMo Framework | Available on request |
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |

Extra whitespace in the "Megatron Core" framework names; inconsistent with the other rows.

Suggested change
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| GPT-22B | NeMo Framework | Available on request |
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps bot (Contributor) left a comment

1 file reviewed, 2 comments

greptile-apps bot (Contributor) left a comment

1 file reviewed, no comments

greptile-apps bot (Contributor) left a comment

1 file reviewed, 1 comment

greptile-apps bot (Contributor) commented Feb 10, 2026

Additional Comments (1)

README.rst
Broken reST link markup

This link uses `__` (double underscore), while the rest of the README uses `_` (single underscore). In reStructuredText, `__` denotes an anonymous hyperlink reference and can render inconsistently here.

* `Stable and Scalable FP8 Deep Learning Training on Blackwell | GTC 2025 <https://www.nvidia.com/en-us/on-demand/session/gtc24-s62457/>`_
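A quick way to catch this class of inconsistency across a README is a small stdlib-only scan; the regex below is a deliberate simplification of full reST inline-markup rules, not a substitute for a real reST parser:

```python
import re

# Matches inline reST hyperlink references like `text <url>`_ or `text <url>`__
# and captures the trailing underscore(s) so named and anonymous forms can be
# told apart.
LINK_RE = re.compile(r"`[^`]+`(__?)")

def find_anonymous_links(rst_text):
    """Return inline hyperlink references using the anonymous
    (double-underscore) form, which mixes badly with the named
    (single-underscore) form used elsewhere in a document."""
    return [m.group(0) for m in LINK_RE.finditer(rst_text) if m.group(1) == "__"]

# Illustrative sample: one named and one anonymous reference.
sample = (
    "* `FP8 session <https://example.com/a>`_\n"
    "* `Blackwell session <https://example.com/b>`__\n"
)
print(find_anonymous_links(sample))  # only the double-underscore entry
```

Running this over README.rst would flag the GTC 2025 entry while leaving the consistent single-underscore links alone.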

greptile-apps bot (Contributor) left a comment

2 files reviewed, no comments

greptile-apps bot (Contributor) left a comment

1 file reviewed, 1 comment

greptile-apps bot (Contributor) commented Feb 17, 2026

Additional Comments (1)

README.rst
MXFP8 results not reflected in section title or intro

The PR description states the convergence table is being "updated with MXFP8 results from arxiv paper" (arxiv.org/abs/2506.08027), and entries for LLM-8B and MoE-16B via Megatron Core are cited from that paper. However, the section heading remains FP8 Convergence and the introductory paragraph only asserts convergence for FP8 vs BF16:

"FP8 has been tested extensively... no significant difference between FP8 and BF16 training loss curves."

If the new entries represent MXFP8 convergence (not standard FP8), readers may be misled about what precision those Megatron Core results actually demonstrate. Consider either:

  • Adding a sentence to the intro noting MXFP8 convergence has also been validated, or
  • Adding a "Format" column to the table to distinguish FP8 from MXFP8 entries.
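A sketch of the second option, assuming the README's existing grid-table layout; the column widths and Format values shown are illustrative:

```rst
+------------+--------+------------------+-----------------------------------+
| Model      | Format | Framework        | Source                            |
+============+========+==================+===================================+
| LLM-8B     | MXFP8  | Megatron Core    | https://arxiv.org/abs/2506.08027  |
+------------+--------+------------------+-----------------------------------+
| MoE-16B    | MXFP8  | Megatron Core    | https://arxiv.org/abs/2506.08027  |
+------------+--------+------------------+-----------------------------------+
```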

sbhavani added 5 commits March 2, 2026 17:55
…t info

- Add MXFP8 and NVFP4 format support to highlights and description
- Update FP8 convergence table with MXFP8 results from arxiv paper
- Remove outdated JAX-Toolbox links and "available on request" entries
- Update Docker container versions to 26.01
- Fix DeepSpeed and Lightning integration links
- Add Nemotron 3 paper to Latest News
- Add quickstart notebook link after PyTorch example

Signed-off-by: Santosh Bhavani <[email protected]>
- Replace quickstart.ipynb link with fp8_primer.ipynb (file exists)
- Fix extra whitespace in Megatron Core table rows

Signed-off-by: Santosh Bhavani <[email protected]>
@sbhavani sbhavani force-pushed the fix/readme-updates branch from f0bd464 to c391c17 on March 3, 2026 01:55
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Santosh Bhavani <[email protected]>
greptile-apps bot (Contributor) commented Mar 3, 2026

Additional Comments (1)

README.rst, line 398
FP8 Convergence section doesn't reflect MXFP8 entries

The section title and its introductory description only mention FP8, but the two new rows (LLM-8B and MoE-16B with Megatron Core) are sourced from a paper described in the PR as "MXFP8 results." This makes the table potentially misleading — a reader looking for MXFP8 convergence validation won't find the section, while a reader expecting the table to cover only standard FP8 may be confused by entries from a different format.

Consider either:

  1. Renaming the section to "FP8 / MXFP8 Convergence" and updating the opening sentence to mention MXFP8, or
  2. Adding a Format column to distinguish FP8 rows from MXFP8 rows so readers can quickly tell what each entry validates.

For example, a quick fix to the section header and description:

FP8 / MXFP8 Convergence
========================

FP8 and MXFP8 have been tested extensively across different model architectures and configurations and we found **no significant difference** between FP8/MXFP8 and BF16 training loss curves. FP8 and MXFP8 have also been validated for accuracy on downstream LLM tasks (e.g. LAMBADA and WikiText). Below are examples of models tested for convergence across different frameworks.

@sbhavani sbhavani requested a review from ptrendx March 10, 2026 16:23
@sbhavani sbhavani requested a review from ksivaman April 7, 2026 18:09
Labels: None yet

Projects: None yet

2 participants