Add CLAUDE.md with AI agent instructions and quick reference#4195

Merged
supriyar merged 9 commits into main from gh/supriyar/1/head
Mar 31, 2026
Conversation

supriyar (Contributor) commented Mar 27, 2026

Stack from ghstack (oldest at bottom):

Replace empty placeholder with structured documentation for AI coding
assistants (Claude Code, Cursor, Copilot). Includes config class table,
granularity reference, deprecated API warnings, and pointers to in-repo
docs for architecture details.

Comparison: Old CLAUDE.md vs New CLAUDE.md
Instructions+Scripts for repro available in https://github.com/supriyar/torchao-eval
Setup:

  • Subject model: Claude Sonnet
  • Judge model: Claude Opus

Sonnet/Opus Results (61 prompts, final)


pytorch-bot commented Mar 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4195

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 10 Pending

As of commit 15beca9 with merge base 79159f2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

supriyar added a commit that referenced this pull request Mar 27, 2026

ghstack-source-id: e0fe747
Pull Request resolved: #4195
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 27, 2026
@supriyar supriyar added the module: not user facing Use this tag if you don't want this PR to show up in release notes label Mar 27, 2026
Comparison: Old CLAUDE.md vs New CLAUDE.md
Setup:
  - Subject model: Claude Haiku (weaker model, more likely to show improvement from context)
  - Judge model: Claude Sonnet (scores responses 0-3 against rubrics)
  - Prompts: 48 questions across 12 categories (getting started, config classes, float8 training, QAT, sparsity, optimizers, architecture, integrations, development, use cases, gotchas, comparisons)
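
The subject/judge loop described in this setup can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the actual harness: `run_subject`, `run_judge`, and the rubric format here are assumptions; the real scripts live in the torchao-eval repo linked in this thread.

```python
# Hypothetical sketch of the LLM-as-judge eval loop: a subject model
# answers each prompt, and a judge model grades the answer 0-3 against
# a rubric. run_subject/run_judge are stand-ins for model API calls.

def score_prompts(prompts, rubrics, run_subject, run_judge):
    """Collect a 0-3 judge score for every prompt."""
    scores = []
    for prompt, rubric in zip(prompts, rubrics):
        answer = run_subject(prompt)               # subject model answers
        raw = run_judge(prompt, answer, rubric)    # judge grades 0-3
        scores.append(max(0, min(3, int(raw))))    # clamp to rubric range
    return scores

def summarize(scores):
    """Mean score plus counts of perfect (3) and wrong (0) answers."""
    return {
        "mean": sum(scores) / len(scores),
        "perfect": scores.count(3),
        "wrong": scores.count(0),
    }
```

Running the same prompt set twice, once with the old CLAUDE.md and once with the new one as system context, and diffing the two `summarize` outputs gives the before/after numbers quoted below.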

[results table image]
Prompts that went from 0 to 3 (completely wrong to perfect):
  - MXFP8 dense training, MXFP8 MoE training, NVFP4 inference - all 0->3
  - ExecuTorch, sparsity, PyTorch version, config comparison - all 0->3
  - Int4WeightOnlyConfig vs Int8DynamicActivation difference - 0->3
  - torchao vs bitsandbytes comparison - 1->3

+32% improvement; wrong answers dropped from 13 to 2. The new CLAUDE.md has the biggest impact on architecture questions, MX/NVFP4 formats, and comparison questions: the areas where the old empty CLAUDE.md gave the model nothing.
CLAUDE.md Outdated
pytest test/prototype/mx_formats/
```

## Coding Style
Contributor:
I think we should set up hooks for running pre-commit/linters: https://code.claude.com/docs/en/hooks, this gets code style for free without spending tokens
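
For illustration, a `.claude/settings.json` fragment along these lines could run the repo's linters after every file edit. This is a sketch based on the hooks docs linked above; the matcher and command here are assumptions, not a tested config:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "pre-commit run --all-files"
          }
        ]
      }
    ]
  }
}
```

With something like this in place, formatting happens deterministically on each edit instead of relying on the model to remember style rules.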

Contributor:
you mean pre-commit run? we do have these I think

Contributor (Author):
I think Vasiliy is referring to claude code hooks for auto-format on edit? That can be a follow-up PR.

Will remove lint instructions from here


## Commit Messages

- Do not commit without explicit request from the user
Contributor:
I have this in my personal CLAUDE.md, but this seems specific to personal preference? maybe leave out of repo-wide one?

Contributor (Author):
I had originally added this since I saw it in the pytorch/pytorch CLAUDE.md file. vLLM also has similar instructions in its repo for agent-authored commits.

Contributor:
if pytorch and vllm have it, makes sense, thanks


vkuzo commented Mar 31, 2026

> +32% improvement; wrong answers dropped from 13 to 2. The new CLAUDE.md has the biggest impact on architecture questions, MX/NVFP4 formats, and comparison questions - the areas where the old empty CLAUDE.md gave the model nothing

I think this is a good way to eval. Can we check the eval into source control so it's easy to iterate on and measure future improvements? Doesn't have to be in torchao, can be in separate repo if that's easier.


vkuzo commented Mar 31, 2026

also, for the eval, should we just use Opus 4.6 for everything? IMO better to optimize for the best available model, and even doing this things can go out of date really quickly. I'm not sure it's worth spending a lot of time on evals of older models

supriyar (Contributor Author) commented:

> also, for the eval, should we just use Opus 4.6 for everything? IMO better to optimize for the best available model, and even doing this things can go out of date really quickly. I'm not sure it's worth spending a lot of time on evals of older models

Sonnet/Opus Results (61 prompts, final)


I ran both Opus/Opus and Sonnet/Opus on 61 prompts. The issue with Opus as subject is that it times out on ~26% of prompts (16/61) with the 56K-char system prompt, even with a 180s timeout. Opus generates verbose responses that exceed the time limit; timeouts score 0 and corrupt the data.

Sonnet as subject had zero timeouts and scored comparably on the prompts where Opus didn't time out. Results with Sonnet subject + Opus judge: 2.51 -> 2.74 (+9%), with 47/61 perfect scores with the new CLAUDE.md vs 37/61 without.
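
As a sanity check, the reported delta follows from the two mean scores quoted above:

```python
# Re-deriving the reported improvement from the quoted mean scores.
old_mean, new_mean = 2.51, 2.74
gain = (new_mean - old_mean) / old_mean
print(f"{gain:+.0%}")  # +9%
```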

Eval repro: https://github.com/supriyar/torchao-eval

@supriyar supriyar changed the base branch from gh/supriyar/1/base to main March 31, 2026 23:21
supriyar (Contributor Author) commented:

@pytorchbot merge

pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 mandatory check(s) are pending/not yet run. The first few are:

  • Facebook CLA Check

Dig deeper by viewing the pending checks on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: superuser

@supriyar supriyar merged commit 7c1b138 into main Mar 31, 2026
35 of 38 checks passed