Add CLAUDE.md with AI agent instructions and quick reference#4195
Conversation
Replace empty placeholder with structured documentation for AI coding assistants (Claude Code, Cursor, Copilot). Includes config class table, granularity reference, deprecated API warnings, and pointers to in-repo docs for architecture details.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4195
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 10 Pending — as of commit 15beca9 with merge base 79159f2.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: e0fe747
Pull Request resolved: #4195
Comparison: Old CLAUDE.md vs New CLAUDE.md

Setup:
- Subject model: Claude Haiku (weaker model, more likely to show improvement from context)
- Judge model: Claude Sonnet (scores responses 0-3 against rubrics)
- Prompts: 48 questions across 12 categories (getting started, config classes, float8 training, QAT, sparsity, optimizers, architecture, integrations, development, use cases, gotchas, comparisons)

<img width="428" height="140" alt="image" src="https://github.com/user-attachments/assets/b0c20539-aa69-49d2-a03f-4943508f62e2" />

Prompts that went from 0 to 3 (completely wrong to perfect):
- MXFP8 dense training, MXFP8 MoE training, NVFP4 inference: all 0 -> 3
- ExecuTorch, sparsity, PyTorch version, config comparison: all 0 -> 3
- Int4WeightOnlyConfig vs Int8DynamicActivation difference: 0 -> 3
- torchao vs bitsandbytes comparison: 1 -> 3

+32% improvement; wrong answers dropped from 13 to 2. The new CLAUDE.md has the biggest impact on architecture questions, MX/NVFP4 formats, and comparison questions: the areas where the old empty CLAUDE.md gave the model nothing.
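The scoring setup above can be sketched as a tiny aggregation harness. This is an illustrative stand-in, not the actual eval code: the `summarize` helper and the per-prompt scores below are hypothetical, chosen only to mirror the 0-3 rubric and the before/after comparison described.

```python
# Sketch of aggregating 0-3 judge scores with and without CLAUDE.md in
# context. Scores and prompt names are illustrative, not the real results.

def summarize(scores_old: dict[str, int], scores_new: dict[str, int]) -> dict:
    """Compare judge scores (0-3 rubric) across two runs on the same prompts."""
    assert scores_old.keys() == scores_new.keys()
    n = len(scores_old)
    avg_old = sum(scores_old.values()) / n
    avg_new = sum(scores_new.values()) / n
    return {
        "avg_old": avg_old,
        "avg_new": avg_new,
        "improvement_pct": round(100 * (avg_new - avg_old) / avg_old, 1),
        "wrong_old": sum(1 for s in scores_old.values() if s == 0),
        "wrong_new": sum(1 for s in scores_new.values() if s == 0),
        # prompts that went from completely wrong (0) to perfect (3)
        "fixed": sorted(p for p in scores_old
                        if scores_old[p] == 0 and scores_new[p] == 3),
    }

old = {"mxfp8_dense": 0, "nvfp4_inference": 0, "executorch": 0, "bnb_compare": 1}
new = {"mxfp8_dense": 3, "nvfp4_inference": 3, "executorch": 3, "bnb_compare": 3}
summary = summarize(old, new)
```

The "wrong answers" count in the writeup corresponds to `wrong_old`/`wrong_new` here, and the "0 to 3" list corresponds to `fixed`.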
CLAUDE.md (comment on an outdated diff):

> ```
> pytest test/prototype/mx_formats/
> ```
>
> ## Coding Style
I think we should set up hooks for running pre-commit/linters: https://code.claude.com/docs/en/hooks, this gets code style for free without spending tokens
You mean `pre-commit run`? We do have these, I think.
I think Vasiliy is referring to claude code hooks for auto-format on edit? That can be a follow-up PR.
Will remove lint instructions from here
CLAUDE.md (comment on an outdated diff):

> ## Commit Messages
>
> - Do not commit without explicit request from the user
I have this in my personal CLAUDE.md, but this seems specific to personal preference? maybe leave out of repo-wide one?
I originally added this since I saw it in the pytorch/pytorch CLAUDE.md file. vLLM also has similar instructions in its repo for agent-authored commits.
if pytorch and vllm have it, makes sense, thanks
I think this is a good way to eval. Can we check the eval into source control so it's easy to iterate on and measure future improvements? It doesn't have to be in torchao; it can be in a separate repo if that's easier.
Also, for the eval, should we just use Opus 4.6 for everything? IMO it's better to optimize for the best available model, and even then things can go out of date really quickly. I'm not sure it's worth spending a lot of time on evals of older models.
Sonnet/Opus Results (61 prompts, final)
I ran both Opus/Opus and Sonnet/Opus on 61 prompts. The issue with Opus as subject is that it times out on ~26% of prompts (16/61) with the 56K-char system prompt, even with a 180s timeout: Opus generates verbose responses that exceed the time limit, and timeouts score 0 and corrupt the data. Sonnet as subject had zero timeouts and scored comparably on the prompts where Opus didn't time out.

Results with Sonnet subject + Opus judge: 2.51 -> 2.74 (+9%), with 47/61 perfect scores with the new CLAUDE.md vs 37/61 without.

Eval repro: https://github.com/supriyar/torchao-eval
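One way to keep timeouts from corrupting the average, as described above, is to track them as missing data rather than scoring them 0. This is a hedged sketch of that idea, not the torchao-eval implementation; the `TIMEOUT` sentinel and `robust_average` helper are hypothetical.

```python
# Sketch: exclude timed-out generations from the average instead of
# scoring them 0, so a verbose-but-correct subject model isn't counted
# as wrong. Report the timeout rate alongside the average.

TIMEOUT = None  # sentinel for a generation that exceeded the time limit

def robust_average(scores: list) -> tuple[float, float]:
    """Return (avg score over completed prompts, fraction that timed out)."""
    completed = [s for s in scores if s is not TIMEOUT]
    timeout_rate = (len(scores) - len(completed)) / len(scores)
    avg = sum(completed) / len(completed) if completed else 0.0
    return avg, timeout_rate

scores = [3, 3, TIMEOUT, 2, TIMEOUT, 3, 1, 3]
avg, rate = robust_average(scores)
```

Reporting the timeout rate separately preserves the signal ("Opus timed out on ~26% of prompts") without letting it drag the quality average down to look like wrong answers.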
Setup (updated):
- Subject model: Claude Sonnet
- Judge model: Claude Opus

Sonnet/Opus Results (61 prompts, final)

<img width="586" height="257" alt="image" src="https://github.com/user-attachments/assets/fc1ff374-eb02-40ed-91c7-089f55715144" />
@pytorchbot merge
Merge failed
Reason: 1 mandatory check(s) are pending/not yet run.
Dig deeper by viewing the pending checks on hud
