Commit 8f366b8

Add CLAUDE.md with AI agent instructions and quick reference

Replace empty placeholder with structured documentation for AI coding assistants (Claude Code, Cursor, Copilot). Includes config class table, granularity reference, deprecated API warnings, and pointers to in-repo docs for architecture details.

ghstack-source-id: e0fe747
Pull Request resolved: #4195

1 parent 6f56403 commit 8f366b8


CLAUDE.md

Lines changed: 102 additions & 2 deletions
# TorchAO
PyTorch-native library for quantization, sparsity, and low-precision training. Works with `torch.compile()` and `FSDP2`.

## Quick Reference

```python
import torch

# Quantize a model to int4 (weight-only)
from torchao.quantization import quantize_, Int4WeightOnlyConfig
quantize_(model, Int4WeightOnlyConfig(group_size=32))

# Float8 dynamic quantization
from torchao.quantization import Float8DynamicActivationFloat8WeightConfig, PerRow
quantize_(model, Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()))

# Per-layer configs (different quantization per module)
from torchao.quantization import FqnToConfig
quantize_(model, FqnToConfig({
    "layers.0.attn": Int4WeightOnlyConfig(),
    "layers.0.mlp": Float8DynamicActivationFloat8WeightConfig(),
}))

# Filter specific layers
quantize_(model, Int4WeightOnlyConfig(), filter_fn=lambda mod, fqn: "mlp" in fqn)

# QAT (prepare, train, then convert)
from torchao.quantization import Int8DynamicActivationIntxWeightConfig, PerGroup
from torchao.quantization.qat import QATConfig
base_config = Int8DynamicActivationIntxWeightConfig(
    weight_dtype=torch.int4, weight_granularity=PerGroup(32)
)
quantize_(model, QATConfig(base_config, step="prepare"))
# ... train ...
quantize_(model, QATConfig(base_config, step="convert"))

# Float8 training (H100/B200 required)
from torchao.float8 import convert_to_float8_training
convert_to_float8_training(model)
```
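The `filter_fn` above is an ordinary predicate called with each `(module, fqn)` pair; only modules for which it returns `True` are quantized. A minimal sketch of that selection logic in plain Python (modules stubbed as `None`, layer names hypothetical):

```python
# Sketch: how a quantize_-style filter_fn picks modules by fully
# qualified name (FQN). Modules are stubbed as None; names are made up.
named_modules = [
    ("layers.0.attn", None),
    ("layers.0.mlp", None),
    ("layers.1.mlp", None),
    ("lm_head", None),
]

filter_fn = lambda mod, fqn: "mlp" in fqn  # same predicate as above

selected = [fqn for fqn, mod in named_modules if filter_fn(mod, fqn)]
print(selected)  # ['layers.0.mlp', 'layers.1.mlp']
```

Any predicate over the module object or its name works here, e.g. `lambda mod, fqn: isinstance(mod, torch.nn.Linear)` to restrict quantization to linear layers.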
## Config Classes

All configs inherit from `AOBaseConfig`. Defined in `torchao/quantization/quant_api.py`:

| Config | Description |
|--------|-------------|
| `Int4WeightOnlyConfig` | int4 weight-only (most common for inference) |
| `Int8WeightOnlyConfig` | int8 weight-only |
| `Int8DynamicActivationInt8WeightConfig` | int8 weights + int8 dynamic activations |
| `Int8DynamicActivationIntxWeightConfig` | int8 activations + arbitrary int weight width |
| `Float8WeightOnlyConfig` | float8 weight-only |
| `Float8DynamicActivationFloat8WeightConfig` | float8 weights + float8 dynamic activations |
| `Float8DynamicActivationInt4WeightConfig` | float8 activations + int4 weights |
| `IntxWeightOnlyConfig` | arbitrary bit-width for edge/ExecuTorch |
| `FqnToConfig` | map module names to different configs for per-layer quantization |
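Conceptually, `FqnToConfig` maps fully qualified module names to configs, and modules with no entry are left untouched. An illustrative stand-in for that lookup (a plain dict with string labels, not the actual torchao implementation):

```python
# Illustrative only: model an FqnToConfig-style mapping as a plain dict
# from FQN to a config label; modules with no entry are not quantized.
fqn_to_config = {
    "layers.0.attn": "Int4WeightOnlyConfig",
    "layers.0.mlp": "Float8DynamicActivationFloat8WeightConfig",
}

def config_for(fqn, mapping):
    # Exact-name lookup; None means "leave this module unquantized".
    return mapping.get(fqn)

print(config_for("layers.0.mlp", fqn_to_config))
print(config_for("lm_head", fqn_to_config))  # None -> untouched
```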
### Granularity

Controls how many elements share a quantization scale. Import from `torchao.quantization`:

- `PerTensor` - one scale for the whole tensor
- `PerRow` / `PerAxis` - one scale per row/axis (recommended for float8)
- `PerGroup(group_size)` - one scale per group (e.g., `group_size=32` for int4)
- `PerBlock` - one scale per block
- `PerToken` - one scale per token (for activations)
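To make the granularity options concrete, here is the scale arithmetic for symmetric int4 quantization computed per-tensor versus per-group, in plain Python (no torchao; the weight values are made up):

```python
# Symmetric int4: scale = max(|w|) / qmax over whichever elements share
# a scale. qmax = 7 because the symmetric signed-int4 range uses [-7, 7].
w = [0.1, -0.4, 0.02, 0.3, 2.0, -1.5, 0.7, -0.9]
qmax = 7.0

# PerTensor: a single scale for all eight values
per_tensor_scale = max(abs(x) for x in w) / qmax

# PerGroup(4): one scale per group of 4 consecutive values
group_size = 4
per_group_scales = [
    max(abs(x) for x in w[i:i + group_size]) / qmax
    for i in range(0, len(w), group_size)
]

print(per_tensor_scale)  # 2.0 / 7
print(per_group_scales)  # [0.4 / 7, 2.0 / 7]
```

Note how the single outlier (2.0) inflates the per-tensor scale for every element, while per-group keeps the first group's scale small; this is why small group sizes preserve accuracy at low bit widths.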
### Prototype configs (in `torchao/prototype/mx_formats/`)

- `MXDynamicActivationMXWeightConfig` - MXFP8/MXFP4 (H100/B200/MI350x)
- `NVFP4DynamicActivationNVFP4WeightConfig` - NVIDIA FP4 (B200 Blackwell only)
- `NVFP4WeightOnlyConfig` - NVFP4 weight-only (B200 Blackwell only)

## Stable vs Prototype

- **Stable** (`torchao/quantization/`, `torchao/float8/`, `torchao/sparsity/`, `torchao/optim/`): API stability guaranteed; breaking changes go through a deprecation cycle.
- **Prototype** (`torchao/prototype/`): experimental features whose APIs may change without notice. Includes `mx_formats/` (MXFP8, MXFP4, NVFP4), `moe_training/` (MoE mixed-precision), `awq/`, `hqq/`, `autoround/`, and `quantized_training/`.

## Architecture and Contributing

For architecture details, tensor subclass design, and contributor guides, see the in-repo docs:

- [Quantization Overview](docs/source/contributing/quantization_overview.rst) - full-stack walkthrough, tensor subclasses, quantization flows
- [Contributor Guide](docs/source/contributing/contributor_guide.rst) - how to add tensors, kernels, and configs
- [Workflows Matrix](docs/source/workflows/index.md) - dtype x hardware status table

The same files render at https://docs.pytorch.org/ao/main/contributing/index.html

## Deprecated APIs

Do not use or recommend these:

- `AffineQuantizedTensor` (AQT) in `torchao/dtypes/` - old v1 system, being removed. New tensor types inherit from `TorchAOBaseTensor` in `torchao/utils.py`
- `autoquant()` - deleted
- Layout registration system (`PlainLayout`, `Float8Layout`, `TensorCoreTiledLayout`, etc.) - deleted
- `TorchAODType` - deprecated
- `change_linear_weights_to_int4_woqtensors` - deleted; use `quantize_(model, Int4WeightOnlyConfig())`

## Development

```bash
# Setup
USE_CPP=0 pip install -e . --no-build-isolation   # CPU-only
USE_CUDA=1 pip install -e . --no-build-isolation  # With CUDA

# Lint (ruff v0.11.6, rules: F and I)
ruff check --fix && ruff format .

# Test (mirrors source structure)
pytest test/quantization/test_quant_api.py
pytest test/float8/
pytest test/prototype/mx_formats/
```
