Story points are dead. Long live TokenPoints.
A framework for estimating software work in dollars of LLM inference, not hours or story points.
When humans wrote 100% of the code, hours were a (bad) proxy for effort. Story points tried to fix this and mostly made it weirder — abstract, unfalsifiable, and trivially gameable.
In 2026, when an agent writes the first draft of most of your code, the question has changed:
"How much will this task cost to ship?"
That cost is now a measurable, falsifiable number in USD — the price of the tokens the model burns getting the work done. Tracking it is no harder than tracking time, and unlike time, it doesn't lie.
TokenPoints is a planning vocabulary built around that number.
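As a rough illustration of how that number can be derived, the sketch below multiplies token counts by per-token prices. Everything named here is an assumption for the example: the `ModelPrice` type, the `session_cost_usd` helper, and the prices themselves, which are placeholders rather than any provider's real rates. In practice the figure would more likely come from your provider's usage or billing report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per one million tokens (placeholder values, not a real price list)."""
    input_per_mtok: float
    output_per_mtok: float


def session_cost_usd(input_tokens: int, output_tokens: int, price: ModelPrice) -> float:
    """Return the inference cost of one session in USD."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL_PRICE = ModelPrice(input_per_mtok=3.0, output_per_mtok=15.0)

# A session that read 1.2M tokens of context and wrote 80k tokens.
cost = session_cost_usd(input_tokens=1_200_000, output_tokens=80_000, price=HYPOTHETICAL_PRICE)
print(f"${cost:.2f}")  # prints $4.80
```

A task usually spans several sessions and sometimes several models, so the task's cost would be the sum of these per-session figures.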
- A 6-pillar manifesto — why dollars beat points, and where the limits are.
- A sizing scale (XS → XL) — calibrated to real agentic-coding sessions in 2026, anchored in USD ranges, not vibes.
- A calibration playbook — how your team turns the scale into your numbers in two sprints.
- Tracking templates — the minimum data to capture so calibration actually happens.
- Agile integration guides — how this drops into Scrum, Kanban, or whatever you already run.
- Anti-patterns — the obvious ways to misuse this and how to avoid them.
- Read the Manifesto (5 min). If you disagree, this framework isn't for you and that's fine.
- Skim the Sizing Guide to internalize the XS–XL scale.
- Use the estimation template on your next 10 tasks. Don't change anything else yet.
- After two sprints, run calibration with your real data.
- Once calibrated, integrate TokenPoints into planning by following the integration guide.
Time to first useful estimate: ~10 minutes. Time to a calibrated team baseline: ~2 sprints.
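The calibration step itself is described in the calibration playbook, not here. As a hedged sketch of what "run calibration with your real data" could look like, the snippet below groups tracked tasks by their estimated size and summarizes what they actually cost. The field names (`estimated_size`, `actual_usd`), the `calibrate` helper, and the sample rows are all made up for illustration; they are not the columns of the tracking sheet.

```python
from collections import defaultdict
from statistics import median

# Made-up sample rows with hypothetical field names.
tracked_tasks = [
    {"task": "login-form-validation", "estimated_size": "S", "actual_usd": 3.10},
    {"task": "csv-export-bug", "estimated_size": "S", "actual_usd": 6.40},
    {"task": "retry-wrapper", "estimated_size": "S", "actual_usd": 11.20},
    {"task": "billing-page", "estimated_size": "M", "actual_usd": 22.00},
    {"task": "search-filters", "estimated_size": "M", "actual_usd": 35.50},
]


def calibrate(tasks):
    """Summarize actual USD cost per estimated size bucket."""
    costs_by_size = defaultdict(list)
    for task in tasks:
        costs_by_size[task["estimated_size"]].append(task["actual_usd"])
    return {
        size: {
            "n": len(costs),
            "median_usd": median(costs),
            "min_usd": min(costs),
            "max_usd": max(costs),
        }
        for size, costs in costs_by_size.items()
    }


baseline = calibrate(tracked_tasks)
for size, stats in baseline.items():
    print(size, stats)
```

In this invented sample, one task estimated as S cost $11.20, which is above the S anchor range. A spread like that is the kind of signal worth discussing in a retrospective before adjusting the team's ranges.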
| Size | Cost (USD) | Typical pattern | Human time |
|---|---|---|---|
| XS | < $1 | Pinpoint edit, autocomplete-heavy | < 30 min |
| S | $1 – $8 | Single-file feature/bug, 5–15 turns | 30 min – 2 h |
| M | $8 – $40 | Multi-file feature, 15–40 turns | 2 – 8 h |
| L | $40 – $160 | Refactor, deep debug, cross-module | 1 – 3 days |
| XL | $160 – $400 | Architectural change, multi-system | 3+ days |
| ?? | unknown | Spike first — investigate before sizing | time-boxed |
Anything above XL (more than $400) must be decomposed. If you can't decompose it, you don't understand it yet: mark it ?? and run a spike first.
These ranges are starting anchors, not laws. Your team will diverge based on codebase size, model mix, and tooling. See Calibration.
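The scale above can be expressed as a small lookup, sketched here under two assumptions that the table does not settle: a cost that lands exactly on a shared boundary ($8, $40, $160) goes to the larger size, and exactly $400 still counts as XL. The `size_for_cost` function and the `DECOMPOSE` label are hypothetical names for this example, and the bounds are the starting anchors, which a calibrated team would replace with its own.

```python
# Upper bounds in USD for the starting anchors (exclusive, except XL).
SIZE_UPPER_BOUNDS_USD = [("XS", 1.0), ("S", 8.0), ("M", 40.0), ("L", 160.0)]
XL_MAX_USD = 400.0


def size_for_cost(cost_usd):
    """Map an estimated or actual cost in USD to a size label.

    None means the cost is unknown, which maps to "??" (spike first).
    Costs above the XL range return "DECOMPOSE" instead of a size.
    """
    if cost_usd is None:
        return "??"
    if cost_usd < 0:
        raise ValueError("cost_usd must not be negative")
    for label, upper_bound in SIZE_UPPER_BOUNDS_USD:
        if cost_usd < upper_bound:
            return label
    if cost_usd <= XL_MAX_USD:
        return "XL"
    return "DECOMPOSE"


for cost in (0.40, 5.00, 8.00, 120.00, 400.00, 650.00, None):
    print(cost, "->", size_for_cost(cost))
```

Keeping the bounds in one list makes it straightforward to swap in locally calibrated ranges without touching the lookup logic.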
- Dollars are more honest than hours.
- Variance is information, not noise.
- Outcome over output.
- Calibrate locally.
- Multidimensional, not unidimensional.
- Human time still exists.
Full elaboration: MANIFESTO.md.
tokenpoints/
├── README.md                     ← you are here
├── MANIFESTO.md                  ← the 6 pillars
├── docs/
│   ├── framework.md              ← end-to-end overview
│   ├── sizing-guide.md           ← XS–XL with worked examples
│   ├── calibration.md            ← turn the scale into your numbers
│   ├── tracking.md               ← what to measure, how
│   ├── integration-agile.md      ← Scrum / Kanban fit
│   └── anti-patterns.md          ← how to misuse this
├── templates/
│   ├── estimation-template.md
│   ├── tracking-sheet.csv
│   └── retrospective-template.md
├── examples/
│   ├── frontend-feature.md
│   ├── backend-refactor.md
│   └── debug-session.md
└── CONTRIBUTING.md
The most valuable thing you can contribute is your team's calibration data — anonymized averages, model mix, codebase context. Over time, this turns the repo from one person's framework into an empirical reference.
See CONTRIBUTING.md for how.
v0.1 — early draft. Names, ranges, and pillars are open for community input. If you have a strong opinion, open an issue or a PR. The scale will evolve as more teams report data.
CC BY 4.0 — use it, fork it, build on it. Just credit the source.
Created by Rogério — proposal, not prescription.