Skip to content

perf(formatter): POC store AstNodes on stack instead of in arena#21931

Draft
overlookmotel wants to merge 50 commits intoom/04-29-ci_alloc_add_formatter_to_just_allocs_from
om/04-29-formatter-no-alloc
Draft

perf(formatter): POC store AstNodes on stack instead of in arena#21931
overlookmotel wants to merge 50 commits intoom/04-29-ci_alloc_add_formatter_to_just_allocs_from
om/04-29-formatter-no-alloc

Conversation

@overlookmotel
Copy link
Copy Markdown
Member

@overlookmotel overlookmotel commented Apr 29, 2026

Proof-of-concept only. Some conformance test failures.

Remove allocation of AstNodes into arena.

Process of the 1st phase of formatter is something like this:

  1. FormatWrite::write creates AstNodes for the node's children.
  2. It does write! (which expands to a Formatter::write_fmt call), passing &AstNodes (and other stuff).
  3. Formatter::write_fmt calls Format::fmt on the children.
  4. Format::fmt calls FormatWrite::write. Back to step 1, or return if reached a leaf node.

So it's a recursive traversal of the AST.

Currently, all the field getter methods of AstNode<T> create new AstNode for the child field, allocate it into the arena, and return a &'a AstNode.

Instead:

  • Field getters return an owned AstNode.
  • FormatWrite::write can store that AstNode on the stack, and pass a reference to it into write!.
  • After the write! invocation, the AstNodes for the node's children aren't needed any more.
  • They're dropped when the parent's FormatWrite::write exits.

i.e. Store AstNodes on the stack, instead of in the arena.

This also means that:

  • There's no need for AstNode to contain &Allocator.
  • Removes the unsafe lifetime extension.

Obviously, this PR is slop! I just wanted to explore if the idea works. It does seem to, but Claude hit some problems in a few places, and just commented out code - hence the conformance failures. I think these problems can be overcome.

If we want to pursue this, we'd probably be better off starting again from scratch with a clearer plan to produce a cleaner result. I'm only submitting at this stage to get feedback from @Dunqing and to see what CodSpeed says.


Local benchmark:

cargo bench --bench formatter
File Baseline (main) PR Δ
(small) 33.34 33.97 +1.9%
errors.ts 62.87 59.71 −5.0%
Search.tsx 196.94 175.02 −11.1%
core.js 208.41 182.00 −12.7%
next.ts 298.56 266.19 −10.8%
index.tsx 535.10 476.77 −10.9%
(medium) 359.97 335.38 −6.8%
types.ts 1203.4 1147.6 −4.6%
App.tsx 6147.5 5406.5 −12.1%

Copy link
Copy Markdown
Member Author

overlookmotel commented Apr 29, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent changes, fast-track this PR to the front of the merge queue

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Apr 29, 2026

Merging this PR will improve performance by 16.39%

⚡ 9 improved benchmarks
✅ 35 untouched benchmarks
⏩ 7 skipped benchmarks1

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation formatter[core.js] 2 ms 1.8 ms +10.22%
Simulation formatter[Search.tsx] 1.9 ms 1.7 ms +10.79%
Simulation formatter[RadixUIAdoptionSection.jsx] 422.8 µs 389.8 µs +8.45%
Simulation formatter[next.ts] 2.8 ms 2.5 ms +10.47%
Simulation formatter[index.tsx] 4.6 ms 4.2 ms +10.28%
Simulation formatter[errors.ts] 777.4 µs 720.5 µs +7.89%
Simulation formatter[App.tsx] 49.5 ms 43.6 ms +13.57%
Simulation formatter[handle-comments.js] 3.6 ms 3.3 ms +9.77%
Simulation formatter[types.ts] 14.6 ms 12.6 ms +16.39%

Comparing om/04-29-formatter-no-alloc (b6574dd) with om/04-29-ci_alloc_add_formatter_to_just_allocs_ (6ec43fc)2

Open in CodSpeed

Footnotes

  1. 7 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on om/04-29-ci_alloc_add_formatter_to_just_allocs_ (807e625) during the generation of this report, so 5cbebcb was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@overlookmotel overlookmotel changed the title north star perf(formatter): POC store AstNodes on stack instead of in arena Apr 29, 2026
@overlookmotel
Copy link
Copy Markdown
Member Author

overlookmotel commented Apr 29, 2026

Conformance now all passing.

What's problematic

The problematic parts are all related to this:

This change is premised on formatting being a recursive AST walk. There are about 10 parts of formatter which don't follow this pattern e.g. AstNodes are collected into a Vec, or loops which walk down AST without keeping all the steps in the chain in memory at same time. One example is the flattening of BinaryLikeExpression. These break the fundamental assumption of this change that the ancestors of current node are all live on the stack - so they don't work easily with the new model.

The fix for now is to bring back arena allocation of AstNodes in these cases. Likely all could be changed to not use Allocator via refactoring (e.g. replace loop with recursive function call) or by changing the algorithms to fit the stack-storage model.

Perf of latest version

In the process of fixing conformance, the perf gain has been reduced from the numbers I initially posted. Still positive, but not as dramatic. I'm unclear how much of the initial gain can be won back if we do the refactoring to remove the Allocator entirely. The fact that quite a lot of stuff is still being allocated in arena suggests it's plausible we can get a further perf gain.

Reduction in Max RSS on large files is pretty good, even with many allocations still happening.

Local benchmark (Mac Mini M4)

File main spike (historical) latest latest Δ vs main
RadixUIAdoptionSection.jsx 33.57 33.97 31.73 −5.48%
errors.ts 62.60 59.71 60.05 −4.07%
Search.tsx 198.10 175.02 186.37 −5.92%
core.js 209.05 182.00 196.54 −5.98%
next.ts 301.27 266.19 284.87 −5.44%
index.tsx 539.47 476.77 512.13 −5.07%
handle-comments.js 359.88 335.38 342.26 −4.90%
types.ts 1205.71 1147.60 1133.31 −6.00%
App.tsx 6168.15 5406.50 5839.58 −5.33%

Arena memory

File AST KiB main fmt KiB latest fmt KiB reduction
RadixUIAdoptionSection.jsx 20.7 134.2 90.5 −32.55%
errors.ts 36.0 254.7 174.9 −31.34%
Search.tsx 93.8 669.7 407.4 −39.17%
core.js 83.0 663.4 407.7 −38.54%
next.ts 124.3 1029.3 547.4 −46.82%
index.tsx 193.0 1559.0 1118.1 −28.28%
handle-comments.js 145.2 1184.2 724.2 −38.85%
types.ts 1016.9 6539.5 2798.9 −57.20%
App.tsx 1902.6 19055.0 11918.4 −37.45%

Max RSS

File main MB latest MB reduction
RadixUIAdoptionSection.jsx 2.91 2.75 −5.38%
errors.ts 3.19 3.03 −4.90%
Search.tsx 3.81 3.53 −7.38%
core.js 3.62 3.41 −6.03%
next.ts 4.20 3.78 −10.04%
index.tsx 4.92 4.30 −12.70%
handle-comments.js 4.33 3.92 −9.39%
types.ts 10.05 6.28 −37.48%
App.tsx 23.25 16.14 −30.58%

Alternative approach

I also wonder if we could do away with AstNode entirely, by:

  1. Storing ancestry in a NonEmptyStack, like Traverse does.
  2. Making following_span_start a lazy getter instead of pre-calculating it for every node.

That might or might not be more performant than the approach in this PR, but it'd at least do away with all the lifetimes.


Side note: This is my first real experience of "slop engineering"! I'm impressed how far Claude got unaided, though he did make some stupid decisions and needed guidance at times. I doubt the implementation is optimal.

@overlookmotel overlookmotel changed the base branch from main to graphite-base/21931 April 29, 2026 19:12
@overlookmotel overlookmotel force-pushed the om/04-29-formatter-no-alloc branch from be762c0 to f5c802a Compare April 29, 2026 19:12
@overlookmotel overlookmotel changed the base branch from graphite-base/21931 to om/04-29-ci_alloc_add_formatter_to_just_allocs_ April 29, 2026 19:13
Reverts print/, utils/, parentheses/, formatter/, ir_transform/,
detect_code_removal/, and external_formatter.rs to main.

Retains the validated infrastructure:
- ast_nodes/node.rs (new AstNode<'me, 'a, T> struct, manual Copy)
- ast_nodes/iterator.rs (AstNodeIterator yielding owned wrappers)
- ast_nodes/impls/ast_nodes.rs (AstNodes ancestors)
- ast_nodes/generated/* (regenerated with new shape)
- lib.rs:70 (AstNode::new without allocator)
- tasks/ast_tools/src/generators/formatter/{ast_nodes,format}.rs
- NORTH_STAR.md (with lessons from first attempt)
- REFACTOR_FINDINGS.md (post-mortem)

Codebase will not compile until call sites are re-adapted with the
new with_inner-based strategy described in NORTH_STAR.md.
overlookmotel and others added 28 commits April 29, 2026 22:36
Apply patterns from STEP_1_PLAN.md to all flagged broken sites:
- Pattern A (no-alloc getter on '&'this self'): ReturnAndThrowStatement::argument,
  TemplateLike::quasis, AnyJsxTagWithChildren::children, AssignmentLike::get_right_expression
- Pattern B (chain loop, arena-alloc current wrapper): arrow_function_expression chain,
  TSModuleDeclaration body loop, get_innermost_expression
- Pattern C (Vec-pushing flatten, arena-alloc input): FormalParametersIter::new

Restore stubbed paths:
- ExpressionLeftSide::left and ::iter (threaded 'f' through callsites; named
  'a' in NeedsParentheses impls so f.allocator() lifetimes line up)
- chain_members_iter full traversal
- is_poorly_breakable_member_or_call_chain
- MemberChain gate (call_like_expression/mod.rs)
- object_like.rs should_hug unreachable!()

Conformance: JS 746/753, TS 591/601 — matches main exactly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Conformance back to baseline (746/591), matching main exactly.
- Bench shows a durable ~5% speed-up vs main across all measured files.
- Spike's 10-13% gain on big files was inflated by stubbed paths; step 1's
  realistic gain is smaller but holds with parity restored.
- Replace the spike's "what doesn't work" section with a step-1 resolution
  pointer to STEP_1_PLAN.md and the three patterns (A/B/C) that fixed it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@overlookmotel overlookmotel force-pushed the om/04-29-ci_alloc_add_formatter_to_just_allocs_ branch from 6ec43fc to 807e625 Compare April 29, 2026 21:37
@overlookmotel overlookmotel force-pushed the om/04-29-formatter-no-alloc branch from f5c802a to b6574dd Compare April 29, 2026 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant