perf(formatter): POC store AstNodes on stack instead of in arena#21931
perf(formatter): POC store AstNodes on stack instead of in arena#21931overlookmotel wants to merge 50 commits intoom/04-29-ci_alloc_add_formatter_to_just_allocs_from
AstNodes on stack instead of in arena#21931Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
Merging this PR will improve performance by 16.39%
Performance Changes
Comparing Footnotes
|
AstNodes on stack instead of in arena
|
Conformance now all passing. What's problematicThe problematic parts are all related to this: This change is premised on formatting being a recursive AST walk. There are about 10 parts of formatter which don't follow this pattern e.g. The fix for now is to bring back arena allocation of Perf of latest versionIn the process of fixing conformance, the perf gain has been reduced from the numbers I initially posted. Still positive, but not as dramatic. I'm unclear how much of the initial gain can be won back if we do the refactoring to remove the Reduction in Max RSS on large files is pretty good, even with many allocations still happening. Local benchmark (Mac Mini M4)
Arena memory
Max RSS
Alternative approachI also wonder if we could do away with
That might or might not be more performant than the approach in this PR, but it'd at least do away with all the lifetimes. Side note: This is my first real experience of "slop engineering"! I'm impressed how far Claude got unaided, though he did make some stupid decisions and needed guidance at times. I doubt the implementation is optimal. |
be762c0 to
f5c802a
Compare
Reverts print/, utils/, parentheses/, formatter/, ir_transform/,
detect_code_removal/, and external_formatter.rs to main.
Retains the validated infrastructure:
- ast_nodes/node.rs (new AstNode<'me, 'a, T> struct, manual Copy)
- ast_nodes/iterator.rs (AstNodeIterator yielding owned wrappers)
- ast_nodes/impls/ast_nodes.rs (AstNodes ancestors)
- ast_nodes/generated/* (regenerated with new shape)
- lib.rs:70 (AstNode::new without allocator)
- tasks/ast_tools/src/generators/formatter/{ast_nodes,format}.rs
- NORTH_STAR.md (with lessons from first attempt)
- REFACTOR_FINDINGS.md (post-mortem)
Codebase will not compile until call sites are re-adapted with the
new with_inner-based strategy described in NORTH_STAR.md.
Apply patterns from STEP_1_PLAN.md to all flagged broken sites: - Pattern A (no-alloc getter on '&'this self'): ReturnAndThrowStatement::argument, TemplateLike::quasis, AnyJsxTagWithChildren::children, AssignmentLike::get_right_expression - Pattern B (chain loop, arena-alloc current wrapper): arrow_function_expression chain, TSModuleDeclaration body loop, get_innermost_expression - Pattern C (Vec-pushing flatten, arena-alloc input): FormalParametersIter::new Restore stubbed paths: - ExpressionLeftSide::left and ::iter (threaded 'f' through callsites; named 'a' in NeedsParentheses impls so f.allocator() lifetimes line up) - chain_members_iter full traversal - is_poorly_breakable_member_or_call_chain - MemberChain gate (call_like_expression/mod.rs) - object_like.rs should_hug unreachable!() Conformance: JS 746/753, TS 591/601 — matches main exactly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Conformance back to baseline (746/591), matching main exactly. - Bench shows a durable ~5% speed-up vs main across all measured files. - Spike's 10-13% gain on big files was inflated by stubbed paths; step 1's realistic gain is smaller but holds with parity restored. - Replace the spike's "what doesn't work" section with a step-1 resolution pointer to STEP_1_PLAN.md and the three patterns (A/B/C) that fixed it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6ec43fc to
807e625
Compare
f5c802a to
b6574dd
Compare

Proof-of-concept only. Some conformance test failures.
Remove allocation of
AstNodes into arena.Process of the 1st phase of formatter is something like this:
FormatWrite::writecreatesAstNodes for the node's children.write!(which expands to aFormatter::write_fmtcall), passing&AstNodes (and other stuff).Formatter::write_fmtcallsFormat::fmton the children.Format::fmtcallsFormatWrite::write. Back to step 1, or return if reached a leaf node.So it's a recursive traversal of the AST.
Currently, all the field getter methods of
AstNode<T>create newAstNodefor the child field, allocate it into the arena, and return a&'a AstNode.Instead:
AstNode.FormatWrite::writecan store thatAstNodeon the stack, and pass a reference to it intowrite!.write!invocation, theAstNodes for the node's children aren't needed any more.FormatWrite::writeexits.i.e. Store
AstNodes on the stack, instead of in the arena.This also means that:
AstNodeto contain&Allocator.Obviously, this PR is slop! I just wanted to explore if the idea works. It does seem to, but Claude hit some problems in a few places, and just commented out code - hence the conformance failures. I think these problems can be overcome.
If we want to pursue this, we'd probably be better off starting again from scratch with a clearer plan to produce a cleaner result. I'm only submitting at this stage to get feedback from @Dunqing and to see what CodSpeed says.
Local benchmark: