core/vm: port GEVM performance optimisations#2143
core/vm: port GEVM performance optimisations#2143kamuikatsurgi wants to merge 28 commits intodevelopfrom
Conversation
Codecov Report❌ Patch coverage is
❌ Your patch check has failed because the patch coverage (66.16%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #2143 +/- ##
===========================================
+ Coverage 51.51% 52.04% +0.52%
===========================================
Files 882 884 +2
Lines 154081 155161 +1080
===========================================
+ Hits 79377 80750 +1373
+ Misses 69526 69174 -352
- Partials 5178 5237 +59
... and 34 files with indirect coverage changes
🚀 New features to boost your workflow:
|
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
|
@claude review |
|
Claude finished @adamdossa's task in 3m 13s —— View job Code Review: GEVM Performance Optimisations
Overall this is a well-structured PR. The fixed-size stack, switch dispatch, and code generator are cleanly implemented. Operand ordering for all inlined opcodes (SUB, DIV, SDIV, ADDMOD, MULMOD, JUMPI, etc.) was verified against the original Below are the findings, ordered by severity: 1. Behavior change in
|
| cfg := *w.chain.GetVMConfig() | ||
| return cfg | ||
| } | ||
|
|
||
| // copyReceipts makes a deep copy of the given receipts. | ||
| func copyReceipts(receipts []*types.Receipt) []*types.Receipt { | ||
| result := make([]*types.Receipt, len(receipts)) |
There was a problem hiding this comment.
🔴 Bug: vmConfig() in miner/worker.go no longer clears the Tracer field when copying the blockchain's VM config. Commit a022b70 originally added cfg.Tracer = nil (with comment "tracing disabled, suitable for block production"), but commit 00f3fae ("chore: nits") accidentally removed it. When --vmtrace is configured, the live tracer leaks into block production, which sets debug=true and bypasses the new switch-dispatch fast path, negating the performance optimization this PR introduces.
Extended reasoning...
What the bug is
The vmConfig() method in miner/worker.go (lines 2371-2377) copies the blockchain's entire vm.Config struct but does not clear the Tracer field. This method is called in makeEnv, prepareWork, generateWork, and prefetchFromPool — all block production code paths.
How the Tracer gets set
When the --vmtrace CLI flag is configured, eth/backend.go (line 313) explicitly sets options.VmConfig.Tracer = t on the blockchain-level VM config:
if config.VMTrace != "" {
t, err := tracers.LiveDirectory.New(config.VMTrace, traceConfig)
// ...
options.VmConfig.Tracer = t
}This means w.chain.GetVMConfig().Tracer is non-nil whenever --vmtrace is active.
Git history confirms this is a regression
Commit a022b70 ("fix: use blockchain's vm.Config instead of nil") introduced the correct implementation:
// vmConfig returns blockchain's vm config with tracing disabled, suitable for block production.
func (w *worker) vmConfig() vm.Config {
cfg := *w.chain.GetVMConfig()
cfg.Tracer = nil
return cfg
}The very next commit 00f3fae ("chore: nits") removed the cfg.Tracer = nil line and simplified the comment, losing the intentional safety guard.
Impact on the switch-dispatch fast path
In core/vm/interpreter.go, the new switch-dispatch optimization is gated on !debug, where debug is evm.Config.Tracer != nil. If the tracer leaks through vmConfig(), then debug=true for all block production, and the fast path is never taken — silently negating the core performance improvement of this PR for any node running with --vmtrace.
Precedent in the codebase
The codebase already follows this pattern in core/blockchain.go (lines 779-780):
// Disable tracing for prefetcher executions.
vmCfg := bc.cfg.VmConfig
vmCfg.Tracer = nilThis is the exact same defensive clearing that should be restored in vmConfig().
Step-by-step proof
- User starts a node with
--vmtrace(e.g., for live transaction logging). eth/backend.go:313setsoptions.VmConfig.Tracer = t(a live tracer).- Blockchain is initialized with this config, so
blockchain.GetVMConfig().Tracer != nil. - Miner's
vmConfig()copies the config via*w.chain.GetVMConfig()— theTracerfield carries over. - In
interpreter.go,debug := evm.Config.Tracer != nilevaluates totrue. - The condition
!debugfails, so the switch-dispatch fast path is bypassed. - Block production falls back to the slower jump-table interpreter, losing the optimization.
Fix
Restore the line removed in 00f3fae:
func (w *worker) vmConfig() vm.Config {
cfg := *w.chain.GetVMConfig()
cfg.Tracer = nil
return cfg
}There was a problem hiding this comment.
This is not a bug; if we set --vmtrace, we'd want to skip the switch-dispatch fast path. This is a design choice.
There was a problem hiding this comment.
We want the block producers to inherit the full vm.config, and decide which path to take.
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
|
|
|
||
| case shapePushVal: | ||
| e.p("if stack.top >= 1024 {\n") | ||
| e.p("return nil, &ErrStackOverflow{stackLen: stack.top, limit: 1024}\n") | ||
| e.p("}\n") |
There was a problem hiding this comment.
🟡 Nit: Inlined opcodes (PUSH, DUP, PC, MSIZE, PUSH0) report ErrStackOverflow{limit: 1024} while the default fallback case reports ErrStackOverflow{limit: operation.maxStack} where maxStack is 1023 for these net-push-1 operations. The overflow check itself is correct in both paths (stack.top >= 1024 == sLen > 1023), so this is purely a cosmetic inconsistency in the error message. Consider using 1024 - 1 or a named constant to keep the limit field consistent with the jumpTable fallback.
Extended reasoning...
What the bug is
The code generator (gen_dispatch/main.go) emits ErrStackOverflow{stackLen: stack.top, limit: 1024} for all inlined opcodes that use the shapePushVal shape (lines 318-322), emitDup (line 344), emitPushGeneric (line 375), and emitPush (line 393). However, the default fallback case in the generated interpreter_dispatch.go (around line 749) uses ErrStackOverflow{stackLen: sLen, limit: operation.maxStack}, where operation.maxStack is computed as 1024 + pop - push.
For all affected net-push-1 opcodes (DUP1-16, PUSH0-32, PC, MSIZE), maxStack = 1024 + pop - push. For example, DUP1 pops 1 and pushes 2, so maxStack = 1024 + 1 - 2 = 1023. PUSH1 pops 0 and pushes 1, so maxStack = 1024 + 0 - 1 = 1023. The inlined code reports limit: 1024 while the default path reports limit: 1023 for the same operation.
Why the overflow check is still correct
The inlined path checks stack.top >= 1024 and the default path checks sLen > operation.maxStack (i.e., sLen > 1023). Since sLen is stack.top (an integer), stack.top >= 1024 is mathematically equivalent to stack.top > 1023. Both paths correctly prevent the stack from growing beyond 1024 items. The actual overflow detection is identical.
Step-by-step proof of the inconsistency
- Assume the stack has exactly 1024 items (
stack.top = 1024). - Inlined DUP1 path: checks
stack.top >= 1024→ true → returnsErrStackOverflow{stackLen: 1024, limit: 1024}. - Default fallback for DUP1: computes
sLen = stack.len() = 1024, checkssLen > operation.maxStack→1024 > 1023→ true → returnsErrStackOverflow{stackLen: 1024, limit: 1023}. - Same trigger condition, same
stackLen, butlimitdiffers: 1024 vs 1023.
Impact assessment
This has zero functional impact:
- Stack overflow errors consume all remaining gas regardless of the
limitfield value. - The differential test suite (
dispatch_test.go) usesclassifyErrwhich only checks error type viaerrors.As, not field values. - No downstream code in the codebase inspects the
limitfield for decision-making. - The
Error()string representation would show(1024)vs(1023)but this string is never parsed.
Suggested fix
In gen_dispatch/main.go, the shapePushVal emitter (line 320), emitDup (line 344), emitPushGeneric (line 375), and emitPush (line 393) could use 1024 - 1 (i.e., 1023) for the limit field to match the operation.maxStack value that the default path would report. Alternatively, a comment explaining the intentional difference (reporting absolute capacity vs operation-specific limit) would suffice.
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
cffls
left a comment
There was a problem hiding this comment.
Looks good overall. The patch coverage is 66%. Any ideas if we can bump it up?
| @@ -0,0 +1,594 @@ | |||
| // Code generator for interpreter_dispatch.go — the EVM switch dispatch. | |||
| // Adapted from GEVM's gen/main.go (github.com/Giulio2002/gevm). | |||
There was a problem hiding this comment.
What's the SOP to update this file since it is taken from another repo? If there is an EVM update in the future, what should we do with it?
| VMTraceJsonConfig string `hcl:"vmtrace.jsonconfig,optional" toml:"vmtrace.jsonconfig,optional"` | ||
|
|
||
| // Use switch-based fast path EVM interpreter | ||
| EnableSwitchDispatch bool `hcl:"switch-dispatch,optional" toml:"switch-dispatch,optional"` |
There was a problem hiding this comment.
Maybe add evm to this flag, e.g. evm-switch-dispatch? It would allow users to understand the flag without looking up the description/doc.
marcello33
left a comment
There was a problem hiding this comment.
LGTM, 2 minor comments
| StatelessSelfValidation bool | ||
|
|
||
| // Use switch-based fast path EVM interpreter | ||
| EnableSwitchDispatch bool |
There was a problem hiding this comment.
Do we need to add this in the generated TOML codec for the marshal/unmarshal helpers? I mean in eth/ethconfig/gen_config.go?
| type Stack struct { | ||
| data []uint256.Int | ||
| top int | ||
| data [1024]uint256.Int |
There was a problem hiding this comment.
Did we quantify the memory tradeoff here? I believe this change removes the append overhead, but also each stack could use more memory even for small contracts. Did we evaluate this during testing? Or maybe worth adding a benchmark test?


Description
[1024]uint256.Intarray stack replacing slice-based stack, eliminating append/slice-header overhead on push/poprunSwitch) with inlined hot opcodes and gas accumulation, gated behind--switch-dispatchflaggen_dispatch/) to produceinterpreter_dispatch.goEnableSwitchDispatchthrough CLI, ethconfig, and blockchain to both block import and block production paths