
core/vm: port GEVM performance optimisations#2143

Open
kamuikatsurgi wants to merge 28 commits into develop from kamui/port-gevm-perf-optimisations

Conversation

@kamuikatsurgi
Member

@kamuikatsurgi kamuikatsurgi commented Mar 17, 2026

Description

  • Fixed-size [1024]uint256.Int array stack replacing slice-based stack, eliminating append/slice-header overhead on push/pop
  • Switch-dispatch fast path (runSwitch) with inlined hot opcodes and gas accumulation, gated behind --switch-dispatch flag
  • Code generator (gen_dispatch/) to produce interpreter_dispatch.go
  • Wire EnableSwitchDispatch through CLI, ethconfig, and blockchain to both block import and block production paths

@codecov

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 66.16400% with 359 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.04%. Comparing base (256bc94) to head (4df1ff3).
⚠️ Report is 2 commits behind head on develop.

Files with missing lines Patch % Lines
core/vm/gen_dispatch/main.go 0.00% 277 Missing ⚠️
core/vm/interpreter_dispatch.go 89.08% 58 Missing and 21 partials ⚠️
miner/worker.go 75.00% 2 Missing ⚠️
core/vm/stack.go 96.55% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (66.16%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2143      +/-   ##
===========================================
+ Coverage    51.51%   52.04%   +0.52%     
===========================================
  Files          882      884       +2     
  Lines       154081   155161    +1080     
===========================================
+ Hits         79377    80750    +1373     
+ Misses       69526    69174     -352     
- Partials      5178     5237      +59     
Files with missing lines Coverage Δ
core/vm/interpreter.go 59.76% <100.00%> (+4.82%) ⬆️
eth/backend.go 52.27% <100.00%> (+0.06%) ⬆️
eth/ethconfig/config.go 78.94% <ø> (ø)
internal/cli/server/config.go 65.03% <100.00%> (+0.05%) ⬆️
internal/cli/server/flags.go 100.00% <100.00%> (ø)
core/vm/stack.go 96.61% <96.55%> (+55.23%) ⬆️
miner/worker.go 72.49% <75.00%> (+3.60%) ⬆️
core/vm/interpreter_dispatch.go 89.08% <89.08%> (ø)
core/vm/gen_dispatch/main.go 0.00% <0.00%> (ø)

... and 34 files with indirect coverage changes



@0xPolygon 0xPolygon deleted a comment from claude bot Mar 17, 2026
@0xPolygon 0xPolygon deleted a comment from claude bot Mar 17, 2026
@claude

claude bot commented Mar 17, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@adamdossa
Contributor

@claude review

@claude

claude bot commented Mar 18, 2026

Claude finished @adamdossa's task in 3m 13s


Code Review: GEVM Performance Optimisations

  • Read and analyze core VM changes (stack, interpreter, dispatch)
  • Read and analyze code generator
  • Read and analyze config/wiring changes
  • Read and analyze tests
  • Post review findings

Overall this is a well-structured PR. The fixed-size stack, switch dispatch, and code generator are cleanly implemented. Operand ordering for all inlined opcodes (SUB, DIV, SDIV, ADDMOD, MULMOD, JUMPI, etc.) was verified against the original instructions.go implementations and is correct. The gas accumulator pattern with flushing at control-flow points is sound — unflushed gasAccum on error is safe because non-REVERT errors consume all gas in EVM.Call.

Below are the findings, ordered by severity:


1. Behavior change in miner/worker.go — block production now inherits full VM config (Medium)

miner/worker.go:2369-2373 — The new vmConfig() method copies the entire blockchain VM config:

func (w *worker) vmConfig() vm.Config {
    cfg := *w.chain.GetVMConfig()
    return cfg
}

Previously, block production used vm.Config{} (all fields zero/false). Now it inherits all fields: EnablePreimageRecording, StatelessSelfValidation, EnableWitnessStats, and crucially Tracer. This is likely more correct (the miner should respect the configured settings), but it's a semantic change beyond just wiring EnableSwitchDispatch.

Tracer safety: The !debug guard at interpreter.go:185 prevents runSwitch from being entered when a Tracer is set, so there's no correctness issue. But if --vmtrace is enabled, block production will now also trace, which could slow down block production. Consider whether this is the intended behavior, or if vmConfig() should explicitly clear the Tracer:

func (w *worker) vmConfig() vm.Config {
    cfg := *w.chain.GetVMConfig()
    cfg.Tracer = nil // block production should not trace
    return cfg
}



2. Memory footprint increase from fixed-size stack (Low — Informational)

core/vm/stack.go:40-43 — Each Stack is now ~32KB ([1024]uint256.Int = 1024 × 32 bytes + 8 bytes for top), allocated upfront regardless of actual usage. The old slice-based stack started small and grew dynamically.

With sync.Pool reuse this is acceptable in steady-state. Worst case with maximum EVM call depth (1024 nested calls), this is ~32MB of stack allocations. This is fine, but worth noting for operators monitoring memory usage.
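The figures above follow from simple arithmetic (a back-of-the-envelope sketch; `uint256.Int` is four `uint64` limbs, i.e. 32 bytes):

```go
package main

import "fmt"

func main() {
	const wordSize = 32                // uint256.Int: 4 × 8-byte limbs
	const stackBytes = 1024 * wordSize // fixed data array per Stack
	const maxDepth = 1024              // EVM call-depth limit

	fmt.Println(stackBytes)                        // 32768: ~32KB per stack
	fmt.Println(stackBytes * maxDepth / (1 << 20)) // 32: ~32MB at max call depth
}
```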


3. returnStack doesn't zero the data array (Low — By Design)

core/vm/stack.go:49-52 — Only top is reset, leaving stale data in the array:

func returnStack(s *Stack) {
    s.top = 0
    stackPool.Put(s)
}

This is safe because: (a) Data() returns st.data[:st.top] so stale elements are never exposed, and (b) push writes before incrementing top. Zeroing 32KB on every return would negate the performance gains. No action needed — just confirming this is intentional.
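The two safety properties can be demonstrated with a self-contained sketch (shrunken array size for illustration; `Data`, `push`, and `stackPool` echo the names above but the bodies are simplified):

```go
package main

import (
	"fmt"
	"sync"
)

// Stack mirrors the PR's fixed-array stack, scaled down for illustration.
type Stack struct {
	data [4]int
	top  int
}

func (s *Stack) push(v int) { s.data[s.top] = v; s.top++ }

// Data exposes only the live prefix of the array, never stale slots.
func (s *Stack) Data() []int { return s.data[:s.top] }

var stackPool = sync.Pool{New: func() any { return new(Stack) }}

func main() {
	s := stackPool.Get().(*Stack)
	s.push(1)
	s.push(2)
	s.top = 0 // returnStack: only the index is reset; 1 and 2 stay in data
	stackPool.Put(s)

	s2 := stackPool.Get().(*Stack) // may or may not be the recycled object
	fmt.Println(len(s2.Data()))    // 0: stale elements are not visible
	s2.push(9)                     // push overwrites the slot before exposing it
	fmt.Println(s2.Data())         // [9]
}
```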


4. Gas accumulator overflow analysis (Low — Verified Safe)

gasAccum is uint64. Between flushes (JUMP/JUMPI/JUMPDEST), the maximum per-opcode gas is GasSlowStep = 10. Since contract.Gas is bounded by the block gas limit (well under uint64 max), and gas is deducted from contract.Gas at flush points, gasAccum cannot overflow before the contract runs out of gas. No issue.
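A minimal sketch of the accumulate-then-flush pattern analyzed above (the `flush` helper and the per-opcode costs here are illustrative, not the PR's actual code; the real interpreter flushes at JUMP/JUMPI/JUMPDEST and on exit):

```go
package main

import (
	"errors"
	"fmt"
)

var errOutOfGas = errors.New("out of gas")

// flush deducts the accumulated gas in one step instead of per opcode.
func flush(contractGas, gasAccum *uint64) error {
	if *gasAccum > *contractGas {
		return errOutOfGas
	}
	*contractGas -= *gasAccum
	*gasAccum = 0
	return nil
}

func main() {
	contractGas := uint64(100)
	var gasAccum uint64

	// Straight-line opcodes only bump the accumulator; with at most 10 gas
	// per opcode (per the analysis above), gasAccum cannot approach uint64
	// overflow before contractGas is exhausted at the next flush point.
	gasAccum += 3 // e.g. ADD
	gasAccum += 5 // e.g. MUL
	gasAccum += 8 // a control-flow point triggers the flush

	if err := flush(&contractGas, &gasAccum); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(contractGas, gasAccum) // 84 0
}
```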


5. Code generator and generated code consistency (Nit)

The interpreter_dispatch.go has the correct // Code generated by gen_dispatch; DO NOT EDIT. header and //go:generate go run ./gen_dispatch/ directive in interpreter.go:19. Consider adding a CI check (or Makefile target) that re-runs go generate ./core/vm/ and verifies no diff, to catch cases where the generator is modified without regenerating.


6. SonarQube findings (Nit)

SonarQube flags 5% duplication (threshold 3%) and 1 security hotspot. The duplication is from the repetitive DUP1-16 and SWAP1-16 cases in the generated code — this is inherent to the switch-dispatch design and is the point (avoiding indirect calls). The security hotspot likely refers to the os.WriteFile in the generator, which only runs at code-gen time, not at runtime.


Summary

The core changes are sound. The operand semantics match the original instructions.go implementations, gas accounting is correct, and the switch dispatch is properly gated behind both the --switch-dispatch flag and the isShanghai && !isEIP4762 && !debug conditions. The main item worth discussing is finding #1 (whether vmConfig() should clear the Tracer for block production).

Comment on lines +2371 to 2377
	cfg := *w.chain.GetVMConfig()
	return cfg
}

// copyReceipts makes a deep copy of the given receipts.
func copyReceipts(receipts []*types.Receipt) []*types.Receipt {
	result := make([]*types.Receipt, len(receipts))


🔴 Bug: vmConfig() in miner/worker.go no longer clears the Tracer field when copying the blockchain's VM config. Commit a022b70 originally added cfg.Tracer = nil (with comment "tracing disabled, suitable for block production"), but commit 00f3fae ("chore: nits") accidentally removed it. When --vmtrace is configured, the live tracer leaks into block production, which sets debug=true and bypasses the new switch-dispatch fast path, negating the performance optimization this PR introduces.

Extended reasoning...

What the bug is

The vmConfig() method in miner/worker.go (lines 2371-2377) copies the blockchain's entire vm.Config struct but does not clear the Tracer field. This method is called in makeEnv, prepareWork, generateWork, and prefetchFromPool — all block production code paths.

How the Tracer gets set

When the --vmtrace CLI flag is configured, eth/backend.go (line 313) explicitly sets options.VmConfig.Tracer = t on the blockchain-level VM config:

if config.VMTrace != "" {
    t, err := tracers.LiveDirectory.New(config.VMTrace, traceConfig)
    // ...
    options.VmConfig.Tracer = t
}

This means w.chain.GetVMConfig().Tracer is non-nil whenever --vmtrace is active.

Git history confirms this is a regression

Commit a022b70 ("fix: use blockchain's vm.Config instead of nil") introduced the correct implementation:

// vmConfig returns blockchain's vm config with tracing disabled, suitable for block production.
func (w *worker) vmConfig() vm.Config {
    cfg := *w.chain.GetVMConfig()
    cfg.Tracer = nil
    return cfg
}

The very next commit 00f3fae ("chore: nits") removed the cfg.Tracer = nil line and simplified the comment, losing the intentional safety guard.

Impact on the switch-dispatch fast path

In core/vm/interpreter.go, the new switch-dispatch optimization is gated on !debug, where debug is evm.Config.Tracer != nil. If the tracer leaks through vmConfig(), then debug=true for all block production, and the fast path is never taken — silently negating the core performance improvement of this PR for any node running with --vmtrace.

Precedent in the codebase

The codebase already follows this pattern in core/blockchain.go (lines 779-780):

// Disable tracing for prefetcher executions.
vmCfg := bc.cfg.VmConfig
vmCfg.Tracer = nil

This is the exact same defensive clearing that should be restored in vmConfig().

Step-by-step proof

  1. User starts a node with --vmtrace (e.g., for live transaction logging).
  2. eth/backend.go:313 sets options.VmConfig.Tracer = t (a live tracer).
  3. Blockchain is initialized with this config, so blockchain.GetVMConfig().Tracer != nil.
  4. Miner's vmConfig() copies the config via *w.chain.GetVMConfig() — the Tracer field carries over.
  5. In interpreter.go, debug := evm.Config.Tracer != nil evaluates to true.
  6. The condition !debug fails, so the switch-dispatch fast path is bypassed.
  7. Block production falls back to the slower jump-table interpreter, losing the optimization.
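Steps 5-6 come down to a nil check on the tracer; a simplified standalone sketch (the `Config` fields echo the PR's `vm.Config`, but the `run` function and its return strings are placeholders for illustration):

```go
package main

import "fmt"

// Tracer stands in for the EVM's tracing hooks interface.
type Tracer interface{ OnOpcode(pc uint64) }

// Config mirrors the two vm.Config fields relevant to the dispatch choice.
type Config struct {
	Tracer               Tracer
	EnableSwitchDispatch bool
}

// run picks the interpreter path the way the PR's gating is described:
// the fast path requires the flag AND no tracer.
func run(cfg Config) string {
	debug := cfg.Tracer != nil
	if cfg.EnableSwitchDispatch && !debug {
		return "switch-dispatch fast path"
	}
	return "jump-table interpreter"
}

type liveTracer struct{}

func (liveTracer) OnOpcode(uint64) {}

func main() {
	fmt.Println(run(Config{EnableSwitchDispatch: true}))
	// A tracer carried over from --vmtrace forces the slow path:
	fmt.Println(run(Config{EnableSwitchDispatch: true, Tracer: liveTracer{}}))
}
```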

Fix

Restore the line removed in 00f3fae:

func (w *worker) vmConfig() vm.Config {
    cfg := *w.chain.GetVMConfig()
    cfg.Tracer = nil
    return cfg
}

Member Author


This is not a bug; if we set --vmtrace, we'd want to skip the switch-dispatch fast path. This is a design choice.

Member Author


We want the block producers to inherit the full vm.config, and decide which path to take.

@claude

claude bot commented Mar 18, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@sonarqubecloud

Quality Gate failed

Failed conditions
1 Security Hotspot
3.4% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Comment on lines +318 to +322

case shapePushVal:
	e.p("if stack.top >= 1024 {\n")
	e.p("return nil, &ErrStackOverflow{stackLen: stack.top, limit: 1024}\n")
	e.p("}\n")


🟡 Nit: Inlined opcodes (PUSH, DUP, PC, MSIZE, PUSH0) report ErrStackOverflow{limit: 1024} while the default fallback case reports ErrStackOverflow{limit: operation.maxStack} where maxStack is 1023 for these net-push-1 operations. The overflow check itself is correct in both paths (stack.top >= 1024 == sLen > 1023), so this is purely a cosmetic inconsistency in the error message. Consider using 1024 - 1 or a named constant to keep the limit field consistent with the jumpTable fallback.

Extended reasoning...

What the bug is

The code generator (gen_dispatch/main.go) emits ErrStackOverflow{stackLen: stack.top, limit: 1024} for all inlined opcodes that use the shapePushVal shape (lines 318-322), emitDup (line 344), emitPushGeneric (line 375), and emitPush (line 393). However, the default fallback case in the generated interpreter_dispatch.go (around line 749) uses ErrStackOverflow{stackLen: sLen, limit: operation.maxStack}, where operation.maxStack is computed as 1024 + pop - push.

For all affected net-push-1 opcodes (DUP1-16, PUSH0-32, PC, MSIZE), maxStack = 1024 + pop - push. For example, DUP1 pops 1 and pushes 2, so maxStack = 1024 + 1 - 2 = 1023. PUSH1 pops 0 and pushes 1, so maxStack = 1024 + 0 - 1 = 1023. The inlined code reports limit: 1024 while the default path reports limit: 1023 for the same operation.

Why the overflow check is still correct

The inlined path checks stack.top >= 1024 and the default path checks sLen > operation.maxStack (i.e., sLen > 1023). Since sLen is stack.top (an integer), stack.top >= 1024 is mathematically equivalent to stack.top > 1023. Both paths correctly prevent the stack from growing beyond 1024 items. The actual overflow detection is identical.

Step-by-step proof of the inconsistency

  1. Assume the stack has exactly 1024 items (stack.top = 1024).
  2. Inlined DUP1 path: checks stack.top >= 1024 → true → returns ErrStackOverflow{stackLen: 1024, limit: 1024}.
  3. Default fallback for DUP1: computes sLen = stack.len() = 1024, checks sLen > operation.maxStack, i.e. 1024 > 1023 → true → returns ErrStackOverflow{stackLen: 1024, limit: 1023}.
  4. Same trigger condition, same stackLen, but limit differs: 1024 vs 1023.
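The equivalence of the two checks, and the purely cosmetic difference in the reported limit, fits in a few lines (limits hard-coded per the analysis above):

```go
package main

import "fmt"

func main() {
	const capacity = 1024    // absolute stack capacity used by the inlined check
	maxStack := capacity - 1 // operation.maxStack for a net-push-1 opcode

	top := 1024                    // stack already full
	inlined := top >= capacity     // inlined fast-path check
	fallback := top > maxStack     // jump-table fallback check

	fmt.Println(inlined == fallback) // true: identical trigger condition
	fmt.Println(capacity, maxStack)  // 1024 1023: only the reported limit differs
}
```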

Impact assessment

This has zero functional impact:

  • Stack overflow errors consume all remaining gas regardless of the limit field value.
  • The differential test suite (dispatch_test.go) uses classifyErr which only checks error type via errors.As, not field values.
  • No downstream code in the codebase inspects the limit field for decision-making.
  • The Error() string representation would show (1024) vs (1023) but this string is never parsed.

Suggested fix

In gen_dispatch/main.go, the shapePushVal emitter (line 320), emitDup (line 344), emitPushGeneric (line 375), and emitPush (line 393) could use 1024 - 1 (i.e., 1023) for the limit field to match the operation.maxStack value that the default path would report. Alternatively, a comment explaining the intentional difference (reporting absolute capacity vs operation-specific limit) would suffice.

@claude

claude bot commented Mar 19, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@adamdossa adamdossa requested review from cffls and marcello33 March 23, 2026 10:13
Contributor

@cffls cffls left a comment


Looks good overall. The patch coverage is 66%. Any ideas if we can bump it up?

@@ -0,0 +1,594 @@
// Code generator for interpreter_dispatch.go — the EVM switch dispatch.
// Adapted from GEVM's gen/main.go (github.com/Giulio2002/gevm).
Contributor


What's the SOP to update this file since it is taken from another repo? If there is an EVM update in the future, what should we do with it?

VMTraceJsonConfig string `hcl:"vmtrace.jsonconfig,optional" toml:"vmtrace.jsonconfig,optional"`

// Use switch-based fast path EVM interpreter
EnableSwitchDispatch bool `hcl:"switch-dispatch,optional" toml:"switch-dispatch,optional"`
Contributor


Maybe add evm to this flag, e.g. evm-switch-dispatch? It would allow users to understand the flag without looking up the description/doc.

Collaborator

@marcello33 marcello33 left a comment


LGTM, 2 minor comments

StatelessSelfValidation bool

// Use switch-based fast path EVM interpreter
EnableSwitchDispatch bool
Collaborator


Do we need to add this in the generated TOML codec for the marshal/unmarshal helpers? I mean in eth/ethconfig/gen_config.go?

type Stack struct {
	data []uint256.Int
	top  int
	data [1024]uint256.Int
Collaborator


Did we quantify the memory tradeoff here? I believe this change removes the append overhead, but also each stack could use more memory even for small contracts. Did we evaluate this during testing? Or maybe worth adding a benchmark test?
