
feat(test-execute): Add --verify-traces flag for trace comparison against baselines #2535

Open
leolara wants to merge 5 commits into ethereum:forks/amsterdam from leolara:verify-traces

@leolara (Member) commented Mar 20, 2026

🗒️ Description

Add a --verify-traces <folder> flag to the fill command that compares EVM execution traces from a baseline run against the current run. This enables developers to confirm that test modifications haven't changed underlying EVM execution behavior.

Comparators:

  • exact: compares all fields, including gas and gas_cost. Detects any difference, including opcode gas cost changes.
  • exact-no-gas (default): excludes the gas, gas_cost, and gas_used fields, and uses remove_gas() to clean up stack values polluted by the GAS opcode. This tolerates differences in remaining gas when transaction gas amounts change. Because those fields are excluded, it also tolerates opcode gas cost changes, so use the exact comparator when gas cost changes need to be detected.

Multiple comparators can be run simultaneously via --verify-traces-comparator exact,exact-no-gas to see how each test differs under different comparison strategies.
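
To illustrate how the two comparators can disagree on the same pair of traces, here is a minimal sketch. It assumes each trace is a list of per-step dicts with keys such as `pc`, `op`, `gas`, `gas_cost`, and `stack`, and that the stack is recorded before the step executes. The helper names `strip_gas` and `traces_match` are hypothetical, and the GAS handling is a deliberately simplified stand-in for the PR's `remove_gas()`, not its actual logic.

```python
from typing import Any

Step = dict[str, Any]

# Fields the exact-no-gas comparator ignores (taken from the PR description).
GAS_FIELDS = frozenset({"gas", "gas_cost", "gas_used"})


def strip_gas(trace: list[Step]) -> list[Step]:
    """Drop gas fields and mask the value pushed by the GAS opcode."""
    cleaned: list[Step] = []
    prev_op = None
    for step in trace:
        new_step = {k: v for k, v in step.items() if k not in GAS_FIELDS}
        if "stack" in new_step:
            stack = list(new_step["stack"])
            # Assumption: the value pushed by GAS appears on top of the
            # stack recorded for the following step.
            if prev_op == "GAS" and stack:
                stack[-1] = "<gas>"
            new_step["stack"] = stack
        cleaned.append(new_step)
        prev_op = step.get("op")
    return cleaned


def traces_match(baseline: list[Step], current: list[Step], comparator: str) -> bool:
    """Compare two traces under the named comparison strategy."""
    if comparator == "exact":
        return baseline == current
    if comparator == "exact-no-gas":
        return strip_gas(baseline) == strip_gas(current)
    raise ValueError(f"unknown comparator: {comparator}")


# Same program, but the second run starts with a larger transaction gas limit.
baseline = [
    {"pc": 0, "op": "GAS", "gas": 100000, "gas_cost": 2, "stack": []},
    {"pc": 1, "op": "POP", "gas": 99998, "gas_cost": 2, "stack": ["0x1869e"]},
]
current = [
    {"pc": 0, "op": "GAS", "gas": 200000, "gas_cost": 2, "stack": []},
    {"pc": 1, "op": "POP", "gas": 199998, "gas_cost": 2, "stack": ["0x30d3e"]},
]

for name in "exact,exact-no-gas".split(","):
    verdict = "EQUIVALENT" if traces_match(baseline, current, name) else "DIFFERENT"
    print(f"{name}: {verdict}")
# exact: DIFFERENT
# exact-no-gas: EQUIVALENT
```

This sketch only masks the value right after it is pushed. Tracking a gas value as it is duplicated or moved deeper into the stack would need more bookkeeping.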

Design decisions:

  • Diagnostic, not failing: Unlike the original issue, which proposed failing tests on mismatch, this implementation reports all results across all comparators without failing. This lets developers see the full picture: a test might show DIFFERENT under exact but EQUIVALENT under exact-no-gas, which immediately tells them the only changes are gas-related. Failing on the first mismatch would hide this information.

  • Pluggable comparator architecture: The ABC + factory pattern supports future comparators beyond exact matching:

    • Isomorphic comparator (follow-up): will handle address bijection so that address changes in the stack don't trigger differences. This is the primary use case for tests ported by #2455 (feat(tests): add script to port static state test fixtures to Python), as described in the issue.
    • Out-of-gas tolerant comparator (follow-up): for tests that intentionally run out of gas, where the execution may diverge at different points but the overall behavior is equivalent.
  • In-memory traces: --verify-traces enables trace collection without writing files to disk (unless --evm-dump-dir is also passed). Baseline traces are loaded from disk; current traces are compared from memory.

  • Cache-aware: The t8n output cache now restores traces on cache hits, so verification works correctly even for cached test formats.
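
The pluggable, non-failing design described above could be sketched roughly as follows. The class names, the registry, and the `ops-only` comparator are all hypothetical, chosen only to show how a follow-up comparator might plug into an ABC + factory layout. None of them are taken from the PR's code.

```python
from abc import ABC, abstractmethod
from typing import Any

Trace = list[dict[str, Any]]


class TraceComparator(ABC):
    """Hypothetical base class: one comparison strategy per subclass."""

    name: str

    @abstractmethod
    def equivalent(self, baseline: Trace, current: Trace) -> bool:
        """Return True if the two traces match under this strategy."""


_REGISTRY: dict[str, type[TraceComparator]] = {}


def register(cls: type[TraceComparator]) -> type[TraceComparator]:
    """Class decorator that makes a comparator available to the factory."""
    _REGISTRY[cls.name] = cls
    return cls


@register
class ExactComparator(TraceComparator):
    name = "exact"

    def equivalent(self, baseline: Trace, current: Trace) -> bool:
        return baseline == current


@register
class OpsOnlyComparator(TraceComparator):
    """Illustrative extra comparator: only the opcode sequence must match."""

    name = "ops-only"

    def equivalent(self, baseline: Trace, current: Trace) -> bool:
        return [s.get("op") for s in baseline] == [s.get("op") for s in current]


def create_comparators(spec: str) -> list[TraceComparator]:
    """Factory: turn a comma-separated list of names into comparator instances."""
    names = [n.strip() for n in spec.split(",") if n.strip()]
    unknown = [n for n in names if n not in _REGISTRY]
    if unknown:
        raise ValueError(f"unknown comparator(s): {unknown}")
    return [_REGISTRY[n]() for n in names]


def report(baseline: Trace, current: Trace, comparators: list[TraceComparator]) -> dict[str, str]:
    """Run every comparator and collect verdicts; a mismatch never raises."""
    return {
        c.name: "EQUIVALENT" if c.equivalent(baseline, current) else "DIFFERENT"
        for c in comparators
    }
```

Because `report` returns one verdict per comparator instead of raising on the first mismatch, a caller can print the whole dictionary for each test and compare strategies side by side.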

🔗 Related Issues or PRs

Fixes #2506.

Follow-up work:

  • Isomorphic comparator for address-aware comparison (addresses the issue's requirement that address changes should not trigger differences)
  • Out-of-gas tolerant comparator
  • JSON/HTML report formatters

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.


codecov bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.35%. Comparing base (3ffa9e7) to head (c792b02).
⚠️ Report is 2 commits behind head on forks/amsterdam.

Additional details and impacted files
@@                 Coverage Diff                 @@
##           forks/amsterdam    #2535      +/-   ##
===================================================
+ Coverage            86.01%   86.35%   +0.33%     
===================================================
  Files                  599      599              
  Lines                36904    36904              
  Branches              3771     3771              
===================================================
+ Hits                 31744    31868     +124     
+ Misses                4551     4485      -66     
+ Partials               609      551      -58     
Flag Coverage Δ
unittests 86.35% <ø> (+0.33%) ⬆️


@leolara leolara requested review from marioevz and spencer-tb March 23, 2026 10:50

Successfully merging this pull request may close these issues.

feat(test-fill): --verify-traces flag for automatic trace comparison across test runs
