
bench: add criterion benchmark suite and CI workflow#10444

Open
oxarbitrage wants to merge 8 commits into main from benches

Conversation

@oxarbitrage
Contributor

Motivation

Zebra had minimal benchmarking coverage (only block serialization and RedPallas signatures). This made it difficult to evaluate PRs for performance regressions or improvements. For example, PR #10436 removes groth16 abstractions, but there was no way to verify that it was performance-neutral.

Solution

Adds a comprehensive benchmark suite covering Zebra's most expensive code paths, plus CI automation for tracking and comparison.

New benchmarks

  • Groth16 (zebra-consensus/benches/groth16.rs) — Sprout JoinSplit proof verification, single and unbatched at sizes 2-64, plus input preparation cost
  • Halo2 (zebra-consensus/benches/halo2.rs) — Orchard proof verification, single and unbatched at sizes 2-32
  • Sapling (zebra-consensus/benches/sapling.rs) — both unbatched and true batch verification at sizes 2-64 using sapling_crypto::BatchValidator
  • Transaction (zebra-chain/benches/transaction.rs) — per-version (V1-V5) deserialization and serialization from real mainnet blocks

CI workflow (.github/workflows/benchmarks.yml)

  • workflow_dispatch: runs all benchmarks, stores results on gh-pages via github-action-benchmark, generates a summary table in the Actions UI
  • PR comparison: adding the C-benchmark label to a PR runs benchmarks on both base and PR branches, posts a critcmp comparison table as a PR comment
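
A minimal sketch of the label-gated comparison trigger described above (job names, step names, and baseline names are illustrative assumptions, not the PR's actual workflow):

```yaml
# Illustrative fragment only; the real .github/workflows/benchmarks.yml
# in this PR may differ in structure and step details.
on:
  workflow_dispatch:
  pull_request:
    types: [labeled]

jobs:
  compare:
    if: github.event.label.name == 'C-benchmark'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Bench base branch
        env:
          # Context expression kept in an env block, per the
          # injection-hardening note in this PR.
          BASE_REF: ${{ github.base_ref }}
        run: |
          git checkout "$BASE_REF"
          cargo bench -- --save-baseline base
      - name: Bench PR branch
        env:
          PR_SHA: ${{ github.event.pull_request.head.sha }}
        run: |
          git checkout "$PR_SHA"
          cargo bench -- --save-baseline pr
      - name: Compare baselines
        run: critcmp base pr >> "$GITHUB_STEP_SUMMARY"
```

The real workflow posts the critcmp table as a PR comment rather than (or in addition to) the step summary shown here.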

Benchmark dashboard

Adds a step to book.yml that copies benchmark data from gh-pages into the docs output, making the historical trend chart available at zebra.zfnd.org/dev/bench/.
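
The dashboard wiring might look roughly like this (step name and paths are assumptions, not the PR's exact diff):

```yaml
# Hypothetical book.yml addition; actual paths in the PR may differ.
- name: Copy benchmark data into docs output
  run: |
    git fetch origin gh-pages
    git worktree add /tmp/bench-data origin/gh-pages
    mkdir -p book/dev/bench
    cp -r /tmp/bench-data/dev/bench/. book/dev/bench/
```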

Test data

All benchmarks use real transactions from the hardcoded mainnet blocks in zebra-test. Items are cycled (repeated) to fill larger batch sizes — this is valid because cryptographic verification cost is constant per proof regardless of specific bytes. This limitation is documented in each benchmark file.
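
The cycling approach can be sketched as follows. This is a generic stand-in: the real benchmarks cycle concrete verification items (proofs, bundles) rather than strings.

```rust
/// Repeat a small set of test items until a target batch size is reached.
/// This mirrors the batch-filling described above: verification cost per
/// proof does not depend on which item is repeated, so cycling is valid.
fn fill_batch<T: Clone>(items: &[T], batch_size: usize) -> Vec<T> {
    items.iter().cloned().cycle().take(batch_size).collect()
}
```

For example, 3 extracted items cycled to a batch of 64 yield the first item 22 times and the other two 21 times each.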

Alternatives considered

  • Bencher.dev — purpose-built SaaS for continuous benchmarking with statistical regression detection and confidence intervals. Free for open source. Could replace both github-action-benchmark and critcmp with a more sophisticated solution.
  • Codspeed — runs benchmarks in their own infrastructure for consistent results, eliminating CI runner noise. Also free for open source.

We chose github-action-benchmark + critcmp as a lightweight starting point with no external dependencies. Migration to Bencher or Codspeed can be evaluated as the suite matures and if CI runner variance becomes a problem.

Test plan

AI disclosure

Used Claude Code for benchmark implementation, CI workflow development, and testing.

Add criterion benchmarks for Sprout JoinSplit Groth16 proof verification
in zebra-consensus. Measures single and unbatched verification at batch
sizes 2-64, plus input preparation costs (primary_inputs computation
and item creation). Uses cycled items from mainnet test blocks since
verification cost is constant per proof regardless of content.

Add criterion benchmarks for Orchard Halo2 proof verification in
zebra-consensus. Extracts real Orchard bundles from NU5 mainnet test
blocks and measures single and unbatched verification at batch sizes
2-32. Only exercises verify_single() since Item fields and the batch
trait are private.

Add criterion benchmarks for Sapling shielded data verification in
zebra-consensus. Extracts real Sapling bundles from mainnet test blocks
(28 items) and measures both unbatched (one-item batch per bundle) and
true batch verification at sizes 2-64. Batch verification shows ~5x
speedup at 64 bundles, validating the batching architecture.

Add criterion benchmarks for transaction deserialization and
serialization across all five Zcash transaction versions (V1-V5).
Extracts real transactions from mainnet test blocks at the appropriate
network upgrade heights. V5 deserialization is notably slower than
V1-V4 due to consensus branch ID validation and Orchard field parsing.

Adds a workflow_dispatch workflow that runs the full benchmark suite
using cargo-criterion and stores results via github-action-benchmark
on the gh-pages branch for historical tracking.

Features:
- Selective benchmarks via comma-separated input (or 'all')
- Configurable regression alert threshold
- Step summary table visible in the Actions UI
- Converts cargo-criterion JSON to customSmallerIsBetter format
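
The JSON conversion in the last bullet can be sketched as below. Field names are taken from cargo-criterion's `benchmark-complete` messages and github-action-benchmark's `customSmallerIsBetter` schema; the workflow's actual implementation (e.g. a jq one-liner) may differ.

```rust
/// Convert one cargo-criterion JSON line such as
/// {"reason":"benchmark-complete","id":"groth16/verify",
///  "typical":{"estimate":100.0,"unit":"ns"}, ...}
/// into a customSmallerIsBetter entry:
/// {"name":"groth16/verify","unit":"ns","value":100}
/// String scanning stands in for a real JSON parser in this sketch.
fn to_custom_smaller_is_better(line: &str) -> Option<String> {
    let id = extract_str(line, "\"id\":\"")?;
    let value = extract_num(line, "\"estimate\":")?;
    let unit = extract_str(line, "\"unit\":\"")?;
    Some(format!(
        "{{\"name\":\"{}\",\"unit\":\"{}\",\"value\":{}}}",
        id, unit, value
    ))
}

/// Return the string value that follows `key`, up to the closing quote.
fn extract_str(s: &str, key: &str) -> Option<String> {
    let start = s.find(key)? + key.len();
    let end = s[start..].find('"')? + start;
    Some(s[start..end].to_string())
}

/// Return the number that follows `key`.
fn extract_num(s: &str, key: &str) -> Option<f64> {
    let start = s.find(key)? + key.len();
    let end = s[start..]
        .find(|c: char| c != '.' && c != '-' && !c.is_ascii_digit())
        .map(|i| i + start)
        .unwrap_or(s.len());
    s[start..end].parse().ok()
}
```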
When the "C-benchmark" label is added to a PR, runs all benchmarks on
both the base and PR branches, then posts a critcmp comparison table
as a PR comment. Updates the existing comment on re-runs.

- Pin github-action-benchmark to a commit SHA
- Move GitHub context expressions to env blocks to prevent injection
- Remove unused env.BENCH_COMMAND
- Apply cargo fmt to benchmark files

Remove diagnostic eprintln! calls from benchmark files (flagged by
clippy's print_stderr lint) and remove the resulting unused
total_actions variable in halo2.rs.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@oxarbitrage
Contributor Author

Follow-up: Additional Benchmark Candidates

Analysis of the sync hot path identified several operations that are not yet benchmarked. These are candidates for follow-up PRs, prioritized by expected impact on sync time.

High Priority

| Operation | Call frequency | Notes |
| --- | --- | --- |
| Equihash solution verification | Per-block | Memory-hard PoW check, called for every block. Pure computation, easy to benchmark with test vectors. |
| Transparent script validation | Per-transparent-input | FFI to C++ zcash_script. Variable cost by script type (P2PKH vs P2SH). Requires previous outputs for sighash. |
| UTXO lookups | Per-transparent-input | RocksDB reads to fetch previous outputs. Often the sync bottleneck due to I/O. Harder to benchmark in isolation. |
| Sighash computation | Per-transaction + per-input | BLAKE2b-heavy precomputation and per-input finalization. Pure computation, easy to benchmark. |

Medium Priority

| Operation | Call frequency | Notes |
| --- | --- | --- |
| Note commitment tree updates | Per-block (scales with output count) | Sprout/Sapling/Orchard incremental merkle trees. |
| Block finalization (state writes) | Per-block | Full RocksDB write batch: UTXOs, trees, indexes. Requires a populated database to benchmark realistically. |

Easiest Next Steps

Equihash and sighash are the most straightforward to add — they are pure computation with no database or FFI setup required, and can reuse the existing test vector pattern from the current benchmarks.

Script validation and UTXO lookups are the biggest real-world sync bottlenecks but require more setup (FFI + previous outputs for scripts, populated RocksDB for UTXOs).

@mpguerra requested review from arya2 and gustavovalverde, and removed the review request for gustavovalverde, on April 7, 2026 at 08:23