
bench: add criterion benchmark suite and CI workflow#10444

Open
oxarbitrage wants to merge 8 commits into main from benches

Conversation

@oxarbitrage
Contributor

Motivation

Zebra had minimal benchmarking coverage (only block serialization and RedPallas signatures). This made it difficult to evaluate PRs for performance regressions or improvements. For example, PR #10436 removes groth16 abstractions, but there was no way to verify that it was performance-neutral.

Solution

Adds a comprehensive benchmark suite covering Zebra's most expensive code paths, plus CI automation for tracking and comparison.

New benchmarks

  • Groth16 (zebra-consensus/benches/groth16.rs) — Sprout JoinSplit proof verification, single and unbatched at sizes 2-64, plus input preparation cost
  • Halo2 (zebra-consensus/benches/halo2.rs) — Orchard proof verification, single and unbatched at sizes 2-32
  • Sapling (zebra-consensus/benches/sapling.rs) — both unbatched and true batch verification at sizes 2-64 using sapling_crypto::BatchValidator
  • Transaction (zebra-chain/benches/transaction.rs) — per-version (V1-V5) deserialization and serialization from real mainnet blocks

CI workflow (.github/workflows/benchmarks.yml)

  • workflow_dispatch: runs all benchmarks, stores results on gh-pages via github-action-benchmark, generates a summary table in the Actions UI
  • PR comparison: adding the C-benchmark label to a PR runs benchmarks on both base and PR branches, posts a critcmp comparison table as a PR comment
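
A minimal sketch of the label-gated comparison trigger described above (job names, step names, and baseline names are illustrative assumptions, not the PR's actual workflow):

```yaml
# Illustrative fragment only; the real .github/workflows/benchmarks.yml
# in this PR may differ in structure and step details.
on:
  workflow_dispatch:
  pull_request:
    types: [labeled]

jobs:
  compare:
    if: github.event.label.name == 'C-benchmark'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Bench base branch
        env:
          # Context expression kept in an env block, per the
          # injection-hardening note in this PR.
          BASE_REF: ${{ github.base_ref }}
        run: |
          git checkout "$BASE_REF"
          cargo bench -- --save-baseline base
      - name: Bench PR branch
        env:
          PR_SHA: ${{ github.event.pull_request.head.sha }}
        run: |
          git checkout "$PR_SHA"
          cargo bench -- --save-baseline pr
      - name: Compare baselines
        run: critcmp base pr >> "$GITHUB_STEP_SUMMARY"
```

The real workflow posts the critcmp table as a PR comment rather than (or in addition to) the step summary shown here.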

Benchmark dashboard

Adds a step to book.yml that copies benchmark data from gh-pages into the docs output, making the historical trend chart available at zebra.zfnd.org/dev/bench/.
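
The dashboard wiring might look roughly like this (step name and paths are assumptions, not the PR's exact diff):

```yaml
# Hypothetical book.yml addition; actual paths in the PR may differ.
- name: Copy benchmark data into docs output
  run: |
    git fetch origin gh-pages
    git worktree add /tmp/bench-data origin/gh-pages
    mkdir -p book/dev/bench
    cp -r /tmp/bench-data/dev/bench/. book/dev/bench/
```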

Test data

All benchmarks use real transactions from the hardcoded mainnet blocks in zebra-test. Items are cycled (repeated) to fill larger batch sizes — this is valid because cryptographic verification cost is constant per proof regardless of specific bytes. This limitation is documented in each benchmark file.
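
The cycling approach can be sketched as follows. This is a generic stand-in: the real benchmarks cycle concrete verification items (proofs, bundles) rather than strings.

```rust
/// Repeat a small set of test items until a target batch size is reached.
/// This mirrors the batch-filling described above: verification cost per
/// proof does not depend on which item is repeated, so cycling is valid.
fn fill_batch<T: Clone>(items: &[T], batch_size: usize) -> Vec<T> {
    items.iter().cloned().cycle().take(batch_size).collect()
}
```

For example, 3 extracted items cycled to a batch of 64 yield the first item 22 times and the other two 21 times each.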

Alternatives considered

  • Bencher.dev — purpose-built SaaS for continuous benchmarking with statistical regression detection and confidence intervals. Free for open source. Could replace both github-action-benchmark and critcmp with a more sophisticated solution.
  • Codspeed — runs benchmarks in their own infrastructure for consistent results, eliminating CI runner noise. Also free for open source.

We chose github-action-benchmark + critcmp as a lightweight starting point with no external dependencies. Migration to Bencher or Codspeed can be evaluated as the suite matures and if CI runner variance becomes a problem.

Test plan

AI disclosure

Used Claude Code for benchmark implementation, CI workflow development, and testing.

Add criterion benchmarks for Sprout JoinSplit Groth16 proof verification
in zebra-consensus. Measures single and unbatched verification at batch
sizes 2-64, plus input preparation costs (primary_inputs computation
and item creation). Uses cycled items from mainnet test blocks since
verification cost is constant per proof regardless of content.

Add criterion benchmarks for Orchard Halo2 proof verification in
zebra-consensus. Extracts real Orchard bundles from NU5 mainnet test
blocks and measures single and unbatched verification at batch sizes
2-32. Only exercises verify_single() since Item fields and the batch
trait are private.

Add criterion benchmarks for Sapling shielded data verification in
zebra-consensus. Extracts real Sapling bundles from mainnet test blocks
(28 items) and measures both unbatched (one-item batch per bundle) and
true batch verification at sizes 2-64. Batch verification shows ~5x
speedup at 64 bundles, validating the batching architecture.

Add criterion benchmarks for transaction deserialization and
serialization across all five Zcash transaction versions (V1-V5).
Extracts real transactions from mainnet test blocks at the appropriate
network upgrade heights. V5 deserialization is notably slower than
V1-V4 due to consensus branch ID validation and Orchard field parsing.

Adds a workflow_dispatch workflow that runs the full benchmark suite
using cargo-criterion and stores results via github-action-benchmark
on the gh-pages branch for historical tracking.

Features:
- Selective benchmarks via comma-separated input (or 'all')
- Configurable regression alert threshold
- Step summary table visible in the Actions UI
- Converts cargo-criterion JSON to customSmallerIsBetter format
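
The JSON conversion in the last bullet can be sketched as below. Field names are taken from cargo-criterion's `benchmark-complete` messages and github-action-benchmark's `customSmallerIsBetter` schema; the workflow's actual implementation (e.g. a jq one-liner) may differ.

```rust
/// Convert one cargo-criterion JSON line such as
/// {"reason":"benchmark-complete","id":"groth16/verify",
///  "typical":{"estimate":100.0,"unit":"ns"}, ...}
/// into a customSmallerIsBetter entry:
/// {"name":"groth16/verify","unit":"ns","value":100}
/// String scanning stands in for a real JSON parser in this sketch.
fn to_custom_smaller_is_better(line: &str) -> Option<String> {
    let id = extract_str(line, "\"id\":\"")?;
    let value = extract_num(line, "\"estimate\":")?;
    let unit = extract_str(line, "\"unit\":\"")?;
    Some(format!(
        "{{\"name\":\"{}\",\"unit\":\"{}\",\"value\":{}}}",
        id, unit, value
    ))
}

/// Return the string value that follows `key`, up to the closing quote.
fn extract_str(s: &str, key: &str) -> Option<String> {
    let start = s.find(key)? + key.len();
    let end = s[start..].find('"')? + start;
    Some(s[start..end].to_string())
}

/// Return the number that follows `key`.
fn extract_num(s: &str, key: &str) -> Option<f64> {
    let start = s.find(key)? + key.len();
    let end = s[start..]
        .find(|c: char| c != '.' && c != '-' && !c.is_ascii_digit())
        .map(|i| i + start)
        .unwrap_or(s.len());
    s[start..end].parse().ok()
}
```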
When the "C-benchmark" label is added to a PR, runs all benchmarks on
both the base and PR branches, then posts a critcmp comparison table
as a PR comment. Updates the existing comment on re-runs.

- Pin github-action-benchmark to a commit SHA
- Move GitHub context expressions to env blocks to prevent injection
- Remove unused env.BENCH_COMMAND
- Apply cargo fmt to benchmark files

Remove diagnostic eprintln! calls from benchmark files (flagged by
clippy's print_stderr lint) and remove the resulting unused
total_actions variable in halo2.rs.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@oxarbitrage
Contributor Author

Follow-up: Additional Benchmark Candidates

Analysis of the sync hot path identified several operations that are not yet benchmarked. These are candidates for follow-up PRs, prioritized by expected impact on sync time.

High Priority

| Operation | Call frequency | Notes |
| --- | --- | --- |
| Equihash solution verification | Per-block | Memory-hard PoW check, called for every block. Pure computation, easy to benchmark with test vectors. |
| Transparent script validation | Per-transparent-input | FFI to C++ zcash_script. Variable cost by script type (P2PKH vs P2SH). Requires previous outputs for sighash. |
| UTXO lookups | Per-transparent-input | RocksDB reads to fetch previous outputs. Often the sync bottleneck due to I/O. Harder to benchmark in isolation. |
| Sighash computation | Per-transaction + per-input | BLAKE2b-heavy precomputation and per-input finalization. Pure computation, easy to benchmark. |

Medium Priority

| Operation | Call frequency | Notes |
| --- | --- | --- |
| Note commitment tree updates | Per-block (scales with output count) | Sprout/Sapling/Orchard incremental merkle trees. |
| Block finalization (state writes) | Per-block | Full RocksDB write batch: UTXOs, trees, indexes. Requires a populated database to benchmark realistically. |

Easiest Next Steps

Equihash and sighash are the most straightforward to add — they are pure computation with no database or FFI setup required, and can reuse the existing test vector pattern from the current benchmarks.

Script validation and UTXO lookups are the biggest real-world sync bottlenecks but require more setup (FFI + previous outputs for scripts, populated RocksDB for UTXOs).

@mpguerra requested review from arya2 and gustavovalverde, and removed the review request for gustavovalverde, on April 7, 2026 at 08:23