
Ci benchmarks #2019

Merged
ArthurZucker merged 39 commits into main from ci-benchmarks on Apr 10, 2026

Conversation

@ArthurZucker
Collaborator

No description provided.

Add a consolidated benchmark suite (`ci_benchmark`) and a GitHub Actions
workflow that automatically compares performance against the main branch
baseline on every PR.

Benchmark coverage (13 measurements, ~4 min on CI):
  - BPE GPT-2: encode, batch, no-cache, batch-no-cache
  - Llama-3: encode, batch, encode-fast, char-offsets, concurrent-4t
  - Serialization: GPT-2 load, Llama-3 load, Llama-3 save
  - Training: BPE small corpus

CI workflow:
  - On push to main: run benchmarks, store baseline in gh-pages branch
  - On PR: run benchmarks, compare vs baseline, post/update a single
    PR comment with the delta table
  - Alert threshold: 15% regression (warn, don't fail)

Uses benchmark-action/github-action-benchmark with criterion's bencher
output format for machine-readable results.
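
The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the code the workflow runs: `parse_bencher` and `find_regressions` are hypothetical helper names, and the line shape assumed is the usual bencher style (`test <name> ... bench: <n> ns/iter (+/- <m>)`). The 15% default mirrors the alert threshold in the commit message.

```python
import re

# Matches bencher-style lines such as:
#   test bpe_gpt2_encode ... bench:   1,234,567 ns/iter (+/- 8,901)
# Thousands separators are optional.
BENCH_LINE = re.compile(
    r"^test\s+(?P<name>\S+)\s+\.\.\.\s+bench:\s+(?P<ns>[\d,]+)\s+ns/iter"
)


def parse_bencher(text):
    """Return {benchmark name: ns per iteration} from bencher-format output."""
    results = {}
    for line in text.splitlines():
        match = BENCH_LINE.match(line.strip())
        if match:
            results[match["name"]] = int(match["ns"].replace(",", ""))
    return results


def find_regressions(baseline, current, threshold=0.15):
    """Return {name: relative slowdown} for benchmarks slower than threshold.

    Hypothetical helper: the real action decides how to alert; this only
    shows the arithmetic behind a "15% regression" rule.
    """
    slow = {}
    for name, base_ns in baseline.items():
        if name in current and base_ns > 0:
            delta = (current[name] - base_ns) / base_ns
            if delta > threshold:
                slow[name] = delta
    return slow
```

Because the threshold only warns, a caller would typically report the returned names in the PR comment rather than exit non-zero.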

- pytest-benchmark-based test suite covering:
  - BPE GPT-2: encode, encode_batch, multithreaded (4 workers)
  - Llama-3: encode, encode_batch, encode_fast, multithreaded, decode_batch
  - Async: async_encode_batch, async_encode_batch_fast
  - Serialization: from_file, to_str, from_str (roberta, llama3, albert)
  - Training: BPE small corpus
- CI workflow: separate benchmark-python job with sccache + maturin
- Re-enabled ci-benchmarks branch trigger for testing
- New benchmark-trigger.yml: maintainer comments '/benchmark' on a PR
  to dispatch the benchmark workflow on the PR's ref
- Upload steps gated on github.event_name == 'push' so workflow_dispatch
  (PR runs) never overwrite the baseline
- Trigger requires MEMBER/OWNER/COLLABORATOR association
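
The gating rules above can be expressed as two small predicates. This is a sketch under assumptions: the real logic lives in workflow YAML, the function names are hypothetical, and requiring the comment to be exactly `/benchmark` (rather than merely starting with it) is a guess. The payload fields used (`comment.body`, `comment.author_association`, `issue.pull_request`) are those of GitHub's `issue_comment` webhook event.

```python
# Associations allowed to trigger a run, as listed in the commit message.
ALLOWED_ASSOCIATIONS = {"MEMBER", "OWNER", "COLLABORATOR"}


def should_dispatch(event):
    """Decide whether an issue_comment payload should start a benchmark run."""
    comment = event.get("comment", {})
    issue = event.get("issue", {})
    # issue_comment also fires on plain issues; PRs carry a "pull_request" key.
    is_pr = "pull_request" in issue
    # Assumption: the whole comment must be the command.
    is_command = comment.get("body", "").strip() == "/benchmark"
    is_trusted = comment.get("author_association") in ALLOWED_ASSOCIATIONS
    return is_pr and is_command and is_trusted


def should_upload_baseline(event_name):
    """Only pushes may overwrite the stored baseline; dispatched PR runs may not."""
    return event_name == "push"
```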
…en done

- benchmark-trigger.yml creates a 'Benchmark Results' check on the PR head SHA
- benchmarks.yml marks it in_progress at start, completed (success/failure) at end
- The check body contains the comparison markdown table
- Can be made a required check in branch protection rules
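
For reference, the check-run lifecycle described above maps onto request bodies along these lines. This is only a sketch: the helper names are hypothetical, the initial `queued` status is an assumption, and the HTTP calls themselves are left out. The field names (`name`, `head_sha`, `status`, `conclusion`, `output.title`, `output.summary`) follow GitHub's REST Checks API.

```python
def check_run_create_payload(head_sha, name="Benchmark Results"):
    """Body for creating the check on the PR head SHA (trigger workflow)."""
    # Assumption: the check starts as "queued" until the benchmark job picks it up.
    return {"name": name, "head_sha": head_sha, "status": "queued"}


def check_run_update_payload(status, passed=None, summary_markdown=""):
    """Body for updating the check from the benchmark workflow.

    status is "in_progress" at the start and "completed" at the end;
    a conclusion is only sent together with "completed".
    """
    payload = {"status": status}
    if status == "completed":
        payload["conclusion"] = "success" if passed else "failure"
        payload["output"] = {
            "title": "Benchmark Results",
            "summary": summary_markdown,  # the comparison table goes here
        }
    return payload
```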
Rust:
- Push to main: runs with --save-baseline main, uploads criterion
  data (tar.gz) + bencher output to HF Hub
- workflow_dispatch: downloads criterion baseline, runs with --baseline main
  for automatic criterion comparison
- criterion HTML report uploaded as GitHub Actions artifact (30 day retention)

Python:
- bench_output.json uploaded as GitHub Actions artifact
- Baseline stored/compared via HF Hub as before

Both:
- Artifacts downloadable from the workflow run page for manual inspection
- Comparison tables posted to PR comments
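
On the Python side, a comparison table can be derived from two pytest-benchmark JSON files roughly as below. This is an illustrative sketch, not the script used in the workflow: `load_means` and `delta_table` are hypothetical names, and the table layout is an assumption. It relies only on the `benchmarks[].name` and `benchmarks[].stats.mean` fields that pytest-benchmark writes (times in seconds).

```python
import json


def load_means(json_text):
    """Return {benchmark name: mean seconds} from pytest-benchmark JSON text."""
    data = json.loads(json_text)
    return {b["name"]: b["stats"]["mean"] for b in data["benchmarks"]}


def delta_table(baseline, current):
    """Build a markdown table for benchmarks present in both runs."""
    rows = [
        "| Benchmark | Baseline (s) | Current (s) | Delta |",
        "|---|---|---|---|",
    ]
    for name in sorted(baseline.keys() & current.keys()):
        base, cur = baseline[name], current[name]
        delta = (cur - base) / base * 100  # positive means slower
        rows.append(f"| {name} | {base:.6f} | {cur:.6f} | {delta:+.1f}% |")
    return "\n".join(rows)
```

Benchmarks that exist in only one of the two runs are skipped here; a real report might list them separately.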
ArthurZucker and others added 4 commits April 10, 2026 12:00
… coding

- Switch both jobs to ubuntu-latest-4-cores for more consistent results
- Replace plaintext markdown tables with SVG charts generated by
  .github/scripts/render_bench_svg.py
- Dark theme, monospace font, red/green bars + delta percentages
- SVGs uploaded as artifacts and embedded in PR comments
- Supports both Rust (bencher format) and Python (pytest-benchmark JSON)
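
To give an idea of what such a chart renderer involves, here is a minimal stand-alone sketch. It is not the contents of `.github/scripts/render_bench_svg.py`: the function name, colors, and layout constants are all assumptions chosen to match the description (dark background, monospace font, red for slower, green for faster).

```python
from xml.sax.saxutils import escape


def render_delta_svg(deltas, width=640, row_height=24):
    """Render {benchmark name: percent change} as an SVG bar chart string."""
    height = row_height * len(deltas) + 16
    mid = width * 0.6   # x position of the 0% axis
    scale = 2.0         # pixels per percentage point (arbitrary choice)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" font-family="monospace" font-size="12">',
        f'<rect width="{width}" height="{height}" fill="#0d1117"/>',
    ]
    for i, (name, pct) in enumerate(deltas.items()):
        y = 8 + i * row_height
        # Positive delta means slower than baseline, so it is drawn in red.
        color = "#f85149" if pct > 0 else "#3fb950"
        bar = min(abs(pct) * scale, width * 0.3)  # clamp very large deltas
        x = mid if pct > 0 else mid - bar
        parts.append(
            f'<text x="8" y="{y + 14}" fill="#c9d1d9">{escape(name)}</text>'
        )
        parts.append(
            f'<rect x="{x:.1f}" y="{y + 3}" width="{bar:.1f}" height="14" '
            f'fill="{color}"/>'
        )
        parts.append(
            f'<text x="{width - 8}" y="{y + 14}" fill="{color}" '
            f'text-anchor="end">{pct:+.1f}%</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

The returned string can be written to a `.svg` file as-is; the input mapping could come from either the Rust or the Python results once they are reduced to percent deltas.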
@ArthurZucker
Collaborator Author

/benchmark

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread .github/workflows/benchmarks.yml Outdated
Comment thread .github/workflows/benchmarks.yml
@ArthurZucker
Collaborator Author

/benchmark

Comment thread .github/workflows/benchmarks.yml Outdated
ArthurZucker and others added 2 commits April 10, 2026 16:11
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
SVG doesn't render in GitHub PR comments. Now:
- render_bench_svg.py supports --output .png via cairosvg
- PNGs uploaded to hf-internal-testing/tokenizers-bench/charts/
- PR comments embed as ![img](hf_url) which GitHub renders correctly
@ArthurZucker
Collaborator Author

/benchmark

@huggingface huggingface deleted a comment from github-actions Bot Apr 10, 2026
@github-actions

Rust Benchmark Results

Commit: dcd29c0420df77c06adf6df3b017b02eec5513b3

Rust Benchmarks

@ArthurZucker ArthurZucker merged commit efbcc68 into main Apr 10, 2026
35 checks passed
@ArthurZucker ArthurZucker deleted the ci-benchmarks branch April 10, 2026 14:53

2 participants