perf: optimize emulated multi-miller loops via sparse×sparse line multiplications for 0-bits by yelhousni · Pull Request #1701 · Consensys/gnark

yelhousni · 2026-02-06T00:08:14Z

Description

This PR optimizes the Miller loop in emulated pairing circuits by batching sparse line multiplications across pairs. When processing single lines per pair (0-bit iterations), instead of multiplying each line individually with the accumulator, we batch lines 2-by-2 using sparse×sparse multiplication, then multiply the semi-sparse result with the accumulator.
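The batching pattern described above can be sketched with a toy model. In this sketch, integers modulo a small prime stand in for the field elements, and the function names (`accumulateOneByOne`, `accumulateTwoByTwo`) and the handling of an odd leftover line are illustrative assumptions, not the gnark implementation. It only shows that folding lines in pairwise gives the same accumulator as folding them in one at a time.

```go
package main

import "fmt"

// p is a small prime. Integers mod p stand in for extension-field elements
// (a toy assumption; the real circuit works over emulated Fp12 / Fp6).
const p uint64 = 1_000_003

func mul(a, b uint64) uint64 { return a * b % p }

// accumulateOneByOne multiplies every line into the accumulator individually.
func accumulateOneByOne(acc uint64, lines []uint64) uint64 {
	for _, l := range lines {
		acc = mul(acc, l)
	}
	return acc
}

// accumulateTwoByTwo first multiplies lines pairwise, then folds each
// pair product into the accumulator.
func accumulateTwoByTwo(acc uint64, lines []uint64) uint64 {
	i := 0
	for ; i+1 < len(lines); i += 2 {
		prod := mul(lines[i], lines[i+1]) // stands in for sparse×sparse
		acc = mul(acc, prod)              // stands in for dense×semi-sparse
	}
	if i < len(lines) { // odd count: one line is left over (hypothetical handling)
		acc = mul(acc, lines[i])
	}
	return acc
}

func main() {
	lines := []uint64{11, 22, 33, 44, 55}
	// Both orders give the same product because multiplication is associative.
	fmt.Println(accumulateOneByOne(7, lines) == accumulateTwoByTwo(7, lines)) // prints true
}
```

With n lines, the two-by-two version performs roughly n/2 multiplications involving the accumulator instead of n.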

Optimization Pattern

For BLS12-381 and BN254, the optimization applies to:

Main loop case 0: When the loop counter bit is 0, there's only one line per pair
First iteration (k ≥ 2): Initial accumulation of lines beyond the first two pairs

The key insight is that multiplying two sparse lines together produces a semi-sparse result (with fewer non-zero coefficients than a dense element), which can then be multiplied more efficiently with the dense accumulator.
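To see why the product is only semi-sparse, one can track which coefficient positions can be non-zero. The sketch below assumes a degree-12 element is a polynomial in w with coefficients at positions 0..11, and that w^12 reduces to a combination of w^6 and a constant; that reduction rule is an assumption about the tower, since the PR text does not spell it out. The helper names `mark` and `zeroPositions` are hypothetical.

```go
package main

import "fmt"

const degree = 12

// mark records position k in the support, folding powers >= 12 back down
// under the ASSUMED relation w^12 = a*w^6 + b with non-zero constants a, b.
func mark(k int, support map[int]bool) {
	if k < degree {
		support[k] = true
		return
	}
	mark(k-6, support)  // contribution of the w^6 term
	mark(k-12, support) // contribution of the constant term
}

// zeroPositions returns the coefficient positions that are structurally zero
// in the product of two sparse elements with the given non-zero positions.
// Accidental cancellations are ignored.
func zeroPositions(a, b []int) []int {
	support := map[int]bool{}
	for _, i := range a {
		for _, j := range b {
			mark(i+j, support)
		}
	}
	var zeros []int
	for k := 0; k < degree; k++ {
		if !support[k] {
			zeros = append(zeros, k)
		}
	}
	return zeros
}

func main() {
	line := []int{0, 2, 3, 6, 8} // the "02368" sparsity pattern
	fmt.Println(zeroPositions(line, line)) // prints [1 7]
}
```

Under these assumptions the product of two 02368 lines has zeros exactly at positions 1 and 7, which is consistent with the name of the `MulBySemiSparse1_7` helper.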

Changes by Curve

BLS12-381 (sw_bls12381/pairing.go, fields_bls12381/e12_pairing.go):

Added Mul02368By02368ThenMul: combines sparse×sparse product with dense multiplication
Added MulBySemiSparse1_7: specialized multiplication where positions 1 and 7 are zero
Updated millerLoopLines to batch lines 2-by-2 across pairs

BN254 (sw_bn254/pairing.go):

Updated case 0 in main loop to use Mul01379By01379 + MulBy012346789 for 2-by-2 batching
Updated first iteration (k ≥ 2) with same optimization
Note: BN254 already had the sparse×sparse batching within pairs for non-zero bits

BW6-761 (sw_bw6761/pairing.go):

Updated first iteration (k ≥ 2) to use Mul023By023 + MulBy02345
Note: Main loop case 0 already had this optimization

Type of change

Performance improvement (non-breaking change that improves efficiency)

Benchmarks

PairingCheck SCS Constraint Counts in a BN254 circuit

Curve	n	Before	After	Δ	Improvement
BLS12-381	2	2,063,666	1,915,970	-147,696	7.2%
	4	3,507,950	3,212,558	-295,392	8.4%
	10	7,840,802	7,102,322	-738,480	9.4%
BN254	2	1,780,197	1,711,117	-69,080	3.9%
	4	2,963,267	2,825,107	-138,160	4.7%
	10	6,512,477	6,167,077	-345,400	5.3%

BW6-761 already had the main optimization; only the first iteration was updated (minimal impact).
BN254 already had the sparse×sparse batching within pairs for non-zero bits.
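The Δ and Improvement columns can be recomputed from the Before and After counts. The short sketch below does that and also divides Δ by n; the `row` type and helper names are illustrative, and the numbers are copied from the table above. In this data the saving per pair comes out constant for each curve (73,848 constraints for BLS12-381 and 34,540 for BN254), so the absolute gain grows linearly with the number of pairs.

```go
package main

import "fmt"

// row holds one line of the PairingCheck benchmark table.
type row struct {
	curve         string
	n             int
	before, after int
}

// delta is the number of constraints saved.
func delta(r row) int { return r.before - r.after }

// improvementPct is the saving relative to the "before" count, in percent.
func improvementPct(r row) float64 {
	return 100 * float64(delta(r)) / float64(r.before)
}

// perPair is the saving divided by the number of pairs.
func perPair(r row) int { return delta(r) / r.n }

var rows = []row{
	{"BLS12-381", 2, 2063666, 1915970},
	{"BLS12-381", 4, 3507950, 3212558},
	{"BLS12-381", 10, 7840802, 7102322},
	{"BN254", 2, 1780197, 1711117},
	{"BN254", 4, 2963267, 2825107},
	{"BN254", 10, 6512477, 6167077},
}

func main() {
	for _, r := range rows {
		fmt.Printf("%-9s n=%-2d Δ=%d (%.1f%%), %d per pair\n",
			r.curve, r.n, delta(r), improvementPct(r), perPair(r))
	}
}
```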

Applications

ECPairBLS precompile:

n	Before	After	Δ	%
2	2,785,603	2,763,742	-21,861	0.8%
3	4,023,178	3,990,316	-32,862	0.8%
4	5,260,753	5,216,890	-43,863	0.8%
10	12,686,203	12,576,334	-109,869	0.9%
20	25,061,953	24,842,074	-219,879	0.9%
50	62,189,203	61,639,294	-549,909	0.9%

BLS12-381 gained ~0.9% because the non-zero bit handling changed from 2× MulBy02368 to 1× Mul02368By02368ThenMul
BN254 shows no change because it already had Mul01379By01379 + MulBy012346789 for non-zero bits

How has this been tested?

All existing pairing tests pass for BLS12-381, BN254, and BW6-761
TestPairTestSolve, TestPairFixedTestSolve, TestPairingCheckTestSolve pass
TestPairingMuxes passes with varying pair counts (0-5)

Checklist:

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
I did not modify files generated from templates
golangci-lint does not output errors locally
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules

Note

Medium Risk
Touches core pairing arithmetic (Miller loop multiplication paths) across multiple curves, so any formula/indexing mistake could silently break correctness despite being a performance-focused change.

Overview
Performance optimization for emulated pairing circuits by batching sparse line evaluations during multi-Miller loops, replacing repeated accumulator×line multiplications with sparse×sparse line products followed by a cheaper semi-sparse multiply.

For BLS12-381, adds new Ext12 helpers (Mul02368By02368ThenMul and MulBySemiSparse1_7) and updates millerLoopLines to batch 0-bit iterations across pairs and to combine the two within-pair line multiplications into a single fused operation; it also factors final exponentiation’s hard part into finalExpHardPart (logic preserved).

For BN254 and BW6-761, updates the first-iteration accumulation and 0-bit handling to batch independent lines 2-by-2 using existing sparse×sparse helpers (Mul01379By01379/MulBy012346789, Mul023By023/MulBy02345). Benchmark stats in internal/stats/latest_stats.csv are updated accordingly.

Written by Cursor Bugbot for commit 4178589. This will update automatically on new commits.

Copilot

Pull request overview

This PR optimizes the Miller loop in emulated pairing circuits by batching sparse line evaluations 2-by-2 across pairs when processing single lines per pair (0-bit iterations). Instead of multiplying each sparse line individually with the dense accumulator, pairs of sparse lines are first multiplied together using sparse×sparse multiplication to produce a semi-sparse result, which is then more efficiently multiplied with the accumulator. This optimization applies to BLS12-381, BN254, and BW6-761 curves.

Changes:

Added Mul02368By02368ThenMul and MulBySemiSparse1_7 methods for BLS12-381 to support batched sparse×sparse line multiplication
Updated Miller loop implementations in BLS12-381, BN254, and BW6-761 to batch lines 2-by-2 in 0-bit cases and initial iterations
Refactored BLS12-381 FinalExponentiation to extract hard part into separate finalExpHardPart method

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
`std/algebra/emulated/fields_bls12381/e12_pairing.go`	Adds `Mul02368By02368ThenMul` for sparse×sparse line multiplication and `MulBySemiSparse1_7` for multiplying by semi-sparse elements with zeros at positions 1 and 7
`std/algebra/emulated/sw_bls12381/pairing.go`	Applies 2-by-2 batching optimization to first iteration and 0-bit cases in Miller loop; refactors final exponentiation hard part into separate method
`std/algebra/emulated/sw_bn254/pairing.go`	Applies 2-by-2 batching to case 0 in main loop and k≥2 in first iteration
`std/algebra/emulated/sw_bw6761/pairing.go`	Applies 2-by-2 batching to k≥2 in first iteration (main loop already had optimization)


perf: multi-miller loops

e29007a

yelhousni added this to the v0.14.N milestone Feb 6, 2026

yelhousni requested review from Copilot and ivokub February 6, 2026 00:08

yelhousni self-assigned this Feb 6, 2026

yelhousni added the type: perf and dep: linea labels Feb 6, 2026

Copilot started reviewing on behalf of yelhousni February 6, 2026 00:08

Copilot AI reviewed Feb 6, 2026

yelhousni added 3 commits February 6, 2026 07:40

test: update stats

8639f89

perf: native bls12-377 to use 2 densexsparse lines

16df6b6

Revert "perf: native bls12-377 to use 2 densexsparse lines"

aed7026

This reverts commit 16df6b6.

yelhousni requested review from ThomasPiellard and gbotrel February 6, 2026 13:37

yelhousni assigned Tabaie and unassigned Tabaie Feb 6, 2026

yelhousni requested review from Tabaie and YaoJGalteland February 6, 2026 13:37

yelhousni added 2 commits February 10, 2026 11:10

Merge branch 'master' into perf/pairing

d1df794

Merge branch 'master' into perf/pairing

4178589

ThomasPiellard reviewed Feb 13, 2026

Comment thread std/algebra/emulated/sw_bls12381/pairing.go

ThomasPiellard approved these changes Feb 14, 2026

yelhousni merged commit df2294c into master Feb 17, 2026
13 checks passed

yelhousni deleted the perf/pairing branch February 17, 2026 15:24

yelhousni merged 6 commits into master from perf/pairing

4 participants
