perf: solver optimizations by ivokub · Pull Request #1728 · Consensys/gnark

ivokub · 2026-03-06T12:59:11Z

Description

Several long-standing ideas to speed up the solver:

Implement a batch-inverse blueprint. We encounter a lot of inverses in logderivarg, which we use for range checks and lookups.
Flatten the solver levels. In logderivarg, to compute the sum of the argument we use the acc = api.Add(acc, res) pattern, which adds a new solver level every time we add a value. Concretely, for range checking this could mean millions of very small solver levels, which in turn means we cannot utilize parallelization within a level.
In non-native arithmetic, use a big.Int pool in hints.
In non-native limb decomposition, use And with the (1 << N)-1 mask instead of Mod.

Type of change

New feature (non-breaking change which adds functionality)

How has this been tested?

Test A
Test B

How has this been benchmarked?

Solver benchmark:

func BenchmarkBW6InBN254Commit(b *testing.B) {

	assert := test.NewAssert(b)
	innerCcs, innerVK, innerWitness, innerProof := getInnerCommit(assert, ecc.BW6_761.ScalarField(), ecc.BN254.ScalarField())

	// outer proof
	circuitVk, err := ValueOfVerifyingKey[sw_bw6761.ScalarField, sw_bw6761.G1Affine, sw_bw6761.G2Affine](innerVK)
	assert.NoError(err)
	circuitWitness, err := ValueOfWitness[sw_bw6761.ScalarField](innerWitness)
	assert.NoError(err)
	circuitProof, err := ValueOfProof[sw_bw6761.ScalarField, sw_bw6761.G1Affine, sw_bw6761.G2Affine](innerProof)
	assert.NoError(err)

	outerCircuit := &OuterCircuit[sw_bw6761.ScalarField, sw_bw6761.G1Affine, sw_bw6761.G2Affine, sw_bw6761.GTEl]{
		InnerWitness: PlaceholderWitness[sw_bw6761.ScalarField](innerCcs),
		Proof:        PlaceholderProof[sw_bw6761.ScalarField, sw_bw6761.G1Affine, sw_bw6761.G2Affine](innerCcs),
		VerifyingKey: circuitVk,
	}
	outerAssignment := &OuterCircuit[sw_bw6761.ScalarField, sw_bw6761.G1Affine, sw_bw6761.G2Affine, sw_bw6761.GTEl]{
		InnerWitness: circuitWitness,
		Proof:        circuitProof,
	}
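	// Compile the outer (verifier) circuit over BN254 with the SCS (PLONK) builder.
	ccs, err := frontend.Compile(ecc.BN254.ScalarField(), scs.NewBuilder, outerCircuit)
	assert.NoError(err)
	// Serialize the constraint system only to report its size in bytes.
	var buf bytes.Buffer
	_, err = ccs.WriteTo(&buf)
	assert.NoError(err)
	b.Log("circuit size (bytes):", buf.Len())
	w, err := frontend.NewWitness(outerAssignment, ecc.BN254.ScalarField())
	assert.NoError(err)
	// Each iteration solves the same witness; the setup above is not timed.
	for b.Loop() {
		_, err = ccs.Solve(w)
		assert.NoError(err)
	}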
}

Difference (PLONK):

goos: linux
goarch: amd64
pkg: github.com/consensys/gnark/std/recursion/plonk
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics     
                    │  old.txt   │             new3.txt              │
                    │   sec/op   │   sec/op    vs base               │
BW6InBN254Commit-16   2.863 ± 5%   2.513 ± 7%  -12.23% (p=0.002 n=6)

                    │   old.txt    │              new3.txt              │
                    │     B/op     │     B/op      vs base              │
BW6InBN254Commit-16   2.975Gi ± 1%   2.911Gi ± 1%  -2.14% (p=0.002 n=6)

                    │   old.txt   │              new3.txt              │
                    │  allocs/op  │  allocs/op   vs base               │
BW6InBN254Commit-16   24.86M ± 0%   22.25M ± 0%  -10.48% (p=0.002 n=6)

And for R1CS:

goos: linux
goarch: amd64
pkg: github.com/consensys/gnark/std/recursion/plonk
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics     
                    │ old_r1cs.txt │           new_r1cs.txt            │
                    │    sec/op    │   sec/op    vs base               │
BW6InBN254Commit-16    2.234 ± 13%   1.994 ± 5%  -10.72% (p=0.002 n=6)

                    │ old_r1cs.txt │            new_r1cs.txt            │
                    │     B/op     │     B/op      vs base              │
BW6InBN254Commit-16   1.965Gi ± 0%   1.901Gi ± 0%  -3.23% (p=0.002 n=6)

                    │ old_r1cs.txt │            new_r1cs.txt            │
                    │  allocs/op   │  allocs/op   vs base               │
BW6InBN254Commit-16    23.91M ± 0%   21.31M ± 0%  -10.86% (p=0.002 n=6)

The serialized circuit size also decreased from 541526980 bytes to 324975025 bytes. For R1CS there is an increase from 226962625 bytes to 247155166 bytes (perhaps because for R1CS we previously used the generic blueprint?).
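
To put the byte counts above in relative terms, the change can be computed as (after - before) / before. The snippet below is a small sketch using a hypothetical helper relChange; labelling the first pair as the PLONK (SCS) circuit is an assumption drawn from the contrast with R1CS. It gives a decrease of roughly 40% for the first pair and an increase of roughly 9% for R1CS.

```go
package main

import "fmt"

// relChange returns the relative change from before to after, in percent.
// Hypothetical helper for illustration only.
func relChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Sizes in bytes, as quoted in the PR description.
	fmt.Printf("PLONK (assumed): %+.2f%%\n", relChange(541526980, 324975025))
	fmt.Printf("R1CS:            %+.2f%%\n", relChange(226962625, 247155166))
}
```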

Checklist:

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
I did not modify files generated from templates
golangci-lint does not output errors locally
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules

Note

Medium Risk
Touches core constraint solving/compilation paths (new blueprint instruction, new API surface, and blueprint serialization tags), so regressions could affect circuit correctness or compatibility even though changes are performance-oriented.

Overview
Adds a new BlueprintBatchInverse instruction that computes many modular inverses with a single inversion via a prefix-product pass, and registers it for CBOR serialization.

R1CS and SCS builders now implement frontend.BatchInverter by emitting one batch-inversion instruction for variable inputs, computing constant inverses inline, and adding per-output verification constraints.

Optimizes solver-level structure in std/internal/logderivarg by summing quotients/inverses via an addition tree and using batch inversion when available.

Reduces allocation pressure in non-native arithmetic by pooling big.Int in emulated-field hint code and in limb decomposition, and replaces limb Mod by a bitmask And where applicable.

Written by Cursor Bugbot for commit cf241dc. This will update automatically on new commits.

gbotrel · 2026-03-10T19:53:43Z

First pass looks good. I tried to make it more invisible to the circuit developer by having the compiler detect inverses per level and batch them, but my first (Claude) attempt was convoluted and messy. Will try again soon.

ivokub added 16 commits March 6, 2026 12:34

perf: use non-accumulating sum in logderiv

0675a27

perf: use AND for splitting and sync.Pool

d2ce044

perf: use big int pool in multiplication hints

1b33350

chore: refactor

aec5aa2

chore: rename var

1b531a9

feat: implement BatchInvert method in Field

2d4aefa

fix: check inverse input zero before call to modinverse

bd16976

feat: implement BatchInverse in test engine blueprint solver

4eaa8f5

feat: add batchinvert blueprint

24f350d

feat: register batchinverse blueprint serialization

4ac5cb3

refactor: allow also LEs in batchinverse blueprint

ef82971

feat: use batchinverse blueprint

c6c7361

chore: new stats (better LE splitting)

0f805a1

chore: remove batchinverse

9572754

feat: do batchinverse in blueprint directly

8d4bd4f

chore: regenerate gkr circuit

78e7545

ivokub marked this pull request as ready for review March 9, 2026 09:45

ivokub requested a review from gbotrel March 9, 2026 09:45

gbotrel previously approved these changes Mar 11, 2026

Merge branch 'master' into perf/logderiv-flat-level

996bd3c

ivokub dismissed gbotrel’s stale review via 996bd3c March 11, 2026 15:26

ivokub added 2 commits March 11, 2026 16:29

fix: merge conflict resolution

c4fb0ed

Merge branch 'master' into perf/logderiv-flat-level

cf241dc

ivokub merged commit 1c6aa6e into master Mar 12, 2026
15 of 16 checks passed

ivokub deleted the perf/logderiv-flat-level branch March 12, 2026 10:16

ivokub merged 19 commits into master from perf/logderiv-flat-level
