perf: solver optimizations #1728

Merged
ivokub merged 19 commits into master from perf/logderiv-flat-level
Mar 12, 2026

ivokub (Collaborator) commented Mar 6, 2026

Description

Several long-standing ideas to speed up the solver:

  • Implement a batch-inverse blueprint. We encounter many inverses in logderivarg, which we use for range checks and lookups.
  • Flatten the solver levels. To sum the terms of the argument in logderivarg we use the acc = api.Add(acc, res) pattern, which adds a new solver level for every value added. For range checking this can mean millions of very small solver levels, which in turn means we cannot parallelize within a level.
  • In non-native arithmetic, use a big.Int pool in hints.
  • In non-native limb decomposition, use And with the (1 << N)-1 mask instead of Mod.

Type of change

  • New feature (non-breaking change which adds functionality)

How has this been tested?

  • Test A
  • Test B

How has this been benchmarked?

Solver benchmark:

func BenchmarkBW6InBN254Commit(b *testing.B) {

	assert := test.NewAssert(b)
	innerCcs, innerVK, innerWitness, innerProof := getInnerCommit(assert, ecc.BW6_761.ScalarField(), ecc.BN254.ScalarField())

	// outer proof
	circuitVk, err := ValueOfVerifyingKey[sw_bw6761.ScalarField, sw_bw6761.G1Affine, sw_bw6761.G2Affine](innerVK)
	assert.NoError(err)
	circuitWitness, err := ValueOfWitness[sw_bw6761.ScalarField](innerWitness)
	assert.NoError(err)
	circuitProof, err := ValueOfProof[sw_bw6761.ScalarField, sw_bw6761.G1Affine, sw_bw6761.G2Affine](innerProof)
	assert.NoError(err)

	outerCircuit := &OuterCircuit[sw_bw6761.ScalarField, sw_bw6761.G1Affine, sw_bw6761.G2Affine, sw_bw6761.GTEl]{
		InnerWitness: PlaceholderWitness[sw_bw6761.ScalarField](innerCcs),
		Proof:        PlaceholderProof[sw_bw6761.ScalarField, sw_bw6761.G1Affine, sw_bw6761.G2Affine](innerCcs),
		VerifyingKey: circuitVk,
	}
	outerAssignment := &OuterCircuit[sw_bw6761.ScalarField, sw_bw6761.G1Affine, sw_bw6761.G2Affine, sw_bw6761.GTEl]{
		InnerWitness: circuitWitness,
		Proof:        circuitProof,
	}
	ccs, err := frontend.Compile(ecc.BN254.ScalarField(), scs.NewBuilder, outerCircuit)
	assert.NoError(err)
	var buf bytes.Buffer
	_, err = ccs.WriteTo(&buf)
	assert.NoError(err)
	b.Log("circuit size (bytes):", buf.Len())
	w, err := frontend.NewWitness(outerAssignment, ecc.BN254.ScalarField())
	assert.NoError(err)
	for b.Loop() {
		_, err = ccs.Solve(w)
		assert.NoError(err)
	}
}

Difference (PLONK):

goos: linux
goarch: amd64
pkg: github.com/consensys/gnark/std/recursion/plonk
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics     
                    │  old.txt   │             new3.txt              │
                    │   sec/op   │   sec/op    vs base               │
BW6InBN254Commit-16   2.863 ± 5%   2.513 ± 7%  -12.23% (p=0.002 n=6)

                    │   old.txt    │              new3.txt              │
                    │     B/op     │     B/op      vs base              │
BW6InBN254Commit-16   2.975Gi ± 1%   2.911Gi ± 1%  -2.14% (p=0.002 n=6)

                    │   old.txt   │              new3.txt              │
                    │  allocs/op  │  allocs/op   vs base               │
BW6InBN254Commit-16   24.86M ± 0%   22.25M ± 0%  -10.48% (p=0.002 n=6)

And for R1CS:

goos: linux
goarch: amd64
pkg: github.com/consensys/gnark/std/recursion/plonk
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics     
                    │ old_r1cs.txt │           new_r1cs.txt            │
                    │    sec/op    │   sec/op    vs base               │
BW6InBN254Commit-16    2.234 ± 13%   1.994 ± 5%  -10.72% (p=0.002 n=6)

                    │ old_r1cs.txt │            new_r1cs.txt            │
                    │     B/op     │     B/op      vs base              │
BW6InBN254Commit-16   1.965Gi ± 0%   1.901Gi ± 0%  -3.23% (p=0.002 n=6)

                    │ old_r1cs.txt │            new_r1cs.txt            │
                    │  allocs/op   │  allocs/op   vs base               │
BW6InBN254Commit-16    23.91M ± 0%   21.31M ± 0%  -10.86% (p=0.002 n=6)

The serialized circuit size also decreased, from 541526980 bytes to 324975025 bytes. For R1CS there is an increase, from 226962625 bytes to 247155166 bytes (perhaps because R1CS previously used the generic blueprint?).

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I did not modify files generated from templates
  • golangci-lint does not output errors locally
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Note

Medium Risk
Touches core constraint solving/compilation paths (new blueprint instruction, new API surface, and blueprint serialization tags), so regressions could affect circuit correctness or compatibility even though changes are performance-oriented.

Overview
Adds a new BlueprintBatchInverse instruction that computes many modular inverses with a single inversion via a prefix-product pass, and registers it for CBOR serialization.

R1CS and SCS builders now implement frontend.BatchInverter by emitting one batch-inversion instruction for variable inputs, computing constant inverses inline, and adding per-output verification constraints.

Optimizes solver-level structure in std/internal/logderivarg by summing quotients/inverses via an addition tree and using batch inversion when available.

Reduces allocation pressure in non-native arithmetic by pooling big.Int in emulated-field hint code and in limb decomposition, and replaces limb Mod by a bitmask And where applicable.

Written by Cursor Bugbot for commit cf241dc.

ivokub marked this pull request as ready for review March 9, 2026 09:45
ivokub requested a review from gbotrel March 9, 2026 09:45

gbotrel (Collaborator) commented Mar 10, 2026

First pass looks good. I tried to make it more invisible to the circuit developer by having the compiler detect inverses per level and batch them, but my first (Claude) attempt was convoluted and messy; will try again soon.

gbotrel previously approved these changes Mar 11, 2026
ivokub merged commit 1c6aa6e into master Mar 12, 2026
15 of 16 checks passed
ivokub deleted the perf/logderiv-flat-level branch March 12, 2026 10:16