Conversation
Collaborator
first pass looks good, tried to make it more invisible to the circuit developer by having the compiler detect inverses per level and batch them, but my first (claude) attempt was convoluted and messy, will try again soon
gbotrel
previously approved these changes
Mar 11, 2026
Description
Several long-standing ideas to speed up the solver:
acc = api.Add(acc, res) etc pattern. But this means that we add a single level every time we add a value. Concretely, for range checking it means we could have millions of very small solver levels. This on the other hand means that we cannot utilize parallelization within a level.
(1 << N)-1 mask
Type of change
How has this been tested?
How has this been benchmarked?
Solver benchmark:
difference (PLONK)
and R1CS
The circuit size also decreased from 541526980 bytes to 324975025 bytes. For R1CS there is an increase from 226962625 bytes to 247155166 bytes (perhaps because for R1CS we used the generic blueprint before?).
Checklist:
golangci-lint does not output errors locally
Note
Medium Risk
Touches core constraint solving/compilation paths (new blueprint instruction, new API surface, and blueprint serialization tags), so regressions could affect circuit correctness or compatibility even though changes are performance-oriented.
Overview
Adds a new BlueprintBatchInverse instruction that computes many modular inverses with a single inversion via a prefix-product pass, and registers it for CBOR serialization.
R1CS and SCS builders now implement frontend.BatchInverter by emitting one batch-inversion instruction for variable inputs, computing constant inverses inline, and adding per-output verification constraints.
Optimizes solver-level structure in std/internal/logderivarg by summing quotients/inverses via an addition tree and using batch inversion when available.
Reduces allocation pressure in non-native arithmetic by pooling big.Int in emulated-field hint code and in limb decomposition, and replaces limb Mod by a bitmask And where applicable.
Written by Cursor Bugbot for commit cf241dc. This will update automatically on new commits.