perf: optimize emulated ToBits and ToBitsCanonical #1707
Pull request overview
This PR optimizes emulated-field bit decomposition in std/math/emulated by (1) adding a per-Field cache for ToBits results and (2) rewriting ToBitsCanonical to avoid redundant bit decompositions by inlining the in-range comparison.
Changes:
- Add `Field.bitCache` keyed by limb hash codes (+ overflow) and use it in `Field.ToBits`.
- Inline the “< modulus” bitwise comparison in `ToBitsCanonical` to reuse the already-computed reduced element bits.
- Add cache key computation helper (`computeBitCacheKey`) and associated types.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `std/math/emulated/field_binary.go` | Adds `ToBits` caching and rewrites `ToBitsCanonical` to avoid redundant `ToBits` calls via inlined comparison logic. |
| `std/math/emulated/field.go` | Introduces `bitCache` storage on `Field` and implements cache key computation based on limb `HashCode()` + overflow. |
ivokub
left a comment
From a very quick skim: I think we don't need to keep the cache in `Field`; we can store it in `Element` directly. We assume that already-initialized `Element` values do not mutate (we take pointers as input and return new pointers from `emulated.Field`). This is the same way we implement `modreduced` and `internal` to indicate whether the value is already reduced, etc.
IMO this simplifies the implementation a bit, as we don't need the per-field cache, and it is a bit more idiomatic. How it would look: we have a field `bitsDecomposition` in `Element`, and whenever we do a binary decomposition anywhere we set it to the bits. When we need the decomposition somewhere else, we check whether `bitsDecomposition != nil` and reuse it.
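The suggestion above can be pictured with a small plain-Go sketch. It is not gnark code: `element`, `toBits`, and the `decompositions` counter are hypothetical stand-ins, and only the field name `bitsDecomposition` comes from the comment. The sketch relies on the same assumption the comment states, namely that an element is not mutated after it is created.

```go
package main

import "fmt"

// element is a hypothetical stand-in for an emulated field element.
// It is NOT gnark's Element type. Values are treated as immutable once created.
type element struct {
	value uint64
	// bitsDecomposition caches the little-endian bits; nil means "not computed yet".
	bitsDecomposition []uint8
}

// decompositions counts how often the expensive path runs (illustration only).
var decompositions int

// toBits returns the n low bits of e, computing them at most once per element
// for a given width n.
func toBits(e *element, n int) []uint8 {
	if e.bitsDecomposition != nil && len(e.bitsDecomposition) == n {
		return e.bitsDecomposition // cache hit: reuse the earlier decomposition
	}
	decompositions++
	bits := make([]uint8, n)
	for i := range bits {
		bits[i] = uint8((e.value >> uint(i)) & 1)
	}
	e.bitsDecomposition = bits
	return bits
}

func main() {
	e := &element{value: 0b1011}
	first := toBits(e, 8)
	second := toBits(e, 8) // served from the cache
	fmt.Println(first, second, decompositions)
}
```

Because the cache lives on the value it describes, no key computation or lookup table is needed, which is the simplification the comment points at.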
ivokub
left a comment
Thanks! Looks good. I have checked that we haven't created any unsound paths.
I also refactored the common part into a separate function. I also have an idea for optimizing the check using range checking: instead of comparing bitwise, we can compare limb-wise, as in the function `assertBytesLeq` in `std/conversion/conversion.go`.
ivokub
left a comment
PS: please check that my changes make sense before merging. Otherwise it is good to merge on my side!
Summary
This PR implements two optimizations for emulated field bit decomposition in gnark:
- ToBitsCanonical optimization: avoids redundant `ToBits` calls by inlining the comparison logic
- Bit caching: reuses the decomposition when `ToBits` is called multiple times on the same element
Constraint Improvements
ToBitsCanonical Optimization
The original `ToBitsCanonical` called `ReduceStrict` followed by `AssertIsInRange`, which internally called `ToBits` twice on the same reduced element. The optimization inlines the comparison logic to reuse the bits.
Bit Caching
Caches bit decompositions at the Field level using element limb hash codes as keys. Benefits circuits that call `ToBits` multiple times on the same element.
Real-world circuits benefiting from caching
- `sig.R` has `ToBits` called twice (once in `AssertIsLessOrEqual`, once explicitly)
- `AssertIsLessOrEqual`: the bound's bits are cached and reused
- `ToBits` on `modPrev` (modulus-1), which is constant and reused across all `ToBitsCanonical` calls
Technical Details
ToBitsCanonical Optimization
The original implementation:
The optimized implementation inlines the comparison to reuse bits:
Bit Caching Implementation
Cache stored at Field level (not Element level) to avoid determinism issues during circuit recompilation:
The cache key is computed from:
This ensures the same element (same limb variables + overflow) returns cached bits.
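As a rough illustration of the keying idea, here is a plain-Go sketch. The names `bitCacheKey` and `computeBitCacheKey` are taken from this description, but their bodies here are assumptions, as are the `limb` and `element` types and the use of FNV-1a to combine per-limb identifiers. It is not the code from this PR.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// limb is a hypothetical stand-in for a circuit variable; id plays the role
// of the variable's hash code.
type limb struct{ id uint64 }

// element is a hypothetical stand-in for an emulated element.
type element struct {
	limbs    []limb
	overflow uint
}

// bitCacheKey combines the limb identities with the overflow (assumed layout).
type bitCacheKey struct {
	limbsHash uint64
	overflow  uint
}

// computeBitCacheKey hashes the limb ids in order and pairs the result with
// the overflow, so equal limbs with different overflow get different keys.
func computeBitCacheKey(e *element) bitCacheKey {
	h := fnv.New64a()
	var buf [8]byte
	for _, l := range e.limbs {
		binary.LittleEndian.PutUint64(buf[:], l.id)
		h.Write(buf[:])
	}
	return bitCacheKey{limbsHash: h.Sum64(), overflow: e.overflow}
}

func main() {
	cache := map[bitCacheKey][]uint8{}
	a := &element{limbs: []limb{{id: 1}, {id: 2}}}
	b := &element{limbs: []limb{{id: 1}, {id: 2}}} // same limbs, same overflow
	cache[computeBitCacheKey(a)] = []uint8{1, 0, 1}
	_, hit := cache[computeBitCacheKey(b)]
	fmt.Println("cache hit for identical limbs:", hit)
}
```

A key built only from hashes can in principle collide for distinct elements, so a map like this trades some simplicity and safety margin for not having to touch the element type.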
Files Changed
- `std/math/emulated/field.go`: `bitCacheKey` type, `bitCache` map, and `computeBitCacheKey` method
- `std/math/emulated/field_binary.go`: changes to `ToBitsCanonical`; added caching to `ToBits`
Testing
- `go test ./std/math/emulated/...`
Checklist
- `golangci-lint` does not output errors locally
Note
Medium Risk
Touches core emulated-field bit decomposition and comparison constraints; bugs in caching invalidation or the inlined range-check could silently weaken constraints or change circuit determinism.
Overview
Improves performance of emulated-field bit decomposition by caching `ToBits` results on each `Element` (keyed by `overflow`), including cache reset on `Initialize` and deep-copy support.
Refactors `AssertIsLessOrEqual` to reuse a new `assertIsLessOrEqualBits` helper, and rewrites `ToBitsCanonical` to avoid redundant `ToBits`/`AssertIsInRange` work by comparing the reduced element’s bits directly against `(modulus-1)`.
Updates tests to call `ToBits`/`ToBitsCanonical` twice and assert the results match, guarding correctness of the new caching behavior.
Written by Cursor Bugbot for commit 54dfc30.
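The idea of comparing the reduced element's bits directly against modulus-1 can be pictured outside a circuit with a minimal plain-Go sketch. All names here (`toBits`, `lessOrEqualBits`, `toBitsCanonical`) are hypothetical and operate on `uint64` values rather than emulated elements. In a real circuit the comparison would be an assertion over constraint variables, not a returned boolean.

```go
package main

import "fmt"

// toBits returns the n low bits of v, least significant first.
// It assumes v < 2^n.
func toBits(v uint64, n int) []uint8 {
	bits := make([]uint8, n)
	for i := range bits {
		bits[i] = uint8((v >> uint(i)) & 1)
	}
	return bits
}

// lessOrEqualBits reports whether a <= b for two little-endian bit slices of
// equal length, scanning from the most significant bit down.
func lessOrEqualBits(a, b []uint8) bool {
	for i := len(a) - 1; i >= 0; i-- {
		if a[i] != b[i] {
			return a[i] < b[i] // first differing bit decides
		}
	}
	return true // all bits equal
}

// toBitsCanonical decomposes v once and reuses those same bits both as the
// result and as the input of the "v <= modulus-1" check.
// It assumes modulus >= 1 and that modulus-1 fits in n bits.
func toBitsCanonical(v, modulus uint64, n int) ([]uint8, bool) {
	bits := toBits(v, n)
	bound := toBits(modulus-1, n) // constant for a fixed modulus
	return bits, lessOrEqualBits(bits, bound)
}

func main() {
	for _, v := range []uint64{0, 12, 13, 15} {
		bits, ok := toBitsCanonical(v, 13, 4)
		fmt.Println(v, bits, ok)
	}
}
```

The point of the structure is that the value is decomposed a single time; a version that first returns the bits and then calls a separate range check would decompose the same value again.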