
perf: optimize emulated ToBits and ToBitsCanonical#1707

Merged
yelhousni merged 15 commits into master from perf/emulated
Mar 4, 2026
Contributor

@yelhousni yelhousni commented Feb 10, 2026

Summary

This PR implements two optimizations for emulated field bit decomposition in gnark:

  1. Inline ToBitsCanonical - Avoids redundant ToBits calls by inlining the comparison logic
  2. Cache bit decomposition - Caches computed bits at the Field level to avoid recomputation when ToBits is called multiple times on the same element

Constraint Improvements

ToBitsCanonical Optimization

The original ToBitsCanonical called ReduceStrict followed by AssertIsInRange, which internally called ToBits twice on the same reduced element. The optimization inlines the comparison logic to reuse the bits.

Operation | Before | After | Saved | Reduction
--- | --- | --- | --- | ---
ToBitsCanonical (BLS12-381 Fr) | ~2,880 | 2,378 | -502 | ~17%

Bit Caching

Caches bit decompositions at the Field level using element limb hash codes as keys. Benefits circuits that call ToBits multiple times on the same element.

Circuit | Before | After | Saved | Reduction
--- | --- | --- | --- | ---
Triple ToBits (same element) | 1,920 | 896 | -1,024 | 53%
ECDSA Secp256k1 Verify | 343,210 | 342,698 | -512 | 0.15%
Multi-compare (3x same bound) | 11,145 | 10,119 | -1,026 | 9.2%

Real-world circuits benefiting from caching

  1. ECDSA Verification - sig.R has ToBits called twice (once in AssertIsLessOrEqual, once explicitly)
  2. Multi-element comparisons - When comparing multiple elements against the same bound using AssertIsLessOrEqual, the bound's bits are cached and reused
  3. ToBitsCanonical - Calls ToBits on modPrev (modulus-1) which is constant and reused across all ToBitsCanonical calls

Technical Details

ToBitsCanonical Optimization

The original implementation:

func (f *Field[T]) ToBitsCanonical(a *Element[T]) []frontend.Variable {
    ca := f.ReduceStrict(a)        // Calls ToBits internally
    f.AssertIsInRange(ca)          // Calls ToBits again on same element!
    return f.ToBits(ca)[:nbBits]   // Third ToBits call
}

The optimized implementation inlines the comparison to reuse bits:

func (f *Field[T]) ToBitsCanonical(a *Element[T]) []frontend.Variable {
    ca := f.reduce(a, true)        // Strict reduction
    caBits := f.ToBits(ca)         // Single ToBits call
    modPrevBits := f.ToBits(modPrev) // Constant, cached
    // Inline comparison using caBits and modPrevBits
    // ... (no additional ToBits calls)
    return caBits[:nbBits]
}
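
The inlined comparison can be modelled outside a circuit. The sketch below is plain Go, not gnark code: `toBits` and `lessOrEqualBits` are hypothetical helpers, the 8-bit modulus 251 is an assumed toy value, and bits are taken to be little-endian. It only illustrates how "value <= modulus-1" can be decided from two bit slices that were each computed once; the actual implementation expresses the same check as circuit constraints.

```go
package main

import "fmt"

// toBits returns the n least-significant bits of v in little-endian order
// (index 0 holds the least-significant bit). Hypothetical helper.
func toBits(v uint64, n int) []uint8 {
	bits := make([]uint8, n)
	for i := 0; i < n; i++ {
		bits[i] = uint8((v >> uint(i)) & 1)
	}
	return bits
}

// lessOrEqualBits reports whether the value encoded by a is <= the value
// encoded by bound. Both slices are little-endian and must have equal
// length. The scan starts at the most-significant bit and the result is
// decided by the first position where the two slices differ.
func lessOrEqualBits(a, bound []uint8) bool {
	if len(a) != len(bound) {
		panic("bit slices must have the same length")
	}
	for i := len(a) - 1; i >= 0; i-- {
		if a[i] != bound[i] {
			return a[i] < bound[i]
		}
	}
	return true // every bit matches, so a == bound
}

func main() {
	const nbBits = 8
	const modulus = 251 // assumed toy prime, for illustration only

	// Bits of modulus-1 are computed once and reused for every check.
	modPrevBits := toBits(modulus-1, nbBits)

	for _, v := range []uint64{0, 17, 250, 251, 255} {
		vBits := toBits(v, nbBits) // single decomposition per value
		fmt.Printf("%d is canonical: %v\n", v, lessOrEqualBits(vBits, modPrevBits))
	}
}
```

The point of the design is that the same `vBits` slice serves both as the input to the range check and as the returned decomposition, so no second decomposition is needed.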

Bit Caching Implementation

Cache stored at Field level (not Element level) to avoid determinism issues during circuit recompilation:

type bitCacheKey struct {
    limbHashes [8][16]byte  // hash codes of first 8 limbs
    numLimbs   int
    overflow   uint
}

type Field[T FieldParams] struct {
    // ...
    bitCache map[bitCacheKey][]frontend.Variable
}

The cache key is computed from:

  • Hash codes of the element's limb variables
  • Number of limbs
  • Overflow value

This ensures the same element (same limb variables + overflow) returns cached bits.
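
To illustrate why a struct of fixed-size arrays works as a map key, here is a minimal plain-Go sketch of the memoization pattern. It is not the gnark implementation: `element`, `field`, `toBits`, and the `value` field are hypothetical stand-ins, limb "hashes" are supplied by hand, and skipping the cache for elements with more than 8 limbs is an assumption made only for this example.

```go
package main

import "fmt"

const maxKeyLimbs = 8

// cacheKey mirrors the shape of bitCacheKey above. It holds only
// fixed-size arrays and scalars, so it is comparable and usable as a map key.
type cacheKey struct {
	limbHashes [maxKeyLimbs][16]byte
	numLimbs   int
	overflow   uint
}

// element is a hypothetical stand-in for an emulated element.
// limbIDs plays the role of the per-limb hash codes.
type element struct {
	limbIDs  [][16]byte
	overflow uint
	value    uint64
}

// field owns the cache and counts how many real decompositions ran.
type field struct {
	cache          map[cacheKey][]uint8
	decompositions int
}

// key builds the cache key; ok is false when the element cannot be keyed.
func (f *field) key(e *element) (k cacheKey, ok bool) {
	if len(e.limbIDs) > maxKeyLimbs {
		return cacheKey{}, false // assumption: skip caching for large elements
	}
	k = cacheKey{numLimbs: len(e.limbIDs), overflow: e.overflow}
	copy(k.limbHashes[:], e.limbIDs)
	return k, true
}

// toBits returns 16 little-endian bits of e.value, reusing cached results.
// Callers must treat the returned slice as read-only because it is shared.
func (f *field) toBits(e *element) []uint8 {
	k, ok := f.key(e)
	if ok {
		if bits, hit := f.cache[k]; hit {
			return bits
		}
	}
	f.decompositions++
	bits := make([]uint8, 16)
	for i := range bits {
		bits[i] = uint8((e.value >> uint(i)) & 1)
	}
	if ok {
		f.cache[k] = bits
	}
	return bits
}

func main() {
	f := &field{cache: make(map[cacheKey][]uint8)}
	e := &element{limbIDs: [][16]byte{{1}, {2}}, overflow: 0, value: 5}
	for i := 0; i < 3; i++ {
		f.toBits(e)
	}
	fmt.Println("decompositions:", f.decompositions)
}
```

Including the overflow in the key means that two elements sharing limb variables but carrying different overflow values do not share a cache entry.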

Files Changed

File | Changes
--- | ---
std/math/emulated/field.go | Added bitCacheKey type, bitCache map, and computeBitCacheKey method
std/math/emulated/field_binary.go | Optimized ToBitsCanonical, added caching to ToBits

Testing

  • All existing tests pass (go test ./std/math/emulated/...)
  • Determinism tests pass (caching at Field level avoids recompilation issues)

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I did not modify files generated from templates
  • golangci-lint does not output errors locally
  • New and existing unit tests pass locally with my changes

Note

Medium Risk
Touches core emulated-field bit decomposition and comparison constraints; bugs in cache invalidation or the inlined range check could silently weaken constraints or change circuit determinism.

Overview
Improves performance of emulated-field bit decomposition by caching ToBits results on each Element (keyed by overflow), including cache reset on Initialize and deep-copy support.

Refactors AssertIsLessOrEqual to reuse a new assertIsLessOrEqualBits helper, and rewrites ToBitsCanonical to avoid redundant ToBits/AssertIsInRange work by comparing the reduced element’s bits directly against (modulus-1).

Updates tests to call ToBits/ToBitsCanonical twice and assert the results match, guarding correctness of the new caching behavior.

Written by Cursor Bugbot for commit 54dfc30.

@yelhousni yelhousni added this to the v0.14.N milestone Feb 10, 2026
@yelhousni yelhousni requested review from Copilot and ivokub February 10, 2026 16:36
@yelhousni yelhousni self-assigned this Feb 10, 2026
@yelhousni yelhousni added the type: perf and dep: linea labels Feb 10, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes emulated-field bit decomposition in std/math/emulated by (1) adding a per-Field cache for ToBits results and (2) rewriting ToBitsCanonical to avoid redundant bit decompositions by inlining the in-range comparison.

Changes:

  • Add Field.bitCache keyed by limb hash codes (+ overflow) and use it in Field.ToBits.
  • Inline the “< modulus” bitwise comparison in ToBitsCanonical to reuse the already-computed reduced element bits.
  • Add cache key computation helper (computeBitCacheKey) and associated types.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File | Description
--- | ---
std/math/emulated/field_binary.go | Adds ToBits caching and rewrites ToBitsCanonical to avoid redundant ToBits calls via inlined comparison logic.
std/math/emulated/field.go | Introduces bitCache storage on Field and implements cache key computation based on limb HashCode() + overflow.

Comment thread std/math/emulated/field_binary.go Outdated
Comment thread std/math/emulated/field.go Outdated
Comment thread std/math/emulated/field.go Outdated
Comment thread std/math/emulated/field_binary.go Outdated
Comment thread std/math/emulated/field_binary.go
Collaborator

@ivokub ivokub left a comment

Very quickly skimming through: I think we don't need to keep the cache in Field, but can store it in Element directly. We assume that already-initialized Element values do not mutate (emulated.Field methods take pointers as input and return new pointers). This is the same way we implement modReduced and internal to indicate whether the value is already reduced, etc.

Imo this simplifies the implementation a bit, as we don't need the per-field cache, and it is a bit more idiomatic. How it would look: we add a bitsDecomposition field to Element, and whenever we do a binary decomposition we store the bits there. When we need the bits somewhere else, we check whether bitsDecomposition != nil and reuse it.
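
A minimal plain-Go sketch of this suggestion, for illustration only and not the merged code: only the `bitsDecomposition` field name comes from the comment above, while `element`, `field`, `toBits`, `add`, and the fixed 16-bit width are hypothetical. It relies on the stated assumption that elements are never mutated after creation, so a memoized decomposition cannot go stale.

```go
package main

import "fmt"

const nbBits = 16 // assumed fixed width for this sketch

// element is a hypothetical stand-in for emulated.Element.
type element struct {
	value             uint64
	bitsDecomposition []uint8 // nil until the first decomposition
}

// field counts how many real decompositions were performed.
type field struct {
	decompositions int
}

// toBits returns the little-endian bits of e.value and memoizes them on
// the element itself, so no field-level map or cache key is needed.
func (f *field) toBits(e *element) []uint8 {
	if e.bitsDecomposition != nil {
		return e.bitsDecomposition
	}
	f.decompositions++
	bits := make([]uint8, nbBits)
	for i := range bits {
		bits[i] = uint8((e.value >> uint(i)) & 1)
	}
	e.bitsDecomposition = bits
	return bits
}

// add returns a new element instead of mutating its inputs, so the
// result starts with an empty cache and the inputs keep theirs.
func add(a, b *element) *element {
	return &element{value: a.value + b.value}
}

func main() {
	f := &field{}
	a := &element{value: 6}
	f.toBits(a)
	f.toBits(a) // served from a.bitsDecomposition
	c := add(a, a)
	f.toBits(c) // new element, new decomposition
	fmt.Println("decompositions:", f.decompositions)
}
```

Compared with a field-level map, this avoids computing a key at all, at the cost of one extra slice header per element.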

Comment thread std/math/emulated/element.go
Comment thread std/math/emulated/field_binary.go
Comment thread std/math/emulated/field_binary.go

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread std/math/emulated/field_binary.go Outdated
Collaborator

@ivokub ivokub left a comment

Thanks! Looks good. I have checked we haven't created any unsound paths.

I also refactored the common part into a separate function. I actually have an idea for optimizing the check using range checking: instead of comparing bitwise, we can compare limb-wise, as in the function assertBytesLeq in std/conversion/conversion.go.

Collaborator

@ivokub ivokub left a comment

PS! Please check that my changes make sense before merging. Otherwise it is good to merge on my side!

@yelhousni yelhousni merged commit 78a257c into master Mar 4, 2026
13 checks passed
@yelhousni yelhousni deleted the perf/emulated branch March 4, 2026 15:34

Labels

dep: linea, type: perf

3 participants