Conversation
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Pull Request Overview
This PR optimizes memory usage in the GKR-MiMC implementation by introducing memory pooling for field element allocations and adding a specialized SumExp17 operation. The changes result in a 20% reduction in heap allocation (from 2267.45 MB to 1798.62 MB) for BLS12-377 benchmarks.
- Memory pool implementation in
gateAPIto reduce heap allocations - New
SumExp17method for optimized computation of (a+b+c)^17 operations - Refactoring from global to instance-based API usage patterns
Reviewed Changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| std/gkrapi/gkr/types.go | Adds SumExp17 method to GateAPI interface |
| std/gkrapi/compile.go | Wraps API with FrontendAPIWrapper for gate evaluation |
| internal/gkr/gkr.go | Adds FrontendAPIWrapper with SumExp17 implementation |
| std/permutation/gkr-mimc/gkr-mimc.go | Optimizes addPow17 function with BLS12-377 specific caching |
| Multiple internal/gkr/*/gkr.go | Implements memory pooling in gateAPI across all curve implementations |
| Multiple internal/gkr/*/solver_hints.go | Updates to use instance-based gateAPI |
| Test files | Renames hashTreeCircuit to merkleTreeCircuit and improves test configuration |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
|
||
| func (api *gateAPI) newElement() *{{ .ElementType }} { | ||
| api.nbUsed++ | ||
| if api.nbUsed >= len(api.allocated) { |
There was a problem hiding this comment.
Off-by-one causes unnecessary allocations in memory pool
Medium Severity
The newElement function uses >= in its condition if api.nbUsed >= len(api.allocated) when it should use >. After incrementing nbUsed, the function returns allocated[nbUsed-1]. When nbUsed equals len(allocated), the index nbUsed-1 is still valid (within bounds), but the current condition triggers an unnecessary append. This causes an extra allocation every time the pool is reused up to its previous capacity, which undermines the PR's memory optimization goal. The condition should be api.nbUsed > len(api.allocated).
Additional Locations (2)
| api.allocated = append(api.allocated, new(fr.Element)) | ||
| } | ||
| return api.allocated[api.nbUsed-1] | ||
| } |
There was a problem hiding this comment.
Off-by-one causes unnecessary allocations in element pool
Medium Severity
The newElement function uses >= instead of > in its condition, causing an unnecessary allocation every time the pool is reused after freeElements() is called. When nbUsed equals len(api.allocated), the element at index nbUsed-1 already exists and can be returned, but the >= condition triggers an append first. Since this PR's goal is reducing memory allocations, this off-by-one error partially defeats the optimization. This pattern is replicated across all curve implementations.
Additional Locations (1)
| } | ||
| } | ||
|
|
||
| var api gateAPI |
There was a problem hiding this comment.
Missing pool recycling in Complete function loop
Medium Severity
The Complete function calls api.evaluate inside a nested loop iterating over all instances and wires, but never calls api.freeElements() to recycle the memory pool. Unlike solver_hints.go and computeAll which properly call freeElements() after each gate evaluation, this code path allows the pool to grow unboundedly. For large circuits this defeats the memory optimization that is the primary goal of this PR. This pattern is replicated across all curve implementations via the template.
Additional Locations (1)
|
Closed in favor of #1676 |
Improves the amount of heap allocations in the benchmark of a 2^16 BLS12-377 element long hash down to 1798.62 MB on an hpc6a.48xlarge machine. This constitutes a 20% improvement over the linea-monorepo baseline (2267.45 MB.)
Most of the improvement was achieved by introducing a reusable pool / stack for temporary field element variables, and is generic to any use of the GKR API.
A further reduction was due to a hard-coded implementation of the most commonly used MiMC gate in BLS12-377, which is
(state + msg + key)^17.I believe that for small instance sizes (< 2^16) the cost is dominated by the GKR verifier and the PLONK prover itself. From that point on, the slope of the fitted line (<1) suggests an at most linear rate of growth.
Note
Improves GKR performance and memory by reducing heap allocations and optimizing common exponentiation.
gateAPIwithnewElement,cast, andfreeElements; switch all gate evaluations to&apiand callfreeElementsin hot loopsgkr.GateAPIwithSumExp17and addFrontendAPIWrapperimplementation; use for(a+b+key)^17in MiMC and generic GKR pathsfrontend.Variable, change S-Box builders to acceptfrontend.Variablekeys, and useSumExp17where applicablebls12-377/381,bls24-315/317,bn254,bw6-633/761,small_rational) to the newgateAPIand solver hintsbackendoptionsWritten by Cursor Bugbot for commit 1676049. This will update automatically on new commits. Configure here.