Perf: Improve gkr-mimc memory use#1616

Closed
Tabaie wants to merge 117 commits into master from perf/mem/gkr-exp17
@Tabaie Tabaie commented Sep 24, 2025

Reduces heap allocation in the benchmark of a hash over 2^16 BLS12-377 elements to 1798.62 MB on an hpc6a.48xlarge machine. This is a 20% improvement over the linea-monorepo baseline (2267.45 MB).

Most of the improvement comes from a reusable pool (stack) of temporary field elements, which benefits any use of the GKR API.
A further reduction comes from a hard-coded implementation of the most commonly used MiMC gate in BLS12-377, which is (state + msg + key)^17.
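
As an illustration of why a dedicated (state + msg + key)^17 gate can be cheap, here is a minimal sketch of a 17th power computed with four squarings and one multiplication. It is not the PR's code: the modulus `p` is a small toy prime chosen so the arithmetic fits in `uint64`, and the names `mulMod`, `sumExp17` and `naiveExp17` are hypothetical.

```go
package main

import "fmt"

// p is a toy prime modulus (an assumption for illustration only); the real
// gate works over the BLS12-377 scalar field with multi-limb elements.
const p uint64 = 1_000_000_007

// mulMod returns a*b mod p. Both reduced operands are below p, so the
// product stays below 2^60 and cannot overflow uint64.
func mulMod(a, b uint64) uint64 { return (a % p) * (b % p) % p }

// sumExp17 computes (a+b+c)^17 mod p as x^16 * x:
// four squarings followed by a single multiplication.
func sumExp17(a, b, c uint64) uint64 {
	x := (a%p + b%p + c%p) % p
	x2 := mulMod(x, x)
	x4 := mulMod(x2, x2)
	x8 := mulMod(x4, x4)
	x16 := mulMod(x8, x8)
	return mulMod(x16, x)
}

// naiveExp17 multiplies by x sixteen times; it is only a cross-check.
func naiveExp17(a, b, c uint64) uint64 {
	x := (a%p + b%p + c%p) % p
	res := x
	for i := 0; i < 16; i++ {
		res = mulMod(res, x)
	}
	return res
}

func main() {
	// 3^17 = 129140163, which is below p, so both calls print that value.
	fmt.Println(sumExp17(1, 1, 1), naiveExp17(1, 1, 1))
}
```

In a pooled setting, the intermediate powers could overwrite a single temporary, so a fused gate of this shape needs very few scratch elements per evaluation.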

| Log Instance Size | computeGJ | Solve | Total |
|---|---|---|---|
| 14 | 51.05 MB | 1102.33 MB | 1153.38 MB |
| 15 | 103.09 MB | 1244.49 MB | 1347.58 MB |
| 16 | 228.72 MB | 1569.90 MB | 1798.62 MB |
| 17 | 388.39 MB | 2943.39 MB | 3331.78 MB |
| 18 | 871.35 MB | 4127.38 MB | 4998.73 MB |
| 19 | 1657.63 MB | 6413.27 MB | 8070.9 MB |
| 20 | 3352.88 MB | 12711.18 MB | 16064.06 MB |
| 21 | 6.63 GB | 21.47 GB | 28.1 GB |
| 22 | 13.01 GB | 42.69 GB | 55.7 GB |
| 23 | 26.06 GB | 83.92 GB | 109.98 GB |

[Figure: gkr-mimc mem]

I believe that for small instance sizes (< 2^16) the cost is dominated by the GKR verifier and the PLONK prover itself. From that point on, the slope of the fitted line (<1) suggests an at most linear rate of growth.
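
The growth claim can be checked roughly against the table. The sketch below assumes the fitted line refers to a log-log plot (log2 of memory against log instance size), which the description does not state explicitly. It computes the local slope between consecutive rows, using only the GB rows so that no MB/GB conversion has to be guessed. The helper name `localSlope` is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// localSlope returns the log-log slope between two consecutive table rows.
// Consecutive rows differ by a factor of 2 in instance size, so the slope
// is simply log2(larger/smaller).
func localSlope(smaller, larger float64) float64 {
	return math.Log2(larger / smaller)
}

func main() {
	// "Total" column for log instance sizes 21, 22 and 23, in GB.
	totalsGB := []float64{28.1, 55.7, 109.98}
	for i := 1; i < len(totalsGB); i++ {
		s := localSlope(totalsGB[i-1], totalsGB[i])
		fmt.Printf("2^%d -> 2^%d: slope %.3f\n", 20+i, 21+i, s)
	}
}
```

Each doubling of the instance size in these rows slightly less than doubles the total, so both local slopes come out just under 1.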


Note

Improves GKR performance and memory by reducing heap allocations and optimizing common exponentiation.

  • Add pooled, pointer-based gateAPI with newElement, cast, and freeElements; switch all gate evaluations to &api and call freeElements in hot loops
  • Extend gkr.GateAPI with SumExp17 and add FrontendAPIWrapper implementation; use for (a+b+key)^17 in MiMC and generic GKR paths
  • Refactor MiMC registration: return curve-specific constants as frontend.Variable, change S-Box builders to accept frontend.Variable keys, and use SumExp17 where applicable
  • Update generator templates and per-curve backends (bls12-377/381, bls24-315/317, bn254, bw6-633/761, small_rational) to the new gateAPI and solver hints
  • Clean up tests/benchmarks: modernize circuits, rename hash tree benchmarks to Merkle tree, use gnark backend options

Written by Cursor Bugbot for commit 1676049. This will update automatically on new commits.

@Tabaie Tabaie marked this pull request as ready for review September 25, 2025 04:09
@Tabaie Tabaie requested a review from Copilot September 25, 2025 04:09
Copilot AI left a comment

Pull Request Overview

This PR optimizes memory usage in the GKR-MiMC implementation by introducing memory pooling for field element allocations and adding a specialized SumExp17 operation. The changes result in a 20% reduction in heap allocation (from 2267.45 MB to 1798.62 MB) for BLS12-377 benchmarks.

  • Memory pool implementation in gateAPI to reduce heap allocations
  • New SumExp17 method for optimized computation of (a+b+c)^17 operations
  • Refactoring from global to instance-based API usage patterns

Reviewed Changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| std/gkrapi/gkr/types.go | Adds SumExp17 method to GateAPI interface |
| std/gkrapi/compile.go | Wraps API with FrontendAPIWrapper for gate evaluation |
| internal/gkr/gkr.go | Adds FrontendAPIWrapper with SumExp17 implementation |
| std/permutation/gkr-mimc/gkr-mimc.go | Optimizes addPow17 function with BLS12-377 specific caching |
| Multiple internal/gkr/*/gkr.go | Implements memory pooling in gateAPI across all curve implementations |
| Multiple internal/gkr/*/solver_hints.go | Updates to use instance-based gateAPI |
| Test files | Renames hashTreeCircuit to merkleTreeCircuit and improves test configuration |

Comment thread std/permutation/gkr-mimc/gkr-mimc.go Outdated
Comment thread internal/gkr/small_rational/gkr.go
@Tabaie Tabaie requested a review from ivokub September 25, 2025 04:12
cursor[bot]

This comment was marked as outdated.

@ivokub ivokub added the feat: gkr PRs related to GKR label Sep 25, 2025

func (api *gateAPI) newElement() *{{ .ElementType }} {
	api.nbUsed++
	if api.nbUsed >= len(api.allocated) {

Off-by-one causes unnecessary allocations in memory pool

Medium Severity

The newElement function uses >= in its condition if api.nbUsed >= len(api.allocated) when it should use >. After incrementing nbUsed, the function returns allocated[nbUsed-1]. When nbUsed equals len(allocated), the index nbUsed-1 is still valid (within bounds), but the current condition triggers an unnecessary append. This causes an extra allocation every time the pool is reused up to its previous capacity, which undermines the PR's memory optimization goal. The condition should be api.nbUsed > len(api.allocated).
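
To make the difference between the two conditions concrete, here is a small self-contained trace. It is a sketch, not the template code: `element`, `pool`, the `strict` flag and `poolSizeAfter` are hypothetical names, and the workload (a fixed number of elements requested per round, followed by a reset) is an assumption.

```go
package main

import "fmt"

// element stands in for a curve-specific field element (hypothetical).
type element [4]uint64

// pool mirrors the shape of the reviewed code: a slice of preallocated
// elements plus a counter of how many are currently handed out.
type pool struct {
	allocated []*element
	nbUsed    int
	strict    bool // true: grow only when nbUsed > len; false: when nbUsed >= len
}

func (p *pool) newElement() *element {
	p.nbUsed++
	grow := p.nbUsed >= len(p.allocated)
	if p.strict {
		grow = p.nbUsed > len(p.allocated)
	}
	if grow {
		p.allocated = append(p.allocated, new(element))
	}
	// nbUsed never exceeds len(allocated) here, so the index is in bounds.
	return p.allocated[p.nbUsed-1]
}

// freeElements makes every pooled element available again.
func (p *pool) freeElements() { p.nbUsed = 0 }

// poolSizeAfter runs `rounds` rounds that each request n elements and then
// reset the pool; it returns how many elements the pool ended up holding.
func poolSizeAfter(strict bool, rounds, n int) int {
	p := pool{strict: strict}
	for r := 0; r < rounds; r++ {
		for i := 0; i < n; i++ {
			p.newElement()
		}
		p.freeElements()
	}
	return len(p.allocated)
}

func main() {
	// With 3 elements per round, the >= variant holds 4 and the > variant 3.
	fmt.Println(poolSizeAfter(false, 3, 3), poolSizeAfter(true, 3, 3)) // prints "4 3"
}
```

In this toy trace, the `>=` variant allocates one element beyond what a round ever requests, while the `>` variant allocates exactly the number requested.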

Comment thread std/hash/mimc/gkr-mimc/gkr-mimc_test.go Outdated
		api.allocated = append(api.allocated, new(fr.Element))
	}
	return api.allocated[api.nbUsed-1]
}

Off-by-one causes unnecessary allocations in element pool

Medium Severity

The newElement function uses >= instead of > in its condition, causing an unnecessary allocation every time the pool is reused after freeElements() is called. When nbUsed equals len(api.allocated), the element at index nbUsed-1 already exists and can be returned, but the >= condition triggers an append first. Since this PR's goal is reducing memory allocations, this off-by-one error partially defeats the optimization. This pattern is replicated across all curve implementations.

}
}

var api gateAPI

Missing pool recycling in Complete function loop

Medium Severity

The Complete function calls api.evaluate inside a nested loop iterating over all instances and wires, but never calls api.freeElements() to recycle the memory pool. Unlike solver_hints.go and computeAll which properly call freeElements() after each gate evaluation, this code path allows the pool to grow unboundedly. For large circuits this defeats the memory optimization that is the primary goal of this PR. This pattern is replicated across all curve implementations via the template.
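
The effect described here can be reproduced with a simplified pool. The sketch below is not the code of Complete: `pool`, `element`, `evaluate` and `poolSizeAfter` are hypothetical, and the gate is assumed to need three temporaries per evaluation.

```go
package main

import "fmt"

// element stands in for a curve-specific field element (hypothetical).
type element [4]uint64

// pool is a simplified element pool: it grows only when every pooled
// element is already handed out.
type pool struct {
	allocated []*element
	nbUsed    int
}

func (p *pool) newElement() *element {
	if p.nbUsed == len(p.allocated) {
		p.allocated = append(p.allocated, new(element))
	}
	e := p.allocated[p.nbUsed]
	p.nbUsed++
	return e
}

// freeElements recycles all pooled elements for the next evaluation.
func (p *pool) freeElements() { p.nbUsed = 0 }

// evaluate mimics a gate that needs three temporaries (assumed workload).
// It computes in*in + in using wrapping uint64 arithmetic.
func evaluate(p *pool, in uint64) uint64 {
	t0, t1, t2 := p.newElement(), p.newElement(), p.newElement()
	t0[0] = in
	t1[0] = t0[0] * t0[0]
	t2[0] = t1[0] + in
	return t2[0]
}

// poolSizeAfter evaluates the gate once per instance and reports how many
// elements the pool holds afterwards, with or without recycling.
func poolSizeAfter(nbInstances int, recycle bool) int {
	var p pool
	for i := 0; i < nbInstances; i++ {
		evaluate(&p, uint64(i))
		if recycle {
			p.freeElements()
		}
	}
	return len(p.allocated)
}

func main() {
	fmt.Println(poolSizeAfter(100, true), poolSizeAfter(100, false)) // prints "3 300"
}
```

With recycling the pool stays at the per-evaluation demand; without it the pool grows linearly with the number of evaluations, which is the behaviour flagged above.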

Base automatically changed from feat/gkr/hashes to master January 12, 2026 22:38

Tabaie commented Jan 21, 2026

Closed in favor of #1676

@Tabaie Tabaie closed this Jan 21, 2026
@Tabaie Tabaie deleted the perf/mem/gkr-exp17 branch January 21, 2026 21:59