fuse RAM ValEvaluation + ValFinal sumchecks (stage4) #1203
Experimental change: replace the Stage 4 RAM ValEvaluation + ValFinal sumchecks with a single fused sumcheck instance, while preserving the same cached opening IDs/points (`RamValEvaluation`, `RamValFinalEvaluation`) so downstream protocols (notably the RA reduction) remain unchanged.
### Precise protocol / identity

We batch the two existing RAM value identities into one log(T)-round sumcheck using an explicit fusion challenge γ sampled from the transcript immediately before Stage 4 (label: `ram_val_fused_gamma`).
Let:

- `(r_address_rw, r_cycle)` be the point from `VirtualPolynomial::RamVal` under `SumcheckId::RamReadWriteChecking`,
- `r_address_raf` be the point from `VirtualPolynomial::RamValFinal` under `SumcheckId::RamOutputCheck`,
- `inc(j)` be the per-cycle RAM increment witness (`CommittedPolynomial::RamInc`),
- `wa_rw(j) := eq(r_address_rw, addr_j)` and `wa_raf(j) := eq(r_address_raf, addr_j)`, where `addr_j` is the remapped RAM address at cycle `j`.

Then the fused sumcheck proves:
(Val(r_address_rw, r_cycle) − Val_init(r_address_rw)) + γ · (Val_final(r_address_raf) − Val_init(r_address_raf))
    = Σ_j inc(j) · ( wa_rw(j) · LT(j, r_cycle) + γ · wa_raf(j) )
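To make the γ-batching concrete, here is a self-contained sanity check of the fused identity, specialized to Boolean points where `eq(·,·)` and `LT(·,·)` degenerate into indicator functions (the toy trace and integer arithmetic are illustrative; the real sumcheck works over the multilinear extensions at random field points):

```rust
/// Returns (lhs, rhs) of the fused identity for a toy trace of
/// (address, increment) pairs, one per cycle.
fn fused_identity(
    trace: &[(usize, i64)],
    val_init: &[i64],
    k_rw: usize,  // r_address_rw, specialized to a Boolean address
    c: usize,     // r_cycle, specialized to a Boolean cycle
    k_raf: usize, // r_address_raf
    gamma: i64,   // fusion challenge
) -> (i64, i64) {
    let t = trace.len();
    // Val(k, c): value of address k just before cycle c.
    let val = |k: usize, c: usize| -> i64 {
        val_init[k]
            + trace[..c]
                .iter()
                .filter(|&&(a, _)| a == k)
                .map(|&(_, inc)| inc)
                .sum::<i64>()
    };
    // LHS: gamma-combination of the ValEvaluation and ValFinal claims.
    let lhs = (val(k_rw, c) - val_init[k_rw]) + gamma * (val(k_raf, t) - val_init[k_raf]);
    // RHS: a single pass over cycles, sharing inc(j) between both terms.
    let rhs = trace
        .iter()
        .enumerate()
        .map(|(j, &(addr, inc))| {
            let wa_rw = (addr == k_rw) as i64;   // eq(r_address_rw, addr_j)
            let wa_raf = (addr == k_raf) as i64; // eq(r_address_raf, addr_j)
            let lt = (j < c) as i64;             // LT(j, r_cycle)
            inc * (wa_rw * lt + gamma * wa_raf)
        })
        .sum();
    (lhs, rhs)
}

fn main() {
    let trace = [(0usize, 5i64), (2, -3), (0, 4), (1, 7)];
    let val_init = [10i64, 20, 30];
    let (lhs, rhs) = fused_identity(&trace, &val_init, 0, 3, 2, 1_000_003);
    assert_eq!(lhs, rhs);
    println!("fused identity holds: {lhs} == {rhs}");
}
```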
The degree bound is 3 (dominated by the `inc · wa_rw · LT` term).
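The degree bound can be sanity-checked numerically: restricted to one round variable, each factor is multilinear and hence linear, so the `inc · wa_rw · LT` term is a product of three linears, i.e. a cubic, whose fourth finite difference vanishes. A minimal illustration (toy coefficients, not actual witness data):

```rust
// Each factor restricted to the round variable x is linear; their product
// is a cubic, so round messages need at most 4 evaluation points.
fn product_of_three_linears(x: i64) -> i64 {
    let inc = 2 + 3 * x;   // toy linear restriction of inc
    let wa_rw = 5 - x;     // toy linear restriction of wa_rw
    let lt = 1 + 4 * x;    // toy linear restriction of LT
    inc * wa_rw * lt
}

fn main() {
    // Evaluate at 5 points and take 4 rounds of finite differences:
    // the 4th difference of a degree-3 polynomial is identically zero.
    let mut v: Vec<i64> = (0..5).map(product_of_three_linears).collect();
    for _ in 0..4 {
        v = v.windows(2).map(|w| w[1] - w[0]).collect();
    }
    assert_eq!(v, vec![0]);
    println!("degree <= 3 confirmed");
}
```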
### Implementation approach (how we share work)

- The fused prover lives in a new file, `jolt-core/src/zkvm/ram/val_fused.rs`.
- It allocates a single `inc` MLE witness and one `LtPolynomial` (for `LT(j, r_cycle)`).
- It builds one `wa_indices: Arc<Vec<Option<usize>>>` from the trace (the remapped address index per cycle), and constructs two `RaPolynomial`s from it:
  - `wa_rw = RaPolynomial(wa_indices, eq(r_address_rw, ·))`
  - `wa_raf = RaPolynomial(wa_indices, eq(r_address_raf, ·))`

This mirrors ValEvaluation's "indices + eq-table" design and avoids ever materializing a dense `wa: Vec<F>`.
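As a sketch of the shared-indices idea (the struct and names here are hypothetical, mirroring the design described above, with `u64` standing in for field elements): one `Arc`-shared index vector backs both `wa` views, and each view carries only its own small eq table, so no length-T dense vector is ever built:

```rust
use std::sync::Arc;

/// Hypothetical miniature of the "indices + eq-table" representation:
/// values are looked up on demand instead of materializing wa: Vec<F>.
struct SparseRa {
    /// Remapped address index per cycle; None = no RAM access that cycle.
    indices: Arc<Vec<Option<usize>>>,
    /// eq(r_address, k) for every address k (toy u64 "field" here).
    eq_table: Vec<u64>,
}

impl SparseRa {
    fn get(&self, j: usize) -> u64 {
        match self.indices[j] {
            Some(k) => self.eq_table[k],
            None => 0,
        }
    }
}

fn main() {
    // One indices vector from the trace, shared by both polynomials.
    let indices = Arc::new(vec![Some(0), None, Some(2), Some(0)]);
    let wa_rw = SparseRa { indices: Arc::clone(&indices), eq_table: vec![7, 0, 0] };
    let wa_raf = SparseRa { indices, eq_table: vec![0, 0, 9] };

    let rw: Vec<u64> = (0..4).map(|j| wa_rw.get(j)).collect();
    let raf: Vec<u64> = (0..4).map(|j| wa_raf.get(j)).collect();
    assert_eq!(rw, vec![7, 0, 0, 7]);
    assert_eq!(raf, vec![0, 0, 9, 0]);
    println!("wa_rw = {rw:?}, wa_raf = {raf:?}");
}
```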
### Openings / compatibility with existing pipeline

Even though Stage 4 now runs a single instance, we still cache the exact same openings under the existing sumcheck IDs:

- `SumcheckId::RamValEvaluation`:
  - `CommittedPolynomial::RamInc` opened at `r_cycle′`
  - `VirtualPolynomial::RamRa` opened at `(r_address_rw || r_cycle′)` with claim = `wa_rw(r_cycle′)`
- `SumcheckId::RamValFinalEvaluation`:
  - `CommittedPolynomial::RamInc` opened at `r_cycle′`
  - `VirtualPolynomial::RamRa` opened at `(r_address_raf || r_cycle′)` with claim = `wa_raf(r_cycle′)`

This preserves the coincidence constraints described in `jolt-core/src/zkvm/ram/mod.rs` for the later RA reduction sumcheck.
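A miniature of this double-caching (the enums and map below are illustrative stand-ins, not jolt-core's actual accumulator types): the one fused prover records openings under both original sumcheck IDs, with both `RamInc` points identical and both `RamRa` points sharing the `r_cycle′` suffix:

```rust
use std::collections::HashMap;

#[derive(PartialEq, Eq, Hash, Clone, Debug)]
enum SumcheckId { RamValEvaluation, RamValFinalEvaluation }

#[derive(PartialEq, Eq, Hash, Clone, Debug)]
enum PolyId { RamInc, RamRa }

/// Concatenate an address point with the cycle point: (r_address || r_cycle').
fn opening_point(r_addr: &[u64], r_cycle: &[u64]) -> Vec<u64> {
    r_addr.iter().chain(r_cycle).copied().collect()
}

fn main() {
    let mut cache: HashMap<(SumcheckId, PolyId), (Vec<u64>, u64)> = HashMap::new();
    let r_cycle_prime = vec![3, 1, 4];
    let (r_addr_rw, r_addr_raf) = (vec![1, 5], vec![9, 2]);

    // One fused prover, but openings recorded under both original IDs.
    for (id, r_addr, wa_claim) in [
        (SumcheckId::RamValEvaluation, &r_addr_rw, 11u64),
        (SumcheckId::RamValFinalEvaluation, &r_addr_raf, 22),
    ] {
        cache.insert((id.clone(), PolyId::RamInc), (r_cycle_prime.clone(), 33));
        cache.insert((id, PolyId::RamRa), (opening_point(r_addr, &r_cycle_prime), wa_claim));
    }

    // The coincidences the RA reduction relies on: both RamInc openings
    // share one point, and both RamRa points end with the cycle suffix.
    let inc_a = &cache[&(SumcheckId::RamValEvaluation, PolyId::RamInc)].0;
    let inc_b = &cache[&(SumcheckId::RamValFinalEvaluation, PolyId::RamInc)].0;
    assert_eq!(inc_a, inc_b);
    println!("cached {} openings", cache.len());
}
```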
### Precise memory savings

Compared to the baseline (two separate Stage-4 RAM sumcheck provers), we no longer allocate:

- `wa: Vec<F>` of length T (previously allocated by `ValFinalSumcheckProver`), and
- one redundant copy of `inc: Vec<F>` of length T (previously allocated twice: once in `ValEvaluation`, once in `ValFinal`).

Net: ~2·T field elements less live heap memory during Stage 4, i.e. `2 * T * size_of::<F>()` (plus Vec/MLE overhead). For BN254-sized fields (~32 bytes/elem), this is ~`64 * T` bytes.
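The arithmetic behind the stated figure, as a one-liner (assuming a 32-byte field element, e.g. a BN254 scalar):

```rust
// Back-of-envelope for the savings: two length-T vectors of field
// elements (one wa, one duplicate inc) are no longer allocated.
fn bytes_saved(t: usize, field_elem_size: usize) -> usize {
    2 * t * field_elem_size
}

fn main() {
    let t = 1 << 24; // e.g. a 2^24-cycle trace
    let saved = bytes_saved(t, 32); // BN254 scalar ~ 32 bytes
    assert_eq!(saved, 64 * t);
    println!("~{} MiB less live heap during Stage 4", saved >> 20);
}
```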
From Chrome traces, summing only the relevant spans:

- Fused (`RamValFusedSumcheckProver::{initialize,compute_message,ingest_challenge}`): 308.490 ms
- Baseline (`RamValEvaluationSumcheckProver::*` + `ValFinalSumcheckProver::*`): 340.591 ms

That is roughly a 9.4% reduction in Stage-4 RAM sumcheck prover time on this run.

### Notes