transform: equality-saturation MIR optimizer (mz_transform::eqsat)#37160
transform: equality-saturation MIR optimizer (mz_transform::eqsat)#37160antiguru wants to merge 67 commits into
Conversation
Add an e-graph / equality-saturation optimizer over a subset of MIR, wired into the logical optimizer behind the `enable_eqsat_optimizer` feature flag. The pass lowers MIR to a relational IR whose scalar payloads are real MirScalarExprs (column facts computed live, remapping via permute), saturates a ported rule engine over an e-graph, extracts the cheapest plan under a two-axis (memory-primary, size-weighted; time-secondary) cost model, and raises back. Unsupported subtrees bail to opaque leaves carried verbatim, so the pass is equivalence preserving on arbitrary input; a hard arity guard at the transform boundary makes any mismatch a no-op rather than a malformed plan. The defensible win is the cyclic-join decision: on a triangle join the AGM cost model proves worst-case-optimal (delta) evaluation dominates the binary join on both memory and time, where JoinImplementation (cardinality-blind, no arrangements) picks the dominated binary plan. The pass commits that decision by tagging the join JoinImplementation::DeltaQuery, synthesized by reusing the real delta planner (mz_transform::join_implementation::plan_as_delta_query); because JoinImplementation only (re)plans Unimplemented/Differential joins, the choice survives to execution. Rules cover predicate pushdown/fusion, empty propagation, threshold/negate distribution and elision, reduce elision, and negate-over-join. Reduce and TopK lower structurally. A single (e-class) condition evaluator backs all side conditions. The flag is temporarily defaulted on so CI exercises the pass across the test corpus (see the TODO in definitions.rs); revert to off before leaving draft. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…blowup Make the equality-saturation pass viable with enable_eqsat_optimizer on. Correctness: * Map lowering folded each scalar against input-only column types, but a Map scalar may reference columns produced by earlier scalars in the same Map. Fold against a context that grows by each scalar's type (escalars_in_map_context). * The pass runs in the logical optimizer, where joins must stay Unimplemented: ProjectionPushdown runs next with include_joins and panics on a filled-in implementation, and delta planning emits physical-phase structure (arranged inputs, lifted MFPs). Split optimize (commits WcoJoin to DeltaQuery, the offline test path) from optimize_logical (raise with commit_wcoj=false); the live EqSatTransform uses optimize_logical. Performance (a catalog index optimization dropped from 27s to under 0.5s): * Memoize extraction cost by built plan. Cost is compositional but was recomputed over the whole subtree per node per fixpoint pass (O(classes^2)). * Bound the per-call join cost: cap the exact 2^n join-order DP at 8 inputs and cap the cover-LP vertex enumeration, both otherwise combinatorial on wide joins. * Bound saturation: an e-node budget, a per-rule match cap with exponential backoff (egg BackoffScheduler style), and a plan-size gate on the transform. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
The equality-saturation engine already lives in mz_transform::eqsat. Move its Lean 4 soundness spec out of the standalone misc/mir-rewrite-dsl prototype and delete the rest of that workspace. The emitter is now src/transform/src/eqsat/lean.rs, made exhaustive over the live 33-rule AST; the proofs and lakefile move to src/transform/lean/; gen-lean becomes a cargo example that regenerates Generated.lean (33 theorems, 13 proved, 20 sorry). Refresh the status doc with the live-integration state, a pipeline-coverage matrix, and the single-pass roadmap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Stop pinning enable_eqsat_optimizer=true in the SLT runner and rely on the production default instead (currently on), so the corpus exercises whatever the default flag value is rather than a runner-local override. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Store the MirScalarExpr::reduce result instead of discarding it, so the e-graph carries ReduceScalars-equivalent forms and raise emits folded scalars. This subsumes FoldConstants, CoalesceCase, and CaseLiteral on Filter and Map payloads. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Extend lower-time scalar reduction to the Reduce group keys, aggregate input expressions, and TopK limit, which previously stored unfolded payloads verbatim. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Fold equivalence-class scalars against the concatenated input column space, completing lower-time scalar reduction across all scalar sites. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Record that EScalar payloads are now reduced canonical forms, the round-trip is canonicalizing not byte-identical, and move ReduceScalars, CoalesceCase, CaseLiteral, and general FoldConstants to Covered. Note the lower-time reduction soundness invariant as a Lean obligation. Measured harness: 3 wins / 4 losses / 13 ties (unchanged); losses are empty-propagation gaps, not scalar-folding failures. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Final-review polish for the scalar-canonicalization workstream: * Correct the reduced_escalar soundness doc and the STATUS Lean obligation: the condition is per-rule semantic identity, not a blanket no-relaxation rule. Filter pushdown into a join input does move a scalar strict to loose, but stays sound because the join equivalence enforces the strengthened non-null fact on surviving rows, mirroring production predicate_pushdown. * Add direct unit tests for aggregate-expr and TopK-limit reduction (only group_key was covered before). * Fix the stale simplify/canonicalize_scalars reference in the rule DSL to point at reduced_escalar. * Annotate the plan's disproven harness premise (losses are structural empty-propagation gaps, not scalar-folding). * Emit the copyright header from the Lean generator and add it to lakefile; exclude lean-toolchain (a bare version file) from the copyright check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Per-e-class lattice of Option<EquivalenceClasses>, built by mirroring the production Equivalences::derive per operator. Wired into the saturation fixpoint; no consumer yet. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
A saturation-time step that rewrites scalar payloads to their equivalence-canonical representatives, the e-graph form of EquivalencePropagation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
The canonicalization loop's per-iteration invariant is that a canon_id either still keys its own class or has been subsumed by an earlier union this iteration, in which case the lookup misses and we skip and recover next round. The prior comment only described the pre-loop state. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Add an unsatisfiable(rel) condition backed by the equivalences analysis and a rule that rewrites a relation with contradictory equivalences to an empty constant. Also extend generic_join/solve to enumerate all e-class IDs for unconstrained relation metavariables (pure-RelVar LHS patterns), which is necessary for the bare `r => Empty(r) where unsatisfiable(r)` rule to fire. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Add `Cond::Equiv { pred, rel }` and a `drop_equiv_filter` rule that
removes a Filter whose single predicate `#a = #b` is already implied by
the input relation's equivalences (e.g. a join on `#0 = #2` makes a
`Filter (#0 = #2)` above it redundant).
Also add `Cond::NotSelfRef { rel }` and a companion guard on
`merge_filters`: when `drop_equiv_filter` merges a Filter class with its
input the Filter becomes self-referential, which caused `merge_filters` to
concatenate predicate lists without bound and `run_analysis` to diverge.
The `not_self_ref(r)` guard prevents `merge_filters` from firing in that
state (safe: both sides are already equivalent in a self-referential class).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
`drop_equiv_filter` removed `Filter[#a=#b]` whenever the Equivalences analysis placed #a and #b in the same equivalence class. This is unsound for null-preserving sources: consider `Map[#1 := #0]` over a nullable #0. The analysis derives {#0, #1} as an equivalence class, so it would drop `Filter[#0=#1]` -- but `NULL = NULL` evaluates to NULL (not TRUE), so the filter rejects null rows while the map keeps them. Dropping the filter changes the result. A sound implementation requires per-column nullability facts, which are only available in the typed/physical phase. The drop is deferred there. The `not_self_ref(r)` guard on `merge_filters` was added solely to prevent the class-blowup caused by `drop_equiv_filter` merging a Filter class with its input. With that rule gone, no rule creates self-referential Filter classes, so the guard is unneeded and is also reverted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Move EquivalencePropagation from Missing to Partial in the coverage matrix: the Equivalences e-class analysis drives scalar-payload canonicalization and unsatisfiable-to-empty collapse; redundant equality-predicate drop is deferred (nullability unavailable at saturation time). Add Lean obligations for the two sound consumers: (a) canonicalization preserves multiplicity denotation; (b) unsatisfiable => Empty is sound. Add a note blocking re-attempts of drop_equiv_filter in the saturation phase. Register compare_real, wcoj_decision, and roundtrip as explicit Cargo test targets (were absent from Cargo.toml, so cargo autodiscovery silently skipped them). Record actual harness finding: the test hangs on case 5 (filter_over_union_with_branch_filters) due to non-convergence of run_analysis for the Equivalences domain. The EquivalenceClasses domain is not finite; the nested fixpoints (run_analysis + minimize) do not jointly converge on Union-of-Filter plans. This is a blocker; fix requires bounding run_analysis iterations or approximating the domain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Three loops were unbounded on Union+Filter plans. Primary fix: Phase 2b (DSL rule application) in `saturate` now rechecks the MAX_ENODES budget after each instantiate+union. rebuild() at the top of each saturation iteration merges equivalent nodes, collapsing the count far below the threshold even after Phase 2b added thousands last iteration. Without the mid-loop guard, pending can grow 4x per iteration (1 052 -> 4 128 -> 16 420 ...) while the top-of-loop check always sees the post-rebuild count and never fires. Adding the guard mid-loop breaks as soon as the budget is exceeded; skipped rewrites are conservatively omitted. Secondary fix: cap run_analysis at MAX_ANALYSIS_ITERS=100 rounds. The three finite-height analyses converge in a handful of rounds. The Equivalences analysis is not finite-height on these plans and could oscillate; stopping early yields a sound under-approximation (all derived facts came from real node structure; canonicalization and unsatisfiable are conservative under fewer known equivalences). Tertiary fix: cap EquivalenceClasses::minimize outer loop at 100 iterations. The expand()/implications() step can add novel equivalences indefinitely in pathological cases; same soundness argument applies. Verified: compare_real now terminates in ~30s with SUMMARY: 3 wins / 0 losses / 17 ties / 0 skips All eqsat unit tests and wcoj_decision pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Factor the 100-iteration cap out of the shared production EquivalenceClasses::minimize into a new pub(crate) minimize_bounded method. Production minimize delegates to minimize_bounded(usize::MAX), restoring unbounded semantics. The eqsat Equivalences::merge caller now uses minimize_bounded(None, 100): Union nodes can force-equate arbitrary expressions, so bounding the merge's minimize is necessary; stopping early is a sound under-approximation (fewer equivalences, never incorrect ones). Refresh the status doc: remove BLOCKER bullets and "harness now hangs" language; record SUMMARY 3w/0l/17t; document the three termination guards. Fix Phase 2a loop-safety comment: "idempotent" -> "convergent under repeated application" with the correct explanation. Replace em-dashes in workstream-introduced egraph.rs comments with colons, parentheses, or sentence splits per CLAUDE.md style. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Run a bottom-up MapFilterProject coalescing pass over the raised plan, reusing extract_non_errors_from_expr_mut and MapFilterProject::optimize. Rebuild in canonical Map-then-Filter-then-Project order via a local rebuild_mfp_canonical (mirrors CanonicalizeMfp::rebuild_mfp but omits the fusion::filter::Filter::action call, which requires boolean-typed predicates not available in the eqsat engine's untyped-predicate inputs). A mfp_chain_valid guard skips chains where a Project references a column beyond the base arity (produced by map_columns_to_projection on self- referential Map output columns); the downstream CanonicalizeMfp handles those after type information is available. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Phase 2a canonicalization rewrote a node's scalar payloads with the node's own e-class reducer. For a Map the Equivalences analysis adds [column(input_arity+pos), defining_expr], and minimize often picks the bare column as canonical, so the reducer maps defining_expr to the column the Map itself produces. Applying that to the Map's own scalar created a forward self-reference, which map_columns_to_projection then turned into an out-of-range Project that failed Typecheck and crashed environmentd at catalog init with the flag on. Fix: guard rewrite_escalars so a rewritten scalar is rejected (original kept) when it would reference a column outside its valid evaluation range (Map scalar at pos: 0..input_arity+pos; Filter: 0..input_arity; Join: the concatenated input arity). Skipping a rewrite is always sound. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Cap the Equivalences inner fixpoint at 4 rounds (vs 100 for the cheap finite-height analyses) via run_analysis_bounded, since each inner round calls minimize_bounded per merge and is expensive. Catalog-init optimization is fast again; stopping early is a sound under-approximation (both consumers are correct with fewer known equivalences). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
… views Three distinct correctness bugs in the eqsat optimizer's scalar canonicalization pass caused miscompiled plans that crashed rendered dataflows with "Non-positive multiplicity" errors. **Bug 1: circular join-condition rewriting.** `rewrite_escalars` used a Join's own output Equivalences (derived from the join conditions) to rewrite those same conditions. For a join on `#a = #b` the reducer maps `#b → #a`; applying it to the condition yielded `[#a, #a]`, silently dropping the equijoin constraint and effectively widening the join to a cross-product. The extra rows then propagated through the outer-join Threshold/Union/Negate pattern and produced negative multiplicities. Fix: return `None` for Join/WcoJoin in `rewrite_escalars` — those nodes' conditions are never rewritten at saturation time. **Bug 2: `factor_negate_join` / `distribute_negate_join` unsound for non-linear aggregates.** Both negate-join rewrite rules merged `join(negate(a), rest)` and `negate(join(a, rest))` into the same e-class. When a `Reduce` with a non-linear aggregate (MAX, MIN, ANY, ALL) sat above the join, the extractor could pick the wrong form as the Reduce input, producing `Reduce_MAX(negate(join(a, rest)))`. This is semantically wrong because `reduce(r) ≠ negate(reduce(negate(r)))` for non-linear aggregates. Fix: remove both rules from `relational.rewrite`; regenerate `Generated.lean`. **Bug 3: circular filter-predicate rewriting.** `rewrite_escalars` for Filter nodes used the Filter's own output Equivalences to rewrite the predicates. The Filter's Equivalences include a class `[pred, true]` for each predicate; the resulting reducer maps sub-expressions back to their canonical column aliases. For a filter `#2 = CurrentUser()` the reducer mapped `CurrentUser() → #2`, which rewrote the predicate to `#2 = #2` (trivially true), causing `drop_true_filter` to silently eliminate the filter. This removed security-relevant `WHERE name = current_user` guards and any other equality predicates that constrain unmaterializable expressions. Fix: pass the *input's* e-class reducer to `rewrite_escalars` for Filter nodes so that only facts from the input (which do not include the predicates themselves) are used. `rewrite_escalars` gains a `filter_input_reducer` parameter; Phase 2a looks it up per node. Gate: `bin/sqllogictest --optimized -- test/sqllogictest/arithmetic.slt` PASS 206/206, zero "Non-positive multiplicity" occurrences. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
* Merge the stacked doc-comment on rewrite_escalars (summary first, then reducer selection, then the column-range validity invariant) and record that a Filter's predicates are governed by the INPUT reducer to avoid dropping security-relevant filters. * Pass filter_input_reducer directly instead of an `as &_` coercion. * Document why the negate-join rules must NOT be re-added with a `where` precondition: the extractor has no non-negativity precondition, so a rule guard cannot prevent a wrong-polarity form from reaching a non-linear Reduce; only a polarity-aware extractor could. * Add unit tests locking in the security guarantee (Filter governed by the input reducer, not its own) and that Join conditions are never rewritten. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Replace the hand-rolled `rebuild_mfp_canonical` in raise.rs with a direct call to `CanonicalizeMfp::rebuild_mfp`, achieving full CanonicalizeMfp subsumption. Non-boolean Filter predicates that prevented this were test-only constructs (not real MIR); fix all affected test fixtures to use boolean-typed predicates. Add eqsat.spec case (i) as an end-to-end witness showing coalesce_mfp firing on a Filter/Map/Filter chain that the eqsat rules cannot fully canonicalize. Update the raise.rs module doc comment to state MFP canonicalization as part of the round-trip contract. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Add enable_eqsat_physical_optimizer (default off) and PhysicalEqSatTransform, registered before JoinImplementation so the WcoJoin-to-DeltaQuery decision ships live. JoinImplementation skips DeltaQuery, so the decision survives. Feasibility gate: arithmetic.slt (206/206) passes with the flag on; no panics, no "can't deal with filled in join implementations", no multiplicity errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Feed available arrangements/indexes (from ctx.indexes) into the cost model so a join input that is already arranged is not charged the arrangement build, making the WcoJoin-vs-binary decision sound rather than index-blind. CostModel gains a `with_available` constructor accepting a `BTreeMap<GlobalId, Vec<Vec<MirScalarExpr>>>`. In `collect_memory` for WcoJoin, each input that is a direct opaque global Get whose join key (the set of local columns referenced in equivalences) matches an available index key column-set is skipped (arrangement exists for free). The logical pass and existing callers use `CostModel::new()` (empty availability, unchanged behavior). `optimize_with_availability` is the new entry point used by `PhysicalEqSatTransform`, which builds availability from `ctx.indexes` by walking the plan for global Gets and querying the oracle. Three new cost tests cover: indexed input drops its term, wrong-key index does not drop the term, and the index-blind model still prefers WcoJoin for the triangle. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Update JoinImplementation coverage row to reflect that workstream D supplies a live physical eqsat placement (PhysicalEqSatTransform, flag enable_eqsat_physical_optimizer, default off) with an index-aware cost model. The WcoJoin-to-DeltaQuery decision is no longer offline-only. Note the known performance concern (~6.5s on large plans) and that deletion phase 3 is gated on flag-on SLT parity. Harness result unchanged: 3 wins / 0 losses / 17 ties / 0 skips. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Wire cse::eliminate_common_subexpressions into the live path between extraction and raise, and teach raise to emit a local Get for a CSE-bound LocalGet by threading the bound value's type. CSE ids are made fresh to avoid clashing with lowered Let ids. Restrict CSE to closed subtrees (no lowered-local references) to preserve scoping invariants. Subsumes RelationCSE. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Move RelationCSE to Covered: extraction-time CSE hoists shared e-classes into Let bindings with fresh ids, wired live between extraction and raise. Add NormalizeLets as Partial: eqsat CSE produces well-formed Let/Get, leaving NormalizeLets to renumbering and inlining only. Note workstream E is done; FlatMap/ArrangeBy/LetRec/non-empty Constant de-opaquing is deferred (no rules see through them, low value now). Update deletion-phase 2: both C and E are complete, RelationCSE/ NormalizeLets deletion gated on flag-on SLT parity. Record harness counts: 3 wins / 0 losses / 17 ties / 0 skips. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
demand_pushdown reuses ProjectionPushdown, which can introduce redundant nested Project nodes. Mirror the production pipeline (CanonicalizeMfp always runs after ProjectionPushdown) by re-running coalesce_mfp afterward, so the eqsat output is clean rather than relying on the downstream pipeline to tidy it (which is why the optimized-plan SLT gate did not surface the redundancy). Updates the eqsat datadriven golden, which runs the pass in isolation and so exposes the projections the full pipeline hides. SLT arithmetic 206/206, zero plan diff. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
The `NonNullRequirements` pass previously left a TODO for `Not(IsNull)` predicates. A filter `e IS NOT NULL` (represented as `NOT(isnull(e))`) passes only on rows where `e` is non-null. Because `isnull` itself does not propagate nulls, the existing `non_null_requirements` traversal on the predicate expression silently skipped these constraints. This commit seeds the columns required to be non-null by calling `expr.non_null_requirements(&mut columns)` on the inner expression `e` whenever the predicate has the shape `Not(IsNull(e))`. The practical effect: when a LEFT or RIGHT OUTER JOIN is followed by a downstream `WHERE r.col IS NOT NULL` filter, the optimizer can now prove that the null-padding branch of the outer join is dead and collapse it to an inner join. This removes the `Negate + Map(null)` null-padding operators from the physical plan, reducing both arrangement count and output cardinality of intermediates. SLT goldens updated in three files: - `not-null-propagation.slt`: SOME-subquery plans benefit because the inner `Filter (#col = ...)` condition implies the column is non-null, triggering the collapse of the corresponding anti-join padding branch. - `joins.slt`: a query with `WHERE EXISTS(...) OR col IS NOT NULL` where `col` is a computed constant (always non-null) has its EXISTS cross-join eliminated entirely. - `chbench.slt`: a projection/CSE reorder in a TPC-H-style query where the new null propagation shifts when `padchar` is computed relative to the join. All changed golden lines are EXPLAIN plan diffs. No result-row tests changed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Records the design-spike finding that de-opaquing Get/ArrangeBy is the foundation for index selection (demand-parameterized extraction alone does not unlock it), decomposed into committable sub-phases I0-I4. Cross-references the parity roadmap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
A samply+pollard profile of `catalog_server_explain.slt` showed the eqsat optimizer spending 9-30s per slow-path EXPLAIN of complex catalog views, with `mz_transform::eqsat::optimize_inner` at 83% of CPU. Inside it, `CostModel::cost` was 64.8%, dominated by `join_degree` -> `Hypergraph::agm_degree_subset` (28.7% self / 64.7% total): a fractional-edge-cover LP solved per join node, plus heavy allocation churn from the per-call LP matrices. Extraction re-costs every candidate plan across many passes, and the existing `cost_cache` keyed on whole built `Rel`s misses frequently, so the SAME AGM LP is re-solved thousands of times. `agm_degree_subset` is a pure function of the per-input size degrees and the join hypergraph (arities + per-class input-index sets) and the subset mask, so its result is memoizable. Add an exact `AgmKey` -> `f64` memo to `CostModel`, keyed on the per-input `size_degree` values (as raw f64 bits), the hypergraph's `arities` and `classes`, and the subset mask. This is the complete set of inputs the LP reads, so the cached value is bit-identical to a fresh solve and no cost decision changes: all EXPLAIN output is unchanged (catalog_server_explain stays at its pre-existing 17 output-failures / 271 success / 288 total golden-churn baseline). Route both `join_degree` and `binary_join_terms` through the memo. The cache uses `RefCell` interior mutability because `cost` takes `&self`; a `CostModel` is created fresh per optimization and used single-threaded, so this is sound. Worst EqSatTransform times on catalog_server_explain drop from ~9.2s/27.8s/29.8s to ~0.32s/0.31s/0.29s (~90-100x), well under the slow-optimization WARN threshold. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…duces
An earlier-placement experiment proved that moving the logical eqsat pass
before the logical fixpoints and removing fixpoint_logical_02 crashes bootstrap
in ReducePlan::create_from ("Multiple reduction types detected: left:
Accumulable right: Hierarchical"). A Reduce whose aggregates mix reduction types
(e.g. sum = Accumulable, min = Hierarchical) must be split into one Reduce per
type, joined on the group key, before lowering. Production ReduceReduction does
this inside fixpoint_logical_02; the e-graph search has no equivalent rule, so a
mixed-type Reduce extracted by eqsat reaches lowering and panics.
Add raise::reduce_reduction, a logical-only post-pass that reuses the production
ReduceReduction::action (structural, no TransformCtx, infallible) via
visit_pre_mut, mirroring the existing raise::demand_pushdown / raise::coalesce_mfp
reuse pattern. It is gated to the logical phase (!commit_wcoj) because it
introduces a join that must stay Unimplemented until JoinImplementation runs.
Wire it into optimize_inner after demand_pushdown and the follow-up coalesce_mfp.
This is the #1 prerequisite for deleting fixpoint_logical_02.
A capability-proof integration test builds a Reduce with sum(col) + min(col)
over a Get, runs optimize_logical, and asserts the mixed-type Reduce was split
into two per-type reduces joined on the group key, so create_from would not
panic. At the current placement fixpoint_02 still runs ReduceReduction, so the
post-pass is redundant and all SLTs are unchanged (arithmetic 206/206,
advent-of-code/2023 all pass, ldbc_bi 102/102).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
… ids
eqsat's extraction-time CSE binds shared subterms with fresh Let ids
chosen above the maximum LocalId already present in the plan. The
max-id walk (cse::max_local_id) only inspected Rel::Let and
Rel::LocalGet nodes and recursed through Rel children. But lower bails
unsupported MIR (notably LetRec, the substrate for WITH MUTUALLY
RECURSIVE) verbatim into a Rel::Opaque leaf, which has no Rel children:
its LocalIds, both Let/LetRec binding ids and Get { Id::Local }
references, live inside the carried MirRelationExpr and were never
seen. A fresh CSE id could therefore equal a LocalId buried in an
opaque LetRec, producing a shadowed binding. Raise then feeds the plan
to the production Demand pass, whose LetRec no-shadowing assertion
panics ("expected None found Some(...)" at demand.rs:157) on recursive
CTEs. An earlier-placement experiment surfaced this deterministically:
running eqsat on a pre-fixpoint plan for aoc_1204-style recursive-CTE
queries crashed. It is latent at the current placement (the
post-fixpoint plan happens not to collide on tested inputs) but a real
correctness hazard.
Fix the root cause: max_local_id now walks into Rel::Opaque leaves via
MirRelationExpr::visit_pre and takes the max over every LocalId there,
so a fresh id is always strictly greater than any existing binding or
get id. engine::max_local_id (placeholder ids in optimize_around_scopes
/ hoist_scopes) had the identical blind spot and is fixed the same way,
reusing the new cse::max_mir_local_id helper.
Adds a cse unit test that proves a fresh CSE id cannot collide with a
LocalId inside an opaque LetRec leaf (fails before the fix), and an
end-to-end roundtrip test that runs lower -> CSE -> raise ->
demand_pushdown over a recursive CTE with an inner CSE opportunity and
asserts no panic with preserved arity.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Add the measured next sequence (fixpoint_02 -> fixpoint_01 needs PredicatePushdown equivalence-setting -> fuse_and_collapse -> Phase 2/3/4) and the MIR-replay-harness plan for fast eqsat iteration (capture unoptimized MIR, replay through optimize_logical in an mz-transform-only binary; clusterd-test-driver as a correctness tier). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Extract the inline `fixpoint_logical_02` fixpoint from `logical_optimizer` into a shared `pub fn fixpoint_logical_02()`, and run it over the raised eqsat output via `raise::logical_fixpoint_02`. This replaces the narrower `reduce_reduction` post-pass, which it subsumes (the fixpoint includes ReduceReduction). Reusing the exact production list guarantees parity with fixpoint_02 by construction and makes the planned cutover (deleting the standalone pass and moving eqsat earlier) a one-line deletion. Logical-only: gated on `!commit_wcoj`, clone-and-adopt-on-arity-match, mirroring `demand_pushdown`/`reduce_reduction`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Phase 1 (the fixpoint_logical_02 reuse post-pass) landed in 65acb41. The cutover itself is blocked: moving eqsat before fixpoint_logical_01 is unsound (equivalence-dependent rewrites break), and the conservative conditional strangler-fig is only equivalent to Phase 1 because eqsat consumes fixpoint_02's output and lacks native outer-join / redundant-self-join collapse. eqsat-on therefore already loses fast-path peeks (e.g. a LEFT self-join on a full primary key) regardless of placement. The real prerequisite is a bottom-up keys/FD e-class analysis plus an outer-join collapse rule. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Add a cheap per-e-class analysis mapping each output column known to hold a constant value to that value, stored as a full EScalar so a later generalization to arbitrary scalar equivalence needs no domain change. Wire it through LocalFacts/letrec_local_facts and the per-round Analyses build. Nothing reads the result yet; a later Cond consumes it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…-of-class) Make e-graph extraction polarity-aware so a multiplicity-signed (Negate-rooted) representative is never placed directly as the input of a non-linear reduce or a TopK. Extraction now computes two best-of-class maps per fixpoint: best_any (no sign constraint) and best_nonneg (non-negative output multiplicities). build_rel takes a Demand and pulls each child from the map its computed demand requires. This is the prerequisite for re-enabling the negate-join rewrite rules. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…are extractor) Restore distribute_negate_join and factor_negate_join. They merge join(negate(a), rest) with negate(join(a, rest)). This is sound only because the polarity-aware extractor (commit df91893) never places a Negate-rooted representative directly under a non-linear Reduce or a TopK, so the soundness lives in the extractor, not a rule guard. Add eqsat_negate_join.rs: a soundness gate (Reduce(MAX) and TopK over a plain join whose e-class is contaminated by a negated sibling arm; the extracted plan must never feed a Negate-rooted input to the barrier) and a win test (Union(join(a,rest), join(negate(a),rest)) collapses to empty via union_cancel). Update roundtrip::negate_join_no_longer_cancels to negate_join_cancels, which now asserts the collapse. Regenerate Generated.lean (the two join theorems emit as sorry, like the other structural join rules). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…ss-test note Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
… pass Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…e design Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…section) The merge combined two e-node forms of the same e-class with intersection, so the run_analysis fold d = merge(bottom, make(n)) annihilated every fact (bottom is empty) and an opaque LocalGet poisoned any class it joined. merge must be a union toward more precision (like NonNeg/Keys): a column proven constant in either form holds of the relation; only a value conflict drops it. Intersection remains correct for the Union operator arm. Adds a bottom-is-identity regression guard (the unit tests previously exercised merge/make directly, never the fixpoint fold, and the analysis had no consumer, so the inertness was invisible). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…ad-source drops) Rewrites six goldens to the eqsat-on output where the change is a benign equivalent or an improvement: demand (folds a TopK/Union/Negate/error-filter plan to a Constant fast path), record (record_get over a row literal folds to a column), joins/subquery/reduction_pushdown (drop a dead unreferenced Source line), locally_optimized_plan (Project distributed over Union plus a pruned column). Leaves column_knowledge:480 (a tracked fast-path regression) and the parse_ident / index_too_wide eqsat-on regressions un-rewritten. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
A prior strangler-fig cutover skipped CanonicalizeMfp in logical_cleanup_pass when eqsat was on, assuming eqsat's raise-time MFP canonicalization subsumed it. That is unsound: eqsat does not run on every plan, and its raise canonicalizes only the MFP runs it produces, not a decorrelated scalar subquery that logical cleanup must collapse. With it skipped, such a plan kept a Distinct + self-join form instead of a per-row Map (parse_ident.slt regressed under the flag). Canonicalization is idempotent, so running it after eqsat's raise is harmless, and it can only reduce eqsat-on/off divergence. Restores parse_ident.slt (84/84 with the flag on) with no regression in the validated corpus files. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…graph (un-trap facts) For a non-recursive `Let x = v in body`, add the optimized definition `v` into the body's e-graph and union the `Get x` (LocalGet) class with `v`'s root before saturating the body. The union makes `v`'s e-class analysis facts (e.g. constant columns) reach `Get x` via congruence, un-trapping them across the binding boundary so an analysis-gated rule can fire on the reference. The binding's emitted definition stays `v` unchanged; the body extracts to shared `LocalGet x` references (a LocalGet is a free leaf, cheaper than re-materializing the definition), so the reassembled scope is a correct, sharing `Let` with arity preserved. LetRec stays on the opaque path: unioning a recursive reference into its own definition would close an e-graph cycle that breaks extraction. Bodies that nest further scopes fall back to the opaque path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…inition union Add a roundtrip test that proves the step-2 non-recursive `Let`-definition union un-traps an analysis fact across the binding boundary. For `Let l0 = Filter[#0 = 5](src) in Union[Filter[#0 = 6](Get l0), Get l0, Get l0]`, the binding pins column 0 to 5; the body filter `#0 = 6` is then unsatisfiable. The fact lives on the binding definition, trapped behind the multi-use `Get l0`. The eqsat union carries it onto the `Get l0` class, after which the existing `Equivalences`-backed `unsatisfiable` cond fires and `collapse_unsatisfiable` replaces the branch with an empty Constant. No new rule or cond is needed. The test asserts both halves: with the union the impossible branch collapses, without it (new `optimize_without_let_union` control entry, gated by a `union_let_defs` flag on `Optimizer`) the filter survives. This proves the collapse requires the union, not an unrelated post-pass. Honest scope notes, recorded in the test doc: * The dedicated `ConstantColumns` analysis is NOT needed for this win; `Equivalences` (already in `an.eq` at cond-check time) subsumes it. `an.cc` remains computed but unread. * The win is dormant relative to the full production `logical_optimizer`, which reaches the same collapse independently via its own Let-aware `Equivalences` analysis. The union's value is the e-graph-native mechanism (facts on the shared class during saturation), not a plan production cannot otherwise reach. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…rity) A worst-case-optimal join only beats a binary join tree asymptotically when the join is cyclic; acyclic joins are handled optimally by a binary tree (Yannakakis). Previously join_to_wcoj converted every join to WcoJoin, so the AGM cost model regressed plain and acyclic joins to a delta plan where production picks a binary join. Add a structural cyclicity guard (GYO over the join's equivalence hypergraph) so WcoJoin is created only for cyclic joins. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…its output (cost B) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
antiguru
left a comment
There was a problem hiding this comment.
Critical review: eqsat MIR optimizer
I read the engine end-to-end plus the design-review / status / roadmap / polarity-extractor docs. This is genuinely impressive work and the self-assessment in the status doc is unusually honest. The notes below are the gaps I'd want closed (or explicitly punted with eyes open) before this leaves draft, followed by the bigger-picture framing.
The bigger picture first
By the team's own framing, the live logical pass is "parity, not improvement." Every active rule duplicates a rewrite the pipeline already performs, and the one genuine divergence (the WCOJ→DeltaQuery decision) ships only behind the second, default-off flag (enable_eqsat_physical_optimizer). So in the default-on configuration this PR proposes, the user-visible plan benefit today is ≈0, while it adds a 16.5k-line optimizer surface, real optimize latency, and new soundness obligations. The justification is strangler-fig: grow eqsat until it can delete pipeline passes. That's a sound strategy — but no deletion has landed yet, and the one cutover that had landed (skipping CanonicalizeMfp under the flag) is reverted in this very PR because it was unsound (lib.rs:958 comment). So the current state is "extra work for parity," and the merge decision should be judged on that basis, not on the eventual vision.
That's the lens for the blockers below.
Blockers (internal contradictions that make the PR un-mergeable in either flag state)
1. The flag default contradicts itself across the PR. The PR body says "the flag is temporarily defaulted on so CI exercises the pass… Revert to off before un-drafting," and the status doc lists the same revert under "What is left." But definitions.rs (enable_eqsat_optimizer) now carries the opposite comment: "Defaulted on… the intended steady state… not a pre-merge revert." Pick one. Shipping a brand-new experimental optimizer default-on to every user, for zero plan benefit today, is a big call that needs an explicit owner sign-off — not an ambiguity buried in a comment.
2. The SLT goldens are silently coupled to the default-on flag, and nothing pins it. Ten SLT files are rewritten to eqsat-on output (not-null-propagation.slt +218/-… , locally_optimized_plan.slt 118 lines, joins, chbench, aoc_*, record, subquery, demand, reduction_pushdown), but no SLT file sets enable_eqsat_optimizer (grep across test/sqllogictest/ finds zero references). So those goldens pass only because the global default is true. This makes #1 not just a doc nit but a hard contradiction: if you "revert to off before un-drafting," these tests immediately fail; if you keep it on, you're shipping on. There is no flag state in which the PR as written is both green and consistent with its own stated plan. Resolution options: (a) commit to default-on and fix the PR body/status doc; (b) gate each eqsat-divergent golden behind an explicit per-file SET enable_eqsat_optimizer (or mzcompose override) and revert the default to off; (c) revert the golden rewrites. (b) is the honest one if the intent is really "off at merge."
3. Synthesized empty-constant types are placeholders, and the pass is now live. raise::repr_type_of_arity (raise.rs:391) types every column of a synthesized Empty (from union_cancel / empty_false_filter) as Int64?. Its own doc comment still says "The pass is offline; the surrounding optimizer recomputes column types when this is ever wired live" — but it is wired live now. The boundary arity guard in EqSatTransform (transform.rs:64) catches arity drift but not column-type drift. So when one of those rules collapses a branch whose true column types aren't Int64 (e.g. a text/numeric column), eqsat emits an internally-consistent-but-wrong-typed plan. Because Typecheck recomputes bottom-up, the wrong type is self-consistent and may not be rejected — which means it can silently change a query's result-column type, or feed a type-incompatible arm to a sibling Union. Please carry the real ColumnType through the Empty node (it's available at lower time), and add a regression test: an empty-collapsed branch with a non-Int64 column must round-trip its type and survive the final strict Typecheck.
Soundness surface that is asserted but not enforced
4. Negate-join rules are enabled; their soundness rests entirely on the polarity-aware extractor; the matching Lean obligations are sorry. The status doc says the two unsound negate-join rules "were removed," but they're live in the rule file (relational.rewrite:131 distribute_negate_join, :136 factor_negate_join). The newest workstream (2026-06-21-polarity-aware-extractor.md) re-enabled them, with soundness moved into the extractor ("no Negate-rooted representative directly under a non-linear Reduce/TopK") rather than a rule guard. That's a defensible design — but it makes the extractor's polarity lattice a load-bearing correctness mechanism, and the corresponding Lean theorems (negative-multiplicity / non-linear-reduce) are among the 24 sorrys. So the formal-soundness story for the riskiest rules is currently aspirational.
5. The Lean spec is not gated by CI and can rot. There's no CI job that builds src/transform/lean/, and no check that gen-lean's Generated.lean is in sync with the rule file (unlike the repo's other generated artifacts, which have sync gates). 24/32 theorems are sorry. Without a gate, (a) a rule change can desync Generated.lean unnoticed, and (b) the sorry count can grow silently. Either wire a real gate (build Lean, fail on new sorry, assert gen-lean output is regenerated) or state plainly in the docs that the Lean artifact is advisory, not enforced — right now the prose implies more assurance than CI provides.
6. raise alone can emit un-renderable plans; correctness depends on a bundled cleanup. lib.rs/raise.rs note that raise can produce a single Reduce mixing reduction types that ReducePlan::create_from panics on, and only the bundled logical_fixpoint_02 re-run (raise.rs:306) splits it via ReduceReduction. That's an implicit invariant on a comment. Any future caller of raise that skips the cleanup crashes rendering. Make it an enforced invariant + test ("raise output never contains a mixed-reduction Reduce"), or split the reduction inside raise itself.
Architecture / cost
7. The keystone the design-review prescribed was only half-done. The 2026-06-19 design review's #1 recommendation was to retarget ENode onto real MirRelationExpr/MirScalarExpr and delete the dual Rel/lower/raise IR, calling the dual IR "structurally unsuited to replace production passes" (lossy round-trips, type loss, per-operator bail treadmill). What landed is a partial retarget — scalar payloads are real MirScalarExpr, but the relational dual IR remains. So every problem that recommendation targeted persists: raise still drops constant-singleton identity inputs via join_scalars (raise.rs:170), still synthesizes placeholder types (#3), and still bails per-operator. Please reconcile this explicitly: is direct-on-MIR still the plan, or is the dual IR the accepted steady state? The docs don't close the loop with the design review's verdict.
8. The default-on path runs fixpoint_logical_02 twice, back-to-back, plus a full Demand/ProjectionPushdown and two coalesce_mfp passes. EqSatTransform is pushed right after fixpoint_logical_02() in logical_optimizer (lib.rs:815 then the push), and optimize_logical re-runs the entire fixpoint_logical_02 inside raise (raise.rs:306), plus coalesce_mfp twice and full Demand+ProjectionPushdown (eqsat.rs:139-164). For zero plan benefit today, that's a lot of repeated optimizer work. Combined with MAX_PLAN_SIZE = 200 (which makes the pass a no-op on most non-trivial plans anyway), the net effect at default-on is "added latency on small plans, parity output." Please measure the added optimize latency (the status doc itself cites 27s and ~6.5s episodes that were tamed by caps, not by being fast) and let that inform the default-on decision.
9. The "wins" aren't wins. compare_real reports 3 wins / 0 losses / 17 ties, but the 3 are self-described as "cost-model artifacts (eqsat omits a canonicalizing Project)." An omitted canonicalizing projection is plausibly a less-canonical plan, not an improvement. I'd treat the harness as showing parity, and double-check the omitted Project isn't a latent canonicalization regression rather than a feature.
Repo hygiene / process
10. ~3k lines of agent working docs are committed under docs/superpowers/ (plus .superpowers/sdd/e1-report.md), several already stale relative to the code: the design review predates the retarget; the status doc claims "32 rules / negate rules removed / CanonicalizeMfp cutover done" while the code has 34 rules, negate rules enabled, and the cutover reverted. Decide whether these belong in the product repo at all (vs. doc/developer/design/ per existing convention) and, if kept, mark the historical ones as such so they don't read as current truth.
11. 61 commits / 63 files / 16.5k lines in one PR is effectively un-reviewable by a human as a unit. Strongly consider splitting for landing: (a) flag plumbing + no-op registration, (b) the engine, (c) the Lean spec + its CI gate, (d) the SLT golden changes. Each is independently reviewable and bisectable.
What I'd do to "complete the work"
- Resolve #1/#2 as a unit — decide the merge-time flag state and make the goldens consistent with it (I'd default off and gate the divergent goldens explicitly until the first real pass deletion lands, since default-on buys nothing today and adds risk).
- Fix the empty-constant type loss (#3) and add the type round-trip test; promote the mixed-
Reduceinvariant (#6) from comment to test. - Add the Lean CI gate or downgrade the soundness claims in prose (#5); track the negate-join
sorrys (#4) as explicit release-blockers for the physical flag, not the logical one. - Write the docs reconciliation: dual-IR-vs-direct-on-MIR decision (#7), refresh the stale status doc, and quantify the latency (#8).
- Then the strangler-fig payoff becomes real: land the first actual fixpoint deletion behind the flag with a green SLT gate — that's the milestone that converts this from "parity at a cost" into "a pass worth defaulting on."
Happy to dig into any single thread (the polarity lattice, the cost model's memory axis, or the empty-type path) in more depth.
Generated by Claude Code
…lysis) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…oundness) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
The eqsat rules union_cancel / empty_false_filter synthesize Empty(r) as a
Constant { card: 0, .. } that previously carried only arity. Raise typed every
column as Int64?, so collapsing a non-Int64 branch emitted a wrong-typed plan
that still typechecked. The replaced relation r is gone by raise time, so
capture its column types at synthesis: derive them structurally over the
e-graph (mirroring MirRelationExpr::typ) and store them on the Constant node.
Raise reads the stored types, falling back to the arity placeholder only when
derivation cannot pin them. The live boundary guard now also rejects per-column
scalar-type drift, closing the residual fallback case.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
…wering panic) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
CI Clippy flagged `ctx.clone()` on a ref-counted `SharedTypecheckingContext` in the Empty-types regression test; use `Arc::clone`. CI Doctests (`cargo doc -D warnings`) flagged seven intra-doc links in the eqsat module docs that resolve to private or renamed items; demote each to a plain code span, preserving the prose. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
The equality-saturation optimizer (default-on) proves four builtin-view String columns non-nullable that the hardcoded descs declared nullable, so `verify_builtin_descs` failed. Each tightening is sound: * `mz_cluster_replica_history.replica_id`: the `creates` CTE filters `details ->> 'replica_id' IS NOT NULL` and the column is projected through the preserved (left) side of the LEFT JOINs. * `mz_cluster_replica_history.cluster_id`: the `creates` CTE filters `details ->> 'cluster_id' !~~ 's%'`, which a NULL never satisfies. * `mz_cluster_replica_name_history.id`: every UNION arm produces a non-null id (audit-event arms filter `like 'u%'`; the system-replica arm reads the non-null replica id). * `mz_console_cluster_utilization_overview.cluster_id`: inherits non-nullability from `mz_cluster_replica_history.cluster_id`. The eqsat boundary guard locks each column's scalar type but not its nullability, so the optimizer is free to tighten. Update the descs to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx
An e-graph / equality-saturation optimizer over a subset of MIR, wired into the logical optimizer behind
enable_eqsat_optimizer(default on) and the physical optimizer behindenable_eqsat_physical_optimizer(default off).It lowers MIR to a relational IR with real
MirScalarExprpayloads, saturates a ported rule engine over an e-graph, extracts the cheapest plan under a memory-primary cost model, and raises back. Unsupported subtrees bail to opaque leaves carried verbatim, so the pass is equivalence preserving on arbitrary input; an arity guard at the transform boundary makes any mismatch a no-op.The intended win is the cyclic-join decision: on a cyclic (e.g. triangle) join the AGM cost model proves worst-case-optimal (delta) evaluation dominates the binary join on memory and time, where
JoinImplementationpicks the dominated binary plan. The physical pass commits that decision by tagging the joinJoinImplementation::DeltaQuery(synthesized by reusing the real delta planner), whichJoinImplementationthen leaves unre-planned. WcoJoin is created only for cyclic joins, and a binary join's cost charges its persistent arrangements rather than its transient N-squared output, so the choice is delta where it wins and binary parity elsewhere.Supporting infrastructure: a polarity-aware extractor that never selects a multiplicity-signed representative as the input of a non-linear
Reduce/TopK(with a graceful no-op when a root is otherwise unextractable), a logical/physicalphaseannotation on rewrite rules, and a constant-column e-class analysis. These are equivalence preserving and scoped to the appropriate pass.A prior cutover that skipped
CanonicalizeMfpunder the flag is reverted, since the e-graph does not process every plan and so cannot subsume it.The logical flag is default-on as the intended steady state (the strangler-fig plan: grow eqsat until it can delete pipeline passes). CI exercises the pass across the SLT corpus with the flag on and surfaces any plan or result diffs; the affected goldens are rewritten to the eqsat-on output. The logical pass is parity, not improvement, today; the genuine plan win is the physical worst-case-optimal-join decision behind the second, default-off flag.
🤖 Generated with Claude Code
https://claude.ai/code/session_016CwwG1wCgqf3v8g6uhf1Nx