[SPARK-56660][SQL] Decompose struct equality into field-level predicates for filter pushdown by yadavay-amzn · Pull Request #56244 · apache/spark

yadavay-amzn · 2026-06-01T06:44:32Z

What changes were proposed in this pull request?

Add optimizer rule DecomposeStructComparison that rewrites struct-level equality (= and <=>) into a conjunction of field-level equalities. This enables filter pushdown for individual struct fields.

For example, struct_col = struct(1, 'a') becomes struct_col.field1 = 1 AND struct_col.field2 = 'a'.
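As a rough illustration of the rewrite, the plain-Python sketch below flattens a struct-literal equality into per-field predicates. It is not the Catalyst rule itself: the names `decompose_struct_equality` and `render`, and the tuple representation of a predicate, are hypothetical and chosen only for this example.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical representation: a predicate is (column_path, operator, literal).
Predicate = Tuple[str, str, Any]


def decompose_struct_equality(
    column: str, literal: Dict[str, Any], op: str = "="
) -> List[Predicate]:
    """Flatten `column <op> struct(...)` into per-field predicates.

    Nested dicts stand in for nested structs and are flattened recursively.
    The returned predicates are meant to be AND-ed together.
    """
    predicates: List[Predicate] = []
    for field, value in literal.items():
        path = f"{column}.{field}"
        if isinstance(value, dict):
            predicates.extend(decompose_struct_equality(path, value, op))
        else:
            predicates.append((path, op, value))
    return predicates


def render(predicates: List[Predicate]) -> str:
    """Render the conjunction as SQL-like text for inspection."""
    return " AND ".join(f"{path} {op} {value!r}" for path, op, value in predicates)


example = decompose_struct_equality("struct_col", {"field1": 1, "field2": "a"})
print(render(example))  # struct_col.field1 = 1 AND struct_col.field2 = 'a'
```

Each resulting predicate refers to a single scalar field, which is the shape a data source filter can evaluate.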

Why are the changes needed?

Struct literal comparisons and tuple comparisons are treated as opaque predicates by the optimizer. Data source filter pushdown only understands scalar predicates, so struct equality cannot be pushed down for file pruning (Parquet row group skipping, partition pruning, etc.), even though the equivalent scalar predicates would be pushed.
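To make the pruning argument concrete, here is a minimal sketch of min/max-based row group skipping. It is a toy model, not Parquet's or Spark's implementation, and `RowGroupStats`, `may_contain`, and `prune` are hypothetical names. The point is that a scalar predicate on a single field can be checked against per-column statistics, whereas an opaque struct comparison gives the reader nothing to check.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RowGroupStats:
    """Toy per-row-group statistics: column path -> (min, max)."""
    name: str
    min_max: Dict[str, Tuple[int, int]]


def may_contain(stats: RowGroupStats, column: str, value: int) -> bool:
    """Return False only when the stats prove `column = value` matches no row."""
    if column not in stats.min_max:
        return True  # no statistics available: the row group must be read
    low, high = stats.min_max[column]
    return low <= value <= high


def prune(groups: List[RowGroupStats], equalities: Dict[str, int]) -> List[str]:
    """Keep the row groups that might satisfy every field-level equality."""
    return [
        g.name
        for g in groups
        if all(may_contain(g, col, val) for col, val in equalities.items())
    ]


groups = [
    RowGroupStats("rg0", {"struct_col.field1": (0, 9)}),
    RowGroupStats("rg1", {"struct_col.field1": (10, 19)}),
]
# Field-level predicate of the kind the decomposition produces.
print(prune(groups, {"struct_col.field1": 12}))  # ['rg1']
```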

Does this PR introduce any user-facing change?

Yes. Queries filtering on struct equality will now benefit from file pruning and filter pushdown, improving performance on large tables.

How was this patch tested?

Added StructPredicateDecomposeSuite with tests covering EqualTo, EqualNullSafe, nested structs, single-field structs, empty structs, tuple comparisons, non-deterministic guard, and GreaterThan exclusion.

Was this patch authored or co-authored using generative AI tooling?

Yes.

yyanyy

Thanks for making this change!

yadavay-amzn · 2026-06-05T01:57:13Z

@yyanyy Thanks for reviewing and great catch on the NULL semantics, you're right!

Spark's struct equality uses InterpretedOrdering, which treats null = null within fields as equal (returns TRUE), while EqualTo(null, null) returns NULL.

Fixed: the decomposition now uses EqualNullSafe (<=>) for per-field comparisons, which matches the struct equality semantics exactly:

null <=> null → true (matches struct behavior)
null <=> 2 → false (matches struct behavior)
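The difference between the two operators can be modeled in a few lines. This is a plain-Python sketch of SQL three-valued logic with `None` standing in for NULL; `equal_to`, `equal_null_safe`, and `and3` are illustrative names, not Spark APIs.

```python
from typing import Any, Optional, Sequence


def equal_to(a: Any, b: Any) -> Optional[bool]:
    """Model of `=`: NULL if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def equal_null_safe(a: Any, b: Any) -> bool:
    """Model of `<=>`: never NULL; NULL matches only NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def and3(values: Sequence[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND: FALSE dominates, then NULL, otherwise TRUE."""
    if any(v is False for v in values):
        return False
    if any(v is None for v in values):
        return None
    return True


left, right = (1, None), (1, None)  # struct(1, null) on both sides
per_field_eq = and3([equal_to(l, r) for l, r in zip(left, right)])
per_field_ns = and3([equal_null_safe(l, r) for l, r in zip(left, right)])
print(per_field_eq)  # None
print(per_field_ns)  # True
```

With plain `=` per field the conjunction collapses to NULL, while `<=>` per field yields TRUE, which is the result the whole-struct comparison is described as producing.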

The only remaining discrepancy is when the entire struct itself is null (original returns NULL, decomposed returns FALSE), but since our rule only fires in Filter context, this is harmless (both NULL and FALSE exclude the row from WHERE).
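The Filter-context argument can be checked with a small model: a WHERE clause keeps a row only when the predicate is TRUE, so NULL and FALSE behave the same at the top level. The sketch below uses hypothetical helper names and `None` for NULL. It also shows that the two results stop being interchangeable once the comparison sits under NOT, which matters when relying on this argument.

```python
from typing import Callable, List, Optional

Row = str
Pred = Callable[[Row], Optional[bool]]


def where(rows: List[Row], predicate: Pred) -> List[Row]:
    """Keep a row only when the predicate evaluates to TRUE."""
    return [r for r in rows if predicate(r) is True]


def not3(value: Optional[bool]) -> Optional[bool]:
    """Three-valued NOT: NOT NULL is NULL."""
    return None if value is None else (not value)


rows = ["row_with_null_struct"]


def original(_row: Row) -> Optional[bool]:
    return None  # whole-struct NULL: the original comparison yields NULL


def decomposed(_row: Row) -> Optional[bool]:
    return False  # decomposition without an outer null check yields FALSE


# Top-level filter: both variants drop the row.
print(where(rows, original), where(rows, decomposed))

# Under NOT the results diverge: NOT NULL drops the row, NOT FALSE keeps it.
print(where(rows, lambda r: not3(original(r))))
print(where(rows, lambda r: not3(decomposed(r))))
```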

Also added a width guard (max 100 fields) to prevent stack overflow on very wide/deeply nested structs, per your second concern.
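The thread does not show how the guard is implemented. As one possible shape, the sketch below counts scalar leaf fields across all nesting levels and refuses to decompose above a limit. The schema representation (nested dicts), the function names, and the choice to count leaves rather than per-level fields are all assumptions made for illustration; only the limit of 100 comes from the comment above.

```python
from typing import Any, Dict


def count_leaf_fields(schema: Dict[str, Any]) -> int:
    """Count scalar fields in a nested schema, where a dict value is a nested struct.

    Uses an explicit stack instead of recursion so that deep nesting
    cannot exhaust the call stack.
    """
    total = 0
    stack = [schema]
    while stack:
        current = stack.pop()
        for field_type in current.values():
            if isinstance(field_type, dict):
                stack.append(field_type)
            else:
                total += 1
    return total


def should_decompose(schema: Dict[str, Any], max_fields: int = 100) -> bool:
    """Hypothetical guard: decompose only when the predicate count stays bounded."""
    return count_leaf_fields(schema) <= max_fields


narrow = {"a": "int", "b": {"c": "string", "d": "int"}}
wide = {f"f{i}": "int" for i in range(101)}

print(count_leaf_fields(narrow))  # 3
print(should_decompose(narrow))   # True
print(should_decompose(wide))     # False
```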

…tes for filter pushdown

…Conf; rework tests

Addresses review feedback on PR apache#56244:

1. Correctness fix for NULL handling. The original decomposition rewrote EqualTo(struct, struct) into a plain conjunction of per-field EqualTo comparisons, which silently changed semantics for non-null structs that contained NULL fields:
   - Before this PR: struct(1, null) = struct(1, null) returned TRUE (Spark's whole-struct EqualTo evaluates ordering.equiv on the row, which treats per-field NULL == NULL as equal).
   - With original PR apache#56244: returned NULL.

   The fix wraps the conjunction with a null-check that mirrors the original outer null behavior:
   - EqualTo(L, R) over nullable structs: IF (L IS NULL OR R IS NULL) THEN NULL ELSE And(EqualNullSafe(L.fi, R.fi)).
   - EqualNullSafe(L, R): IF (L IS NULL AND R IS NULL) THEN TRUE ELSE IF (L IS NULL OR R IS NULL) THEN FALSE ELSE And(EqualNullSafe(L.fi, R.fi)).

   The wrappers fold out cleanly when either operand is non-nullable, leaving the simple conjunction in the common `CreateNamedStruct = column` pushdown case.

2. SQLConf gate. Add `spark.sql.optimizer.decomposeStructComparison.enabled` (default false) so users opt in once the behavior has soaked. Add `spark.sql.optimizer.decomposeStructComparison.maxFields` (default 1000) that bounds total decomposed predicates including recursively nested struct fields, replacing the unprincipled per-level field cap of 100.

3. Scaladoc explaining Filter scope. Document why join conditions and aggregate grouping keys are deliberately not rewritten.

4. Tests reworked as oracle tests. The original suite asserted post-rewrite NULL behavior directly, which codified the regression as expected. The rewritten suite uses two patterns:
   - Catalyst-level: build expressions and assert eval result of original expression equals eval result of rewritten expression on representative inputs (struct(1, null), whole-struct null, Not wrapper, etc.).
   - End-to-end: run each query with the rule enabled and with the conf disabled; assert row sets are identical.

   Added tests for: Not(struct = struct) with NULL fields, whole-struct null on one side, conf gating.

   Removed: 3 wrong-oracle NULL tests, structural-only "nullable fields decomposes" test, duplicate LessThan, duplicate 3-level nested, single-field, duplicate join test in catalyst suite.
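The null-check wrappers and the oracle-test idea described in this commit message can be sketched together in plain Python. This is a model under stated assumptions, not the suite's code: structs are tuples, `None` is NULL, tuple equality stands in for the whole-struct comparison described above, and every function name is hypothetical.

```python
from itertools import product
from typing import Optional, Tuple

Struct = Optional[Tuple[Optional[int], ...]]


def equal_null_safe(a, b) -> bool:
    """Per-field `<=>`: NULL matches only NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def original_equal_to(left: Struct, right: Struct) -> Optional[bool]:
    """Whole-struct `=` as described: NULL struct -> NULL, NULL fields compare equal."""
    if left is None or right is None:
        return None
    return left == right


def original_equal_null_safe(left: Struct, right: Struct) -> bool:
    """Whole-struct `<=>` in the same model."""
    if left is None or right is None:
        return left is None and right is None
    return left == right


def fields_match(left, right) -> bool:
    """And(EqualNullSafe(L.fi, R.fi)) over all fields."""
    return all(equal_null_safe(l, r) for l, r in zip(left, right))


def rewritten_equal_to(left: Struct, right: Struct) -> Optional[bool]:
    """IF (L IS NULL OR R IS NULL) THEN NULL ELSE per-field conjunction."""
    if left is None or right is None:
        return None
    return fields_match(left, right)


def rewritten_equal_null_safe(left: Struct, right: Struct) -> bool:
    """Both NULL -> TRUE, exactly one NULL -> FALSE, else per-field conjunction."""
    if left is None and right is None:
        return True
    if left is None or right is None:
        return False
    return fields_match(left, right)


def naive_equal_to(left: Struct, right: Struct) -> Optional[bool]:
    """Plain per-field `=` conjunction, modeling the regression being fixed."""
    if left is None or right is None:
        return None
    results = [None if l is None or r is None else l == r for l, r in zip(left, right)]
    if any(v is False for v in results):
        return False
    return None if any(v is None for v in results) else True


# Oracle pattern: compare original and rewritten results on representative inputs.
inputs = [None, (1, None), (1, 2), (None, None)]
pairs = list(product(inputs, repeat=2))

fixed_mismatches = [
    p for p in pairs
    if original_equal_to(*p) is not rewritten_equal_to(*p)
    or original_equal_null_safe(*p) is not rewritten_equal_null_safe(*p)
]
naive_mismatches = [p for p in pairs if original_equal_to(*p) is not naive_equal_to(*p)]

print(fixed_mismatches)           # []
print(len(naive_mismatches) > 0)  # True
```

Comparing against the original expression, rather than asserting hand-written expected values, is what lets the oracle flag the naive variant: in this model it disagrees on inputs such as struct(1, null) = struct(1, null).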

yadavay-amzn force-pushed the fix/SPARK-56660-struct-predicate-decompose branch 3 times, most recently from 857e4be to a9a74c4 on June 3, 2026 01:15

yyanyy reviewed Jun 5, 2026


Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala Outdated

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala Outdated

yadavay-amzn force-pushed the fix/SPARK-56660-struct-predicate-decompose branch from a9a74c4 to 76dca41 on June 5, 2026 01:53

[SPARK-56660][SQL] Decompose struct equality into field-level predica…

707a859

…tes for filter pushdown

yadavay-amzn force-pushed the fix/SPARK-56660-struct-predicate-decompose branch from 76dca41 to 707a859 on June 9, 2026 00:05

yadavay-amzn force-pushed the fix/SPARK-56660-struct-predicate-decompose branch from 7de21ca to 8941106 on June 16, 2026 19:02

yadavay-amzn wants to merge 2 commits into apache:master from yadavay-amzn:fix/SPARK-56660-struct-predicate-decompose

2 participants
