Fix leaf expression reconciliation#22971
Merged
adriangb merged 4 commits intoJun 16, 2026
Merged
Conversation
Simplify recovery-projection conditions in split_and_push_projection The recovery decision had four overlapping triggers. Two of them are fully subsumed by the field-count check (any schema widening also changes the field count), so remove them along with the now-dead `standalone_columns` tracking: - the `columns_needed`-widening heuristic - the `transformed || !is_column` heuristic What remains are the two genuine, complementary cases: the `needs_alias` branch catches an equal-arity rename (which the count check cannot see), and the field-count check catches a leaked-column widening (which the rename check cannot see). Add comments explaining why a count comparison is used deliberately rather than a stricter ordered/qualified schema equality, which spuriously fires on benign reorders and qualifier differences. No behavior change: all extract_leaf tests and struct.slt pass unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ed934da to
c2ee386
Compare
My earlier simplification removed the `columns_needed`-widening heuristic (and its `standalone_columns` tracking) on the theory that the field-count check subsumed it. It does not: when a merge widens an *inner* projection with column refs that came from inside extracted aliases (e.g. CSE-generated `__common_expr_*` refs), the immediate `base_plan` field count can match the original while the schema has still drifted. Dropping the heuristic reintroduces `Schema error: No field named __common_expr_1` on the extended sqllogictest suite (expr.slt / projection_pushdown.slt), which the unit tests and struct.slt did not exercise. Revert to the original, verified logic. A simplification can be revisited later only if validated against the full extended slt suite. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Of the two recovery heuristics I previously tried to remove together, only one is actually redundant. Isolating them one at a time against the full sqllogictest suite shows: - the `transformed || !is_column` heuristic is LOAD-BEARING: removing it reintroduces `Schema error: No field named __common_expr_1` on expr.slt (a transformed/CSE expression whose recovery the field-count check cannot see). It is kept. - the `columns_needed`-widening heuristic (and its `standalone_columns` tracking) is genuinely redundant given the above plus the field-count check: any widening it would catch is already covered. Remove only the redundant one. Validated with no behavior change: all 49 extract_leaf unit tests and the full 485-file sqllogictest suite pass, including expr.slt and projection_pushdown.slt. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The recovery decision was spread across three conditions: a field-count delta, a `transformed || !is_column` heuristic, and a `needs_alias` flag. Together they were a proxy for the actual invariant: a recovery projection is needed iff `base_plan` no longer exposes the original projection's set of output columns. That happens two ways — a column is renamed (a transformed expr now surfaces as its internal `__datafusion_extracted_*` alias) or a column is leaked (pushdown widens base_plan with another extraction's aliases). Both are exactly a change in the set of output names. Collapse the three into one comparison of the set of unqualified field names. Unqualified is deliberate: extracted aliases are globally unique so name-only comparison is unambiguous for them, while it ignores benign column reordering and SubqueryAlias re-qualification that a qualified/ordered comparison spuriously treats as drift (stacking redundant recovery projections — confirmed when I first tried qualified equality). Validated as a pure refactor: all 49 extract_leaf unit tests pass with byte-identical inline snapshots, and the full 485-file sqllogictest suite passes (incl. expr.slt / projection_pushdown.slt). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
adriangb
approved these changes
Jun 16, 2026
cetra3
added a commit
to pydantic/datafusion
that referenced
this pull request
Jun 16, 2026
- Closes apache#22955 This fixes a bug with the extract leaf expressions This is a one liner that sanity checks the schema is the same length when we are doing expression pushdown Yes, a couple of tests have been added. Nope! --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
This fixes a bug with the extract leaf expressions
What changes are included in this PR?
This is a one liner that sanity checks the schema is the same length when we are doing expression pushdown
Are these changes tested?
Yes, a couple of tests have been added.
Are there any user-facing changes?
Nope!