Support LEFT OUTER JOIN and RIGHT OUTER JOIN #4122
alecgrieser
left a comment
In general, this seems like a sound approach to me, and it appears to be correct in its aims. I have a few questions about the approach, as well as some documentation and testing suggestions. There are some comments in the OuterJoinExpression code that could probably be answered with either additional comments (perhaps in the PR or perhaps just in GitHub) or potentially more substantive code changes.
 * {@code FULL} are defined here to allow the model to represent them, but no planning rules support them yet.
 */
@API(API.Status.EXPERIMENTAL)
public class OuterJoinExpression extends AbstractRelationalExpressionWithChildren
Mainly for my own understanding: what was the reasoning behind introducing a new relational expression (and a rewrite rule to turn it into a SelectExpression) instead of having the PlanGenerator create a SelectExpression with the children set up in the manner we expect (i.e., with null-on-empty on the null-producing side)?
I had an earlier version do it right in the PlanGenerator, but felt that made the plan generator “too smart” and that rewriting OuterJoinExpression is a very natural fit for a self-contained rewrite rule. In the planner debugger, we can then observe the initial QGM that reflects the original query, and how the OuterJoinExpression gets reduced. Also, for future-proofing, I was thinking it'll be convenient to have an OuterJoinExpression that can represent the other OUTER JOIN variants as well and that is trivial to construct from PlanGenerator. And maybe we will at some point add further rules that rewrite OuterJoinExpression in other ways. There are SQL rewrites for purposes of unnesting that even introduce new outer joins during rewriting.
That said, it does add an extra hop to the planning (with non-zero costs) and a certain amount of boilerplate. So I’m somewhat undecided about this myself.
I think I do like this approach, especially that it keeps the plan generation logic a bit simpler (as it should).
* General cleanup pass over `INNER_JOIN.rst` and `Joins.rst`.
* Rename `INNER_JOIN.rst` to `JOIN.rst`.
* Add documentation of LEFT OUTER JOIN to `JOIN.rst`.

Support for left joins is introduced by PR FoundationDB#4122.
80dd43a to 76e80f0
ab490f9 to 1659dfa
…ectness

`PullUpNullOnEmptyRule` splits a `SelectExpression` featuring a null-on-empty quantifier into two selects. However, it assigns the predicates only to the lower select. To be correct, it needs to apply them to the upper `SelectExpression` as well (“to act on any nulls produced by this quantifier”, as the Javadoc comment on the rule already says). Without this bugfix, WHERE predicates may get incorrectly pushed past the null-on-empty boundary.

Testing:

* Introduce `PullUpNullOnEmptyRuleTest` and add a regression test.

Fixes FoundationDB#4148.
This change introduces support for left and right outer joins.

A dedicated QGM box (`OuterJoinExpression`) represents one outer join. It is strictly binary (unlike the SELECT box) and carries the join type (LEFT/RIGHT/FULL), the ON-clause predicates, and a reference to the “preserved” and the “null-supplying” quantifier. During the rewriting phase, an exploration rule `RewriteOuterJoinRule` canonicalizes the outer join box into two nested select boxes:

* The preserved side is connected to the outer `SelectExpression` through a normal FOREACH quantifier.
* The null-supplying side is wrapped in an inner `SelectExpression` carrying the ON predicates and is connected through a FOREACH quantifier with `nullOnEmpty` set to true.

The rewrite happens during canonicalization so that all subsequent planning rules (predicate push-down, join ordering, implementation) handle the join as a normal nested SELECT. No other rules need to know about `OuterJoinExpression`. The `RewritingCostModel` penalizes any surviving `OuterJoinExpression`, ensuring the rewritten form always wins.

Key changes:

* `OuterJoinExpression` represents the outer join in the QGM.
* `QueryVisitor` parses the `OUTER JOIN` syntax and constructs the logical `OuterJoinExpression`.
* `RewriteOuterJoinRule` rewrites the `OuterJoinExpression` into nested `SelectExpression` boxes; it is registered in `RewritingRuleSet` so it fires during the canonicalization phase.
* `RewritingCostModel` consults a new `ExpressionCountProperty.outerJoinCount` ahead of `selectCount` so an un-rewritten `OuterJoinExpression` is always more expensive than the canonical two-SELECT form.
* Supporting changes in `CardinalitiesProperty`, `RecordTypesProperty`, `LogicalPlanFragment`, and `RelationalExpressionMatchers` to teach existing utilities about `OuterJoinExpression`, so cardinality, ordering, and record-type properties propagate correctly through the new node.
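The cost-model ordering described here (outer-join count consulted ahead of select count) amounts to a lexicographic comparison. The following is only a minimal sketch of that idea: the `Counts` holder and the `CHEAPER_FIRST` comparator are hypothetical names for illustration, not the actual `RewritingCostModel` or `ExpressionCountProperty` API, and the real cost model may weigh further criteria.

```java
import java.util.Comparator;

/** Toy illustration of a lexicographic cost comparison; not the planner's real classes. */
final class RewritingCostSketch {
    private RewritingCostSketch() { }

    /** Hypothetical holder for the two expression counts mentioned in the description. */
    static final class Counts {
        final int outerJoinCount;
        final int selectCount;

        Counts(int outerJoinCount, int selectCount) {
            this.outerJoinCount = outerJoinCount;
            this.selectCount = selectCount;
        }
    }

    /**
     * Orders candidates so that fewer outer-join boxes always wins;
     * the select count only breaks ties between candidates with equal outer-join counts.
     */
    static final Comparator<Counts> CHEAPER_FIRST =
            Comparator.<Counts>comparingInt(c -> c.outerJoinCount)
                    .thenComparingInt(c -> c.selectCount);
}
```

Under this ordering, a candidate with zero outer-join boxes and two selects sorts ahead of one that still contains an outer-join box, regardless of how many selects either has.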
Testing:

* `join-tests-outer.yamsql` integration tests covering LEFT and RIGHT OUTER JOIN semantics, anti-join patterns, predicate placement (ON vs WHERE) on either side of the join, and predicate push-down into either source.

Resolves FoundationDB#4151.
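To make the semantics these tests exercise concrete, here is a rough sketch of a LEFT OUTER JOIN evaluated as two nested loops with null-on-empty behavior on the null-supplying side. It is a toy model over in-memory lists, assuming integer rows for brevity; `LeftOuterJoinSketch` and `leftOuterJoin` are made-up names and do not correspond to the planner's `SelectExpression` or quantifier classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

/** Toy model of the nested-select form of a LEFT OUTER JOIN; not planner code. */
final class LeftOuterJoinSketch {
    private LeftOuterJoinSketch() { }

    /** Each result row is the pair [preservedRow, nullSupplyingRowOrNull]. */
    static <L, R> List<List<Object>> leftOuterJoin(List<L> preserved,
                                                   List<R> nullSupplying,
                                                   BiPredicate<L, R> onPredicate) {
        List<List<Object>> result = new ArrayList<>();
        for (L left : preserved) {                    // outer select: plain FOREACH
            List<R> matches = new ArrayList<>();      // inner select: applies the ON predicates
            for (R right : nullSupplying) {
                if (onPredicate.test(left, right)) {
                    matches.add(right);
                }
            }
            if (matches.isEmpty()) {
                // null-on-empty: an empty inner result still yields one row, padded with null
                result.add(Arrays.<Object>asList(left, null));
            } else {
                for (R right : matches) {
                    result.add(Arrays.<Object>asList(left, right));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> preserved = Arrays.asList(1, 2, 3);
        List<Integer> nullSupplying = Arrays.asList(2, 3, 3);

        // Extra predicate in ON: every preserved row survives, unmatched ones are null-padded.
        System.out.println(leftOuterJoin(preserved, nullSupplying,
                (l, r) -> l.equals(r) && r > 2));

        // Same predicate in WHERE: applied after the join, so null-padded rows are filtered out.
        List<List<Object>> where = new ArrayList<>(
                leftOuterJoin(preserved, nullSupplying, (l, r) -> l.equals(r)));
        where.removeIf(row -> row.get(1) == null || (Integer) row.get(1) <= 2);
        System.out.println(where);

        // Anti-join pattern: keep only preserved rows that found no match.
        List<List<Object>> anti = new ArrayList<>(
                leftOuterJoin(preserved, nullSupplying, (l, r) -> l.equals(r)));
        anti.removeIf(row -> row.get(1) != null);
        System.out.println(anti);
    }
}
```

The `main` method contrasts the three cases named in the test description: a predicate placed in ON keeps all preserved rows, the same predicate placed in WHERE drops the null-padded ones, and the anti-join keeps only the null-padded ones.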
📊 Metrics Diff Analysis Report
Summary
ℹ️ About this analysis
This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:
The last category in particular may indicate planner regressions that should be investigated.
New Queries
Count of new queries by file:
* General cleanup pass over `INNER_JOIN.rst` and `Joins.rst`.
* Rename `INNER_JOIN.rst` to `JOIN.rst`.
* Add documentation of OUTER JOIN to `JOIN.rst`.

Support for outer joins is introduced in PR FoundationDB#4122.
-
# SELECT * with LEFT JOIN exposes all columns from both sides.
# Note: The full scan and residual FILTER plan here is suboptimal. The planner misses the opportunity to push down
# the FILTER because it does not normalize/flip join predicates currently.