Support LEFT OUTER JOIN and RIGHT OUTER JOIN #4122

Open
RobertBrunel wants to merge 3 commits into FoundationDB:main from RobertBrunel:left-join
Conversation

@RobertBrunel
Contributor

@RobertBrunel RobertBrunel commented Apr 30, 2026

Support LEFT OUTER JOIN and RIGHT OUTER JOIN

This change introduces support for left and right outer joins.

A dedicated QGM box (OuterJoinExpression) represents one outer join. It is strictly binary (unlike the SELECT box) and carries the join type (LEFT/RIGHT/FULL), the ON-clause predicates, and references to the “preserved” and the “null-supplying” quantifiers. During the rewriting phase, an exploration rule RewriteOuterJoinRule canonicalizes the outer join box into two nested select boxes:

  • The preserved side is connected to the outer SelectExpression through a normal FOREACH quantifier.
  • The null-supplying side is wrapped in an inner SelectExpression carrying the ON predicates and is connected through a FOREACH quantifier with nullOnEmpty set to true.

The rewrite happens during canonicalization so that all subsequent planning rules (predicate push-down, join ordering, implementation) handle the join as a normal nested SELECT. No other rules need to know about OuterJoinExpression. The RewritingCostModel penalizes any surviving OuterJoinExpression, ensuring the rewritten form always wins.
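For readers less familiar with null-on-empty quantifiers, here is a minimal sketch of what the two nested selects compute. It is a toy model over in-memory lists, not planner code: `NullOnEmptySketch`, `innerSelect`, and `leftOuterJoin` are hypothetical names made up for this illustration and do not exist in the Record Layer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Toy model only: these names are hypothetical and not Record Layer APIs.
final class NullOnEmptySketch {

    // Inner "select": the null-supplying rows that satisfy the ON predicate
    // for one preserved row. If none match, null-on-empty yields one null.
    static <L, R> List<R> innerSelect(L preserved, List<R> nullSupplying, BiPredicate<L, R> on) {
        List<R> matches = new ArrayList<>();
        for (R candidate : nullSupplying) {
            if (on.test(preserved, candidate)) {
                matches.add(candidate);
            }
        }
        if (matches.isEmpty()) {
            matches.add(null);
        }
        return matches;
    }

    // Outer "select": a plain for-each over the preserved side, correlated
    // with the inner select above.
    static <L, R> List<List<Object>> leftOuterJoin(
            List<L> preservedSide, List<R> nullSupplying, BiPredicate<L, R> on) {
        List<List<Object>> rows = new ArrayList<>();
        for (L preserved : preservedSide) {
            for (R match : innerSelect(preserved, nullSupplying, on)) {
                rows.add(Arrays.<Object>asList(preserved, match));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> left = Arrays.asList(1, 2, 3);
        List<Integer> right = Arrays.asList(2, 3, 3);
        // prints [[1, null], [2, 2], [3, 3], [3, 3]]
        System.out.println(leftOuterJoin(left, right, (l, r) -> l.equals(r)));
    }
}
```

Every preserved row shows up at least once, and the ON predicate only decides which null-supplying rows are paired with it.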

Key changes:

  • OuterJoinExpression represents the outer join in the QGM.
  • QueryVisitor parses the OUTER JOIN syntax and constructs the logical OuterJoinExpression.
  • RewriteOuterJoinRule rewrites the OuterJoinExpression into nested SelectExpression boxes; it is registered in RewritingRuleSet so it fires during the canonicalization phase.
  • RewritingCostModel consults a new ExpressionCountProperty.outerJoinCount ahead of selectCount so an un-rewritten OuterJoinExpression is always more expensive than the canonical two-SELECT form.
  • Supporting changes in CardinalitiesProperty, RecordTypesProperty, LogicalPlanFragment, and RelationalExpressionMatchers to teach existing utilities about OuterJoinExpression, so cardinality, ordering, and record-type properties propagate correctly through the new node.
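The cost-model ordering can be pictured as a lexicographic comparison. This is a sketch under assumptions: `RewritingCostSketch`, `Counts`, and `CHEAPER_FIRST` are hypothetical names, and the real `RewritingCostModel` may weigh further criteria that are not modeled here. Only the "outer join count before select count" ordering is taken from the description above.

```java
import java.util.Comparator;

// Toy model only: these names are hypothetical and not Record Layer APIs.
final class RewritingCostSketch {

    // Stand-in for the per-expression counts a cost model might consult.
    static final class Counts {
        final int outerJoinCount;
        final int selectCount;

        Counts(int outerJoinCount, int selectCount) {
            this.outerJoinCount = outerJoinCount;
            this.selectCount = selectCount;
        }
    }

    // Fewer outer join boxes always wins; the select count only breaks ties.
    static final Comparator<Counts> CHEAPER_FIRST =
            Comparator.comparingInt((Counts c) -> c.outerJoinCount)
                    .thenComparingInt(c -> c.selectCount);

    public static void main(String[] args) {
        Counts unrewritten = new Counts(1, 0); // one surviving outer join box
        Counts rewritten = new Counts(0, 2);   // canonical two-SELECT form
        // A negative result means the rewritten form is considered cheaper.
        System.out.println(CHEAPER_FIRST.compare(rewritten, unrewritten) < 0);
    }
}
```

Because the outer join count is compared first, the rewritten form wins even though it contains more select boxes.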

Testing:

  • join-tests-outer.yamsql integration tests covering LEFT and RIGHT OUTER JOIN semantics, anti-join patterns, predicate placement (ON vs WHERE) on either side of the join, and predicate push-down into either source.
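To illustrate why predicate placement matters, here is a small self-contained sketch of the three patterns named above (filter in ON, filter in WHERE, anti-join). It is a toy model over integer lists and is not taken from `join-tests-outer.yamsql`; `PredicatePlacementSketch`, `leftJoin`, and `where` are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy model only: these names are hypothetical and not Record Layer APIs.
final class PredicatePlacementSketch {

    // LEFT OUTER JOIN: unmatched left rows are padded with null.
    static List<List<Integer>> leftJoin(
            List<Integer> left, List<Integer> right, BiPredicate<Integer, Integer> on) {
        List<List<Integer>> rows = new ArrayList<>();
        for (Integer l : left) {
            boolean matched = false;
            for (Integer r : right) {
                if (on.test(l, r)) {
                    rows.add(Arrays.asList(l, r));
                    matched = true;
                }
            }
            if (!matched) {
                rows.add(Arrays.asList(l, null));
            }
        }
        return rows;
    }

    // WHERE: filters the joined rows after null padding has happened.
    static List<List<Integer>> where(List<List<Integer>> rows, Predicate<List<Integer>> predicate) {
        return rows.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> left = Arrays.asList(1, 2, 3);
        List<Integer> right = Arrays.asList(2, 3);

        // Filter in ON: all left rows survive, fewer of them find a match.
        System.out.println(leftJoin(left, right, (l, r) -> l.equals(r) && r > 2));

        // Filter in WHERE: null-padded rows fail the comparison and disappear.
        System.out.println(where(leftJoin(left, right, (l, r) -> l.equals(r)),
                row -> row.get(1) != null && row.get(1) > 2));

        // Anti-join: keep only the left rows that found no match.
        System.out.println(where(leftJoin(left, right, (l, r) -> l.equals(r)),
                row -> row.get(1) == null));
    }
}
```

With these inputs, the ON variant yields three rows (two of them null-padded), the WHERE variant yields only the row for 3, and the anti-join yields only the null-padded row for 1.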

Resolves #4151.

@RobertBrunel RobertBrunel self-assigned this Apr 30, 2026
@RobertBrunel RobertBrunel added the enhancement New feature or request label Apr 30, 2026
@RobertBrunel RobertBrunel requested a review from normen662 April 30, 2026 16:56
Collaborator

@alecgrieser alecgrieser left a comment

This seems like a generally sound approach to me, and it appears to be correct in its aims. I have a few questions about the approach, as well as some documentation and testing suggestions. There are some comments in the OuterJoinExpression code that could probably be answered with either additional comments (perhaps in the PR or perhaps just in GitHub) or potentially more substantive code changes.

Comment thread docs/sphinx/source/reference/sql_commands/DQL/JOIN.rst Outdated
Comment thread docs/sphinx/source/reference/sql_commands/DQL/JOIN.rst Outdated
Comment thread docs/sphinx/source/reference/Joins.rst Outdated
* {@code FULL} are defined here to allow the model to represent them, but no planning rules support them yet.
*/
@API(API.Status.EXPERIMENTAL)
public class OuterJoinExpression extends AbstractRelationalExpressionWithChildren
Collaborator

Mainly for my own understanding: what was the reasoning behind introducing a new relational expression (and a rewrite rule to turn it into a SelectExpression) instead of having the PlanGenerator create a SelectExpression with the children set up in the manner we expect (i.e., with null-on-empty on the null-producing side)?

Contributor Author

I had an earlier version do it right in the PlanGenerator, but felt that made the plan generator “too smart” and that rewriting OuterJoinExpression is a very natural fit for a self-contained rewrite rule. In the planner debugger, we can then observe the initial QGM that reflects the original query, and how the OuterJoinExpression gets reduced. Also for future-proofing, I was thinking it'll be convenient to have an OuterJoinExpression that can represent the other OUTER JOIN variants as well and that is trivial to construct from PlanGenerator. And maybe we will at some point add further rules that rewrite OuterJoinExpression in other ways. There are SQL rewrites for purposes of unnesting that even introduce new outer joins during rewriting.

That said, it does add an extra hop to the planning (with non-zero costs) and a certain amount of boilerplate. So I’m somewhat undecided about this myself.

Contributor

@hatyo hatyo May 11, 2026

I think I do like this approach, especially that it keeps the plan generation logic a bit simpler (as it should).

Comment thread yaml-tests/src/test/java/YamlIntegrationTests.java Outdated
Comment thread yaml-tests/src/test/resources/join-tests-outer.yamsql Outdated
Comment thread yaml-tests/src/test/resources/join-tests-outer.yamsql Outdated
Comment thread yaml-tests/src/test/resources/join-tests-outer.yamsql
Comment thread yaml-tests/src/test/resources/join-tests-outer.yamsql
RobertBrunel added a commit to RobertBrunel/fdb-record-layer that referenced this pull request May 8, 2026
* General cleanup pass over `INNER_JOIN.rst` and `Joins.rst`.
* Rename `INNER_JOIN.rst` to `JOIN.rst`.
* Add documentation of LEFT OUTER JOIN to `JOIN.rst`. Support for left joins is introduced by PR FoundationDB#4122.
@RobertBrunel RobertBrunel changed the title Support LEFT OUTER JOIN [draft] Support LEFT OUTER JOIN May 8, 2026
@RobertBrunel RobertBrunel force-pushed the left-join branch 2 times, most recently from 80dd43a to 76e80f0 Compare May 8, 2026 19:56
@RobertBrunel RobertBrunel marked this pull request as ready for review May 8, 2026 19:59
@RobertBrunel RobertBrunel force-pushed the left-join branch 3 times, most recently from ab490f9 to 1659dfa Compare May 11, 2026 14:53
@RobertBrunel RobertBrunel requested review from alecgrieser and hatyo May 11, 2026 14:53
…ectness

`PullUpNullOnEmptyRule` splits a `SelectExpression` featuring a null-on-empty quantifier into two selects. However, it assigns the predicates only to the lower select. To be correct, it needs to apply them to the upper `SelectExpression` as well (“to act on any nulls produced by this quantifier”, as the Javadoc comment on the rule already says). Without this bugfix, WHERE predicates may get incorrectly pushed past the null-on-empty boundary.

Testing:
* Introduce `PullUpNullOnEmptyRuleTest` and add a regression test.

Fixes FoundationDB#4148.
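
The failure mode can be reproduced in a toy model. This is only a sketch under assumptions: it models a select as a list filter with a null-rejecting predicate, it ignores the conditions under which the real `PullUpNullOnEmptyRule` matches, and all names in it (`PullUpNullOnEmptySketch`, `original`, `splitLowerOnly`, `splitBoth`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy model only: these names are hypothetical and not Record Layer APIs.
final class PullUpNullOnEmptySketch {

    // Null-on-empty quantifier: an empty input is replaced by a single null.
    static List<Integer> nullOnEmpty(List<Integer> rows) {
        if (!rows.isEmpty()) {
            return rows;
        }
        List<Integer> single = new ArrayList<>();
        single.add(null);
        return single;
    }

    static List<Integer> filter(List<Integer> rows, Predicate<Integer> predicate) {
        return rows.stream().filter(predicate).collect(Collectors.toList());
    }

    // Before the split: one select filtering over a null-on-empty quantifier.
    static List<Integer> original(List<Integer> source, Predicate<Integer> predicate) {
        return filter(nullOnEmpty(source), predicate);
    }

    // Split with the predicate only in the lower select (the reported bug).
    static List<Integer> splitLowerOnly(List<Integer> source, Predicate<Integer> predicate) {
        return nullOnEmpty(filter(source, predicate));
    }

    // Split with the predicate in both the lower and the upper select.
    static List<Integer> splitBoth(List<Integer> source, Predicate<Integer> predicate) {
        return filter(nullOnEmpty(filter(source, predicate)), predicate);
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        source.add(1);
        // SQL-style comparison: a null operand never satisfies "x > 2".
        Predicate<Integer> greaterThanTwo = x -> x != null && x > 2;

        System.out.println(original(source, greaterThanTwo));       // prints []
        System.out.println(splitLowerOnly(source, greaterThanTwo)); // prints [null]
        System.out.println(splitBoth(source, greaterThanTwo));      // prints []
    }
}
```

In the lower-only variant, the filter empties the input below the null-on-empty boundary, so a spurious null row appears that nothing above it removes.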
@github-actions

📊 Metrics Diff Analysis Report

Summary

  • New queries: 37
  • Dropped queries: 0
  • Plan changed + metrics changed: 0
  • Plan unchanged + metrics changed: 0
ℹ️ About this analysis

This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:

  • New queries: Queries added in this PR
  • Dropped queries: Queries removed in this PR. These should be reviewed to ensure we are not losing coverage.
  • Plan changed + metrics changed: The query plan has changed along with planner metrics.
  • Plan unchanged + metrics changed: Same plan but different metrics

The last category in particular may indicate planner regressions that should be investigated.

New Queries

Count of new queries by file:

  • yaml-tests/src/test/resources/join-tests-outer.metrics.yaml: 37

@RobertBrunel RobertBrunel changed the title from "Support LEFT OUTER JOIN" to "Support LEFT OUTER JOIN and RIGHT OUTER JOIN" May 12, 2026
RobertBrunel added a commit to RobertBrunel/fdb-record-layer that referenced this pull request May 13, 2026
* General cleanup pass over `INNER_JOIN.rst` and `Joins.rst`.
* Rename `INNER_JOIN.rst` to `JOIN.rst`.
* Add documentation of OUTER JOIN to `JOIN.rst`. Support for outer joins is introduced in PR FoundationDB#4122.
-
# SELECT * with LEFT JOIN exposes all columns from both sides.
# Note: The full scan and residual FILTER plan here is suboptimal. The planner misses the opportunity to push down
# the FILTER because it does not normalize/flip join predicates currently.
Contributor Author

To Do: Mention Issue #4169.

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Add support for LEFT OUTER JOIN and RIGHT OUTER JOIN

3 participants