feat(optimizer): Rewrite ROW constructor IN to disjunction for partition pruning #27500

Open
kaikalur wants to merge 1 commit into prestodb:master from kaikalur:rewrite-row-constructor-in-to-disjunction

@kaikalur kaikalur commented Apr 3, 2026

Summary

Add a new iterative optimizer rule RewriteRowConstructorInToDisjunction that rewrites predicates of the form:

ROW(pk1, pk2) IN (ROW('a', 1), ROW('b', 2))

into:

(pk1 = 'a' AND pk2 = 1) OR (pk1 = 'b' AND pk2 = 2)
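
To make the shape of the rewrite concrete, the following is a minimal sketch that works on SQL text fragments rather than on Presto's RowExpression tree. The class RowInRewriteSketch and its rewrite method are hypothetical names used for illustration only and are not part of this PR; the real rule builds comparison expressions instead of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: operates on SQL text, not on Presto RowExpressions.
class RowInRewriteSketch {
    // Turns ROW(c1, c2) IN (ROW(v1, v2), ...) into "(c1 = v1 AND c2 = v2) OR (...)".
    static String rewrite(List<String> columns, List<List<String>> candidates) {
        List<String> disjuncts = new ArrayList<>();
        for (List<String> candidate : candidates) {
            // Each right-hand ROW must have the same arity as the left-hand ROW.
            if (candidate.size() != columns.size()) {
                throw new IllegalArgumentException("arity mismatch between ROW constructors");
            }
            List<String> equalities = new ArrayList<>();
            for (int i = 0; i < columns.size(); i++) {
                equalities.add(columns.get(i) + " = " + candidate.get(i));
            }
            disjuncts.add("(" + String.join(" AND ", equalities) + ")");
        }
        return String.join(" OR ", disjuncts);
    }

    public static void main(String[] args) {
        // Prints: (pk1 = 'a' AND pk2 = 1) OR (pk1 = 'b' AND pk2 = 2)
        System.out.println(rewrite(
                Arrays.asList("pk1", "pk2"),
                Arrays.asList(Arrays.asList("'a'", "1"), Arrays.asList("'b'", "2"))));
    }
}
```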

Motivation

The RowExpressionDomainTranslator (used by PickTableLayout) cannot extract per-column TupleDomain constraints from ROW-level IN predicates. For such predicates the domain translator returns TupleDomain{ALL}, so Hive partition pruning cannot happen and the query falls back to a full table scan.

After this rewrite, the domain translator extracts proper per-column domains like {pk1 -> {a,b}, pk2 -> {1,2}}, enabling HivePartitionManager to prune partitions.
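
The per-column domains above can be pictured as a union of values per column across the disjuncts. The sketch below is a toy model with plain maps and sets, assuming string-valued keys; ColumnDomainSketch and its methods are hypothetical and do not use Presto's TupleDomain API. It also shows that the per-column form is coarser than the original predicate, since it admits combinations such as (a, 2), so in this model it is only suitable for skipping partitions, not for replacing the filter.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of per-column domains; this is not Presto's TupleDomain API.
class ColumnDomainSketch {
    // One disjunct of the rewritten predicate: pk1 = v1 AND pk2 = v2.
    static Map<String, String> conjunction(String pk1Value, String pk2Value) {
        Map<String, String> equalities = new LinkedHashMap<>();
        equalities.put("pk1", pk1Value);
        equalities.put("pk2", pk2Value);
        return equalities;
    }

    // Unions the compared values per column across all disjuncts.
    // Assumes every disjunct constrains the same columns, as the ROW IN rewrite does.
    static Map<String, Set<String>> unionDomains(List<Map<String, String>> disjuncts) {
        Map<String, Set<String>> domains = new LinkedHashMap<>();
        for (Map<String, String> conjunction : disjuncts) {
            for (Map.Entry<String, String> equality : conjunction.entrySet()) {
                domains.computeIfAbsent(equality.getKey(), k -> new TreeSet<>()).add(equality.getValue());
            }
        }
        return domains;
    }

    // A partition survives pruning if each key value lies in that column's domain.
    static boolean mayMatch(Map<String, Set<String>> domains, Map<String, String> partition) {
        for (Map.Entry<String, Set<String>> domain : domains.entrySet()) {
            if (!domain.getValue().contains(partition.get(domain.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> domains =
                unionDomains(Arrays.asList(conjunction("a", "1"), conjunction("b", "2")));
        System.out.println(domains);                                  // {pk1=[a, b], pk2=[1, 2]}
        System.out.println(mayMatch(domains, conjunction("a", "2"))); // true: coarser than the predicate
        System.out.println(mayMatch(domains, conjunction("c", "1"))); // false: partition can be skipped
    }
}
```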

Design

  • Rule: RewriteRowConstructorInToDisjunction matches FilterNode -> TableScanNode
  • Guard: Only fires when ALL fields of the left-side ROW constructor are partition key columns
  • Session property: rewrite_row_constructor_in_to_disjunction (default: disabled)
  • Pipeline position: Registered before the first PickTableLayout in PlanOptimizers
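
The guard in the list above amounts to a set-containment check. The sketch below is a simplified illustration on column names; PartitionKeyGuardSketch is a hypothetical name, and the actual rule resolves partition keys from connector table metadata rather than from a hard-coded set. A later comment in this thread mentions relaxing the guard to fire when any ROW field is a partition key, which in this toy model would correspond to an any-match instead of containsAll.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the guard; the PR reads partition keys from connector metadata.
class PartitionKeyGuardSketch {
    // True only when the ROW has at least one field and every field is a partition key.
    static boolean allFieldsArePartitionKeys(List<String> rowFields, Set<String> partitionKeys) {
        return !rowFields.isEmpty() && partitionKeys.containsAll(rowFields);
    }

    public static void main(String[] args) {
        Set<String> partitionKeys = new HashSet<>(Arrays.asList("pk1", "pk2"));
        System.out.println(allFieldsArePartitionKeys(Arrays.asList("pk1", "pk2"), partitionKeys));  // true
        System.out.println(allFieldsArePartitionKeys(Arrays.asList("pk1", "data"), partitionKeys)); // false
    }
}
```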

Test Plan

8 unit tests. All pass: TestRewriteRowConstructorInToDisjunction, TestFeaturesConfig

== RELEASE NOTES ==

General Changes
* Add optimizer rule RewriteRowConstructorInToDisjunction that rewrites ROW(...) IN (ROW(...), ...) predicates into an OR of AND-ed equality comparisons when all ROW fields are partition keys, enabling per-column TupleDomain extraction for partition pruning. Gated behind session property rewrite_row_constructor_in_to_disjunction (default disabled).

@kaikalur kaikalur requested review from a team, feilong-liu and jaystarshot as code owners April 3, 2026 02:52
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Apr 3, 2026

sourcery-ai bot commented Apr 3, 2026

Reviewer's Guide

Adds a new iterative optimizer rule, RewriteRowConstructorInToDisjunction. When enabled via a new session/feature flag, the rule rewrites ROW-based IN predicates over partition key columns on table scans into disjunctions of conjunctions of per-column equality predicates, which enables TupleDomain extraction and partition pruning. The PR wires the rule into the optimizer pipeline before PickTableLayout, exposes configuration via FeaturesConfig and SystemSessionProperties, and adds rule tests that include a custom mock connector for partition metadata and a TupleDomain behavior test.

Sequence diagram for RewriteRowConstructorInToDisjunction optimization and partition pruning

sequenceDiagram
    participant Planner
    participant IterativeOptimizer_RewriteRowConstructorInToDisjunction
    participant RewriteRowConstructorInToDisjunction
    participant SystemSessionProperties
    participant Metadata
    participant RowExpressionDomainTranslator
    participant HivePartitionManager

    Planner->>IterativeOptimizer_RewriteRowConstructorInToDisjunction: optimize(plan)
    IterativeOptimizer_RewriteRowConstructorInToDisjunction->>RewriteRowConstructorInToDisjunction: apply(FilterNode, Captures, Context)

    RewriteRowConstructorInToDisjunction->>SystemSessionProperties: isRewriteRowConstructorInToDisjunction(Session)
    SystemSessionProperties-->>RewriteRowConstructorInToDisjunction: boolean enabled

    alt rule_disabled
        RewriteRowConstructorInToDisjunction-->>IterativeOptimizer_RewriteRowConstructorInToDisjunction: Result.empty()
    else rule_enabled
        RewriteRowConstructorInToDisjunction->>Metadata: getTableMetadata(Session, TableHandle)
        Metadata-->>RewriteRowConstructorInToDisjunction: ConnectorTableMetadata

        RewriteRowConstructorInToDisjunction->>Metadata: getColumnHandles(Session, TableHandle)
        Metadata-->>RewriteRowConstructorInToDisjunction: Map columnHandles

        RewriteRowConstructorInToDisjunction-->>IterativeOptimizer_RewriteRowConstructorInToDisjunction: Result.ofPlanNode(new FilterNode with rewritten predicate)
    end

    IterativeOptimizer_RewriteRowConstructorInToDisjunction-->>Planner: optimized plan

    Planner->>RowExpressionDomainTranslator: fromPredicate(rewrittenPredicate)
    RowExpressionDomainTranslator-->>Planner: TupleDomain with per_column_domains

    Planner->>HivePartitionManager: getPartitions(Session, TableHandle, TupleDomain)
    HivePartitionManager-->>Planner: pruned_partitions

Class diagram for RewriteRowConstructorInToDisjunction and related configuration

classDiagram

    class FeaturesConfig {
        - boolean pullUpExpressionFromLambda
        - boolean rewriteConstantArrayContainsToIn
        - boolean rewriteExpressionWithConstantVariable
        - boolean rewriteRowConstructorInToDisjunction
        + boolean isRewriteRowConstructorInToDisjunction()
        + FeaturesConfig setRewriteRowConstructorInToDisjunction(boolean rewriteRowConstructorInToDisjunction)
    }

    class SystemSessionProperties {
        + static String REWRITE_ROW_CONSTRUCTOR_IN_TO_DISJUNCTION
        + static boolean isRewriteRowConstructorInToDisjunction(Session session)
        - List sessionProperties
    }

    class RewriteRowConstructorInToDisjunction {
        - Metadata metadata
        - FunctionResolution functionResolution
        + RewriteRowConstructorInToDisjunction(Metadata metadata)
        + Pattern getPattern()
        + Result apply(FilterNode filterNode, Captures captures, Context context)
        - Set resolvePartitionVariables(Session session, TableScanNode tableScan)
        - RowExpression rewritePredicate(RowExpression predicate, Set partitionVars)
        - RowExpression tryRewriteRowIn(SpecialFormExpression inExpr, Set partitionVars)
    }

    class PlanOptimizers {
        + PlanOptimizers(Metadata metadata, RuleStats ruleStats, StatsCalculator statsCalculator, ExchangesCostCalculator estimatedExchangesCostCalculator)
        - List planOptimizers
    }

    class FilterNode {
        + RowExpression getPredicate()
        + PlanNode getSource()
    }

    class TableScanNode {
        + TableHandle getTable()
        + Map getAssignments()
    }

    class Metadata {
        + TableMetadata getTableMetadata(Session session, TableHandle tableHandle)
        + Map getColumnHandles(Session session, TableHandle tableHandle)
        + FunctionAndTypeManager getFunctionAndTypeManager()
    }

    class FunctionResolution {
        + FunctionResolution(FunctionAndTypeResolver resolver)
        + FunctionHandle comparisonFunction(OperatorType operatorType, Type leftType, Type rightType)
    }

    class SpecialFormExpression {
        + Form getForm()
        + List getArguments()
    }

    class RowExpression {
    }

    class VariableReferenceExpression {
    }

    class Session {
    }

    class TableHandle {
    }

    class ColumnHandle {
    }

    class ConnectorTableMetadata {
        + Map getProperties()
    }

    class HivePartitionManager {
        + getPartitions(Session session, TableHandle tableHandle, TupleDomain tupleDomain)
    }

    FeaturesConfig ..> SystemSessionProperties : provides_defaults_for
    SystemSessionProperties ..> FeaturesConfig : reads_from

    PlanOptimizers ..> RewriteRowConstructorInToDisjunction : registers_rule
    PlanOptimizers ..> SystemSessionProperties : exposes_session_properties

    RewriteRowConstructorInToDisjunction --> Metadata : uses
    RewriteRowConstructorInToDisjunction ..> FilterNode : matches_and_rewrites
    RewriteRowConstructorInToDisjunction ..> TableScanNode : requires_partition_info
    RewriteRowConstructorInToDisjunction ..> SpecialFormExpression : inspects_IN_and_ROW
    RewriteRowConstructorInToDisjunction ..> RowExpression : rewrites_predicates
    RewriteRowConstructorInToDisjunction ..> VariableReferenceExpression : tracks_partition_vars
    RewriteRowConstructorInToDisjunction ..> Session : reads_session_property

    Metadata ..> ConnectorTableMetadata : returns
    Metadata ..> ColumnHandle : returns
    TableScanNode ..> TableHandle : references
    TableScanNode ..> ColumnHandle : assignments
    ConnectorTableMetadata ..> ColumnHandle : describes

File-Level Changes

Change: Introduce RewriteRowConstructorInToDisjunction iterative optimizer rule that rewrites eligible ROW(..) IN (ROW(..),..) predicates into OR-of-ANDs of equality comparisons over partition key variables.
Details:
  • Define a Rule pattern matching FilterNodes directly over TableScanNodes and guard execution behind the rewrite_row_constructor_in_to_disjunction session property.
  • Resolve partition key variables by reading connector table metadata partitioned_by property and mapping it through the table scan assignments to VariableReferenceExpressions.
  • Implement recursive predicate rewriting that targets IN special forms whose left operand is a ROW_CONSTRUCTOR of partition-key variables and whose RHS arguments are matching-arity ROW_CONSTRUCTORs, generating comparison CallExpressions joined by AND then OR (using LogicalRowExpressions.and/or).
  • Ensure rewrite only affects supported patterns (ROW-based IN over partition keys) and leaves other predicates and non-partitioned tables unchanged, returning Result.empty() when no transformation applies.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/RewriteRowConstructorInToDisjunction.java

Change: Expose and wire a new optimizer toggle rewrite_row_constructor_in_to_disjunction via FeaturesConfig and SystemSessionProperties, and register the rule in the optimizer pipeline before PickTableLayout.
Details:
  • Add boolean rewriteRowConstructorInToDisjunction field, getter, and @Config-mapped setter in FeaturesConfig with config key optimizer.rewrite-row-constructor-in-to-disjunction and default false.
  • Introduce SystemSessionProperties.REWRITE_ROW_CONSTRUCTOR_IN_TO_DISJUNCTION constant, register a boolean session property with default from FeaturesConfig, and add a convenience accessor isRewriteRowConstructorInToDisjunction(Session).
  • Insert a new IterativeOptimizer instance into PlanOptimizers configured with the RewriteRowConstructorInToDisjunction rule, positioned before the first PickTableLayout-related optimizer in the list to influence partition pruning.
  • Update TestFeaturesConfig defaults and explicit property mapping tests to cover the new config key and verify its default and non-default values.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
  • presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
  • presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestFeaturesConfig.java

Change: Add comprehensive unit tests for RewriteRowConstructorInToDisjunction including TupleDomain behavior and various positive/negative match scenarios via a custom mock connector with partition metadata.
Details:
  • Create TestRewriteRowConstructorInToDisjunction with a RuleTester-based setup that registers a PartitionedMockConnectorFactory providing a partitioned table (partitioned_table) with partitioned_by=["pk1","pk2"] and a non-partitioned table.
  • Add testRewriteEnablesPartitionPruningViaTupleDomain to show that RowExpressionDomainTranslator.fromPredicate() yields TupleDomain.ALL before rewrite and per-column domains for pk1 and pk2 after rewrite (asserting expected Domain contents).
  • Add positive rewrite tests: multi-candidate rewrite into OR-of-ANDs (testRewriteRowInWithAllPartitionKeys), single candidate rewrite into AND chain (testSingleCandidateRewrite), and a scenario where ROW IN is embedded inside a top-level AND with another predicate (testRowInEmbeddedInAndPredicate) to ensure localized rewriting.
  • Add negative tests ensuring the rule does not fire when the session property is disabled, when ROW includes non-partition columns, when the table has no partition keys, or when the IN predicate is not ROW-based (simple IN).
Files:
  • presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestRewriteRowConstructorInToDisjunction.java

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • The tests in TestRewriteRowConstructorInToDisjunction (e.g., testRewriteEnablesPartitionPruningViaTupleDomain) print a lot of diagnostic information to stdout; consider removing these System.out.println calls or switching to logging to keep the test output clean and focused.
  • In RewriteRowConstructorInToDisjunction.resolvePartitionVariables, the table property key "partitioned_by" is hard-coded as a string constant; consider reusing an existing constant or centralizing this key to avoid divergence from connector/table property definitions.
  • rewritePredicate currently only descends through top-level AND special forms when searching for rewritable ROW IN expressions; if you expect benefits for predicates under OR or other structures, you may want to extend the traversal logic or at least document this limitation.

@kaikalur kaikalur force-pushed the rewrite-row-constructor-in-to-disjunction branch from 5c10bd0 to 37fe405 Compare April 3, 2026 14:15
kaikalur commented Apr 3, 2026

@feilong-liu Friendly ping for review when you get a chance. Updated the rule to fire when any ROW field is a partition key (not just all). All CI checks green except an unrelated flaky Hudi test. Thanks!

}
}

if (specialForm.getForm() == SpecialFormExpression.Form.AND) {
Contributor

Why not use existing extractConjunct utils?

Contributor Author

Good call — updated to use extractConjuncts() from LogicalRowExpressions. This properly handles nested ANDs instead of only walking one level.
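
The difference between walking one level of AND and flattening nested ANDs can be sketched on a toy expression tree. ConjunctSketch and its Expr type are hypothetical stand-ins for illustration; Presto's real helper is LogicalRowExpressions.extractConjuncts, which operates on RowExpression and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy expression tree; not Presto's RowExpression or LogicalRowExpressions.
class ConjunctSketch {
    static final class Expr {
        final String op;        // "AND", "OR", or "LEAF"
        final String text;      // predicate text, set for leaves only
        final List<Expr> args;  // children, empty for leaves

        Expr(String op, String text, List<Expr> args) {
            this.op = op;
            this.text = text;
            this.args = args;
        }
    }

    static Expr leaf(String text) {
        return new Expr("LEAF", text, Collections.<Expr>emptyList());
    }

    static Expr and(Expr left, Expr right) {
        return new Expr("AND", null, Arrays.asList(left, right));
    }

    static Expr or(Expr left, Expr right) {
        return new Expr("OR", null, Arrays.asList(left, right));
    }

    // Flattens arbitrarily nested ANDs into a list of conjuncts; anything else is kept whole.
    static List<Expr> extractConjuncts(Expr expression) {
        List<Expr> conjuncts = new ArrayList<>();
        collect(expression, conjuncts);
        return conjuncts;
    }

    private static void collect(Expr expression, List<Expr> out) {
        if (expression.op.equals("AND")) {
            for (Expr argument : expression.args) {
                collect(argument, out);
            }
        }
        else {
            out.add(expression);
        }
    }

    public static void main(String[] args) {
        // (a AND b) AND (c OR d) has three conjuncts: a, b, and the whole OR.
        Expr predicate = and(and(leaf("a"), leaf("b")), or(leaf("c"), leaf("d")));
        System.out.println(extractConjuncts(predicate).size()); // 3
    }
}
```

In this model a ROW IN expression nested two ANDs deep still surfaces as its own conjunct, whereas one sitting under an OR stays inside the OR node and is not visited.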

…ion pruning

Add a new iterative optimizer rule RewriteRowConstructorInToDisjunction
that rewrites predicates of the form:

  ROW(pk1, pk2) IN (ROW('a', 1), ROW('b', 2))

into:

  (pk1 = 'a' AND pk2 = 1) OR (pk1 = 'b' AND pk2 = 2)

This transformation fires only when ALL fields of the left-side ROW
constructor are partition key columns of the underlying table. The
rewrite enables PickTableLayout's RowExpressionDomainTranslator to
extract per-column TupleDomain constraints for partition pruning,
which is impossible when the predicate uses ROW-level IN comparisons.

Without this rewrite, the domain translator sees TupleDomain{ALL}
(no constraints, full table scan). After the rewrite, it extracts
per-column domains like {pk1 -> {'a','b'}, pk2 -> {1,2}}, enabling
Hive partition pruning via HivePartitionManager.

The rule is gated behind a session property
rewrite_row_constructor_in_to_disjunction (default: disabled) and
runs before the first PickTableLayout invocation in PlanOptimizers.
@kaikalur kaikalur force-pushed the rewrite-row-constructor-in-to-disjunction branch from 37fe405 to 00b53a2 Compare April 6, 2026 20:56

3 participants