feat(optimizer): Rewrite ROW constructor IN to disjunction for partition pruning#27500
feat(optimizer): Rewrite ROW constructor IN to disjunction for partition pruning#27500kaikalur wants to merge 1 commit intoprestodb:masterfrom
Conversation
Reviewer's GuideAdds a new iterative optimizer rule RewriteRowConstructorInToDisjunction that, when enabled via a new session/feature flag, rewrites ROW-based IN predicates over partition key columns on table scans into disjunctions of per-column equality predicates to enable TupleDomain extraction and partition pruning, wires it into the optimizer pipeline before PickTableLayout, exposes configuration via FeaturesConfig/SystemSessionProperties, and provides comprehensive rule tests including a custom mock connector for partition metadata and a TupleDomain behavior test. Sequence diagram for RewriteRowConstructorInToDisjunction optimization and partition pruningsequenceDiagram
participant Planner
participant IterativeOptimizer_RewriteRowConstructorInToDisjunction
participant RewriteRowConstructorInToDisjunction
participant SystemSessionProperties
participant Metadata
participant RowExpressionDomainTranslator
participant HivePartitionManager
Planner->>IterativeOptimizer_RewriteRowConstructorInToDisjunction: optimize(plan)
IterativeOptimizer_RewriteRowConstructorInToDisjunction->>RewriteRowConstructorInToDisjunction: apply(FilterNode, Captures, Context)
RewriteRowConstructorInToDisjunction->>SystemSessionProperties: isRewriteRowConstructorInToDisjunction(Session)
SystemSessionProperties-->>RewriteRowConstructorInToDisjunction: boolean enabled
alt rule_disabled
RewriteRowConstructorInToDisjunction-->>IterativeOptimizer_RewriteRowConstructorInToDisjunction: Result.empty()
else rule_enabled
RewriteRowConstructorInToDisjunction->>Metadata: getTableMetadata(Session, TableHandle)
Metadata-->>RewriteRowConstructorInToDisjunction: ConnectorTableMetadata
RewriteRowConstructorInToDisjunction->>Metadata: getColumnHandles(Session, TableHandle)
Metadata-->>RewriteRowConstructorInToDisjunction: Map columnHandles
RewriteRowConstructorInToDisjunction-->>IterativeOptimizer_RewriteRowConstructorInToDisjunction: Result.ofPlanNode(new FilterNode with rewritten predicate)
end
IterativeOptimizer_RewriteRowConstructorInToDisjunction-->>Planner: optimized plan
Planner->>RowExpressionDomainTranslator: fromPredicate(rewrittenPredicate)
RowExpressionDomainTranslator-->>Planner: TupleDomain with per_column_domains
Planner->>HivePartitionManager: getPartitions(Session, TableHandle, TupleDomain)
HivePartitionManager-->>Planner: pruned_partitions
Class diagram for RewriteRowConstructorInToDisjunction and related configurationclassDiagram
class FeaturesConfig {
- boolean pullUpExpressionFromLambda
- boolean rewriteConstantArrayContainsToIn
- boolean rewriteExpressionWithConstantVariable
- boolean rewriteRowConstructorInToDisjunction
+ boolean isRewriteRowConstructorInToDisjunction()
+ FeaturesConfig setRewriteRowConstructorInToDisjunction(boolean rewriteRowConstructorInToDisjunction)
}
class SystemSessionProperties {
+ static String REWRITE_ROW_CONSTRUCTOR_IN_TO_DISJUNCTION
+ static boolean isRewriteRowConstructorInToDisjunction(Session session)
- List sessionProperties
}
class RewriteRowConstructorInToDisjunction {
- Metadata metadata
- FunctionResolution functionResolution
+ RewriteRowConstructorInToDisjunction(Metadata metadata)
+ Pattern getPattern()
+ Result apply(FilterNode filterNode, Captures captures, Context context)
- Set resolvePartitionVariables(Session session, TableScanNode tableScan)
- RowExpression rewritePredicate(RowExpression predicate, Set partitionVars)
- RowExpression tryRewriteRowIn(SpecialFormExpression inExpr, Set partitionVars)
}
class PlanOptimizers {
+ PlanOptimizers(Metadata metadata, RuleStats ruleStats, StatsCalculator statsCalculator, ExchangesCostCalculator estimatedExchangesCostCalculator)
- List planOptimizers
}
class FilterNode {
+ RowExpression getPredicate()
+ PlanNode getSource()
}
class TableScanNode {
+ TableHandle getTable()
+ Map getAssignments()
}
class Metadata {
+ TableMetadata getTableMetadata(Session session, TableHandle tableHandle)
+ Map getColumnHandles(Session session, TableHandle tableHandle)
+ FunctionAndTypeManager getFunctionAndTypeManager()
}
class FunctionResolution {
+ FunctionResolution(FunctionAndTypeResolver resolver)
+ FunctionHandle comparisonFunction(OperatorType operatorType, Type leftType, Type rightType)
}
class SpecialFormExpression {
+ Form getForm()
+ List getArguments()
}
class RowExpression {
}
class VariableReferenceExpression {
}
class Session {
}
class TableHandle {
}
class ColumnHandle {
}
class ConnectorTableMetadata {
+ Map getProperties()
}
class HivePartitionManager {
+ getPartitions(Session session, TableHandle tableHandle, TupleDomain tupleDomain)
}
FeaturesConfig ..> SystemSessionProperties : provides_defaults_for
SystemSessionProperties ..> FeaturesConfig : reads_from
PlanOptimizers ..> RewriteRowConstructorInToDisjunction : registers_rule
PlanOptimizers ..> SystemSessionProperties : exposes_session_properties
RewriteRowConstructorInToDisjunction --> Metadata : uses
RewriteRowConstructorInToDisjunction ..> FilterNode : matches_and_rewrites
RewriteRowConstructorInToDisjunction ..> TableScanNode : requires_partition_info
RewriteRowConstructorInToDisjunction ..> SpecialFormExpression : inspects_IN_and_ROW
RewriteRowConstructorInToDisjunction ..> RowExpression : rewrites_predicates
RewriteRowConstructorInToDisjunction ..> VariableReferenceExpression : tracks_partition_vars
RewriteRowConstructorInToDisjunction ..> Session : reads_session_property
Metadata ..> ConnectorTableMetadata : returns
Metadata ..> ColumnHandle : returns
TableScanNode ..> TableHandle : references
TableScanNode ..> ColumnHandle : assignments
ConnectorTableMetadata ..> ColumnHandle : describes
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The tests in TestRewriteRowConstructorInToDisjunction (e.g., testRewriteEnablesPartitionPruningViaTupleDomain) print a lot of diagnostic information to stdout; consider removing these System.out.println calls or switching to logging to keep the test output clean and focused.
- In RewriteRowConstructorInToDisjunction.resolvePartitionVariables, the table property key "partitioned_by" is hard-coded as a string constant; consider reusing an existing constant or centralizing this key to avoid divergence from connector/table property definitions.
- rewritePredicate currently only descends through top-level AND special forms when searching for rewritable ROW IN expressions; if you expect benefits for predicates under OR or other structures, you may want to extend the traversal logic or at least document this limitation.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The tests in TestRewriteRowConstructorInToDisjunction (e.g., testRewriteEnablesPartitionPruningViaTupleDomain) print a lot of diagnostic information to stdout; consider removing these System.out.println calls or switching to logging to keep the test output clean and focused.
- In RewriteRowConstructorInToDisjunction.resolvePartitionVariables, the table property key "partitioned_by" is hard-coded as a string constant; consider reusing an existing constant or centralizing this key to avoid divergence from connector/table property definitions.
- rewritePredicate currently only descends through top-level AND special forms when searching for rewritable ROW IN expressions; if you expect benefits for predicates under OR or other structures, you may want to extend the traversal logic or at least document this limitation.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
5c10bd0 to
37fe405
Compare
|
@feilong-liu Friendly ping for review when you get a chance. Updated the rule to fire when any ROW field is a partition key (not just all). All CI checks green except an unrelated flaky Hudi test. Thanks! |
| } | ||
| } | ||
|
|
||
| if (specialForm.getForm() == SpecialFormExpression.Form.AND) { |
There was a problem hiding this comment.
Why not use existing extractConjunct utils?
There was a problem hiding this comment.
Good call — updated to use extractConjuncts() from LogicalRowExpressions. This properly handles nested ANDs instead of only walking one level.
…ion pruning
Add a new iterative optimizer rule RewriteRowConstructorInToDisjunction
that rewrites predicates of the form:
ROW(pk1, pk2) IN (ROW('a', 1), ROW('b', 2))
into:
(pk1 = 'a' AND pk2 = 1) OR (pk1 = 'b' AND pk2 = 2)
This transformation fires only when ALL fields of the left-side ROW
constructor are partition key columns of the underlying table. The
rewrite enables PickTableLayout's RowExpressionDomainTranslator to
extract per-column TupleDomain constraints for partition pruning,
which is impossible when the predicate uses ROW-level IN comparisons.
Without this rewrite, the domain translator sees TupleDomain{ALL}
(no constraints, full table scan). After the rewrite, it extracts
per-column domains like {pk1 -> {'a','b'}, pk2 -> {1,2}}, enabling
Hive partition pruning via HivePartitionManager.
The rule is gated behind a session property
rewrite_row_constructor_in_to_disjunction (default: disabled) and
runs before the first PickTableLayout invocation in PlanOptimizers.
37fe405 to
00b53a2
Compare
Summary
Add a new iterative optimizer rule
RewriteRowConstructorInToDisjunctionthat rewrites predicates of the form:into:
Motivation
The
RowExpressionDomainTranslator(used byPickTableLayout) cannot extract per-columnTupleDomainconstraints from ROW-level IN predicates. This means Hive partition pruning is impossible — the domain translator seesTupleDomain{ALL}(full table scan).After this rewrite, the domain translator extracts proper per-column domains like
{pk1 -> {a,b}, pk2 -> {1,2}}, enablingHivePartitionManagerto prune partitions.Design
RewriteRowConstructorInToDisjunctionmatchesFilterNode -> TableScanNoderewrite_row_constructor_in_to_disjunction(default: disabled)PickTableLayoutinPlanOptimizersTest Plan
8 unit tests. All pass: TestRewriteRowConstructorInToDisjunction, TestFeaturesConfig