Feature Summary
When a region re-executes, RegionExecutionCoordinator recreates each operator's iceberg output and state documents. An operator whose output accumulates across runs — e.g. a loop body that re-executes once per iteration — needs its existing storage preserved instead of clobbered on every region invocation. There is currently no way for an operator to opt into reusing its output storage across re-executions.
Proposed Solution or Design
- Add a
reusesOutputStorageOnReExecution: Boolean = false field to PhysicalOp plus a withReusesOutputStorageOnReExecution builder.
- Add a pure
provisionOutputDocument(uri, reuseExistingStorage, documentExists, createDocument) decision function in RegionExecutionCoordinator: when the owning operator sets the flag and the document already exists, reuse it; otherwise (re)create. Unit-test the reuse×exists matrix.
- Name the flag for the behavior the scheduler checks, not the operator that sets it, so any future operator needing the same treatment can opt in.
Default false keeps every existing operator recreating its storage as before (no behavior change). The for-loop feature sets the flag on Loop End.
Affected Area
- Workflow Engine (Amber)
- Storage / Metadata
Feature Summary
When a region re-executes,
RegionExecutionCoordinatorrecreates each operator's iceberg output and state documents. An operator whose output accumulates across runs — e.g. a loop body that re-executes once per iteration — needs its existing storage preserved instead of clobbered on every region invocation. There is currently no way for an operator to opt into reusing its output storage across re-executions.Proposed Solution or Design
reusesOutputStorageOnReExecution: Boolean = falsefield toPhysicalOpplus awithReusesOutputStorageOnReExecutionbuilder.provisionOutputDocument(uri, reuseExistingStorage, documentExists, createDocument)decision function inRegionExecutionCoordinator: when the owning operator sets the flag and the document already exists, reuse it; otherwise (re)create. Unit-test the reuse×exists matrix.Default
falsekeeps every existing operator recreating its storage as before (no behavior change). The for-loop feature sets the flag on Loop End.Affected Area