feat(update): add OverwriteFiles for overwrite snapshot commits #741

Open
lishuxu wants to merge 1 commit into apache:main from lishuxu:feature/overwrite-files

lishuxu (Contributor) commented Jun 14, 2026

Summary:
Add a production OverwriteFiles builder that brings iceberg-cpp to semantic parity with Java's BaseOverwriteFiles. It supports explicit file replacement (DeleteFile + AddFile) and range-based replacement (OverwriteByRowFilter + AddFile) with the same family of pre-commit concurrency validations. The builder is a thin subclass of MergingSnapshotUpdate and reuses the existing commit kernel (Apply/summary/retry/cleanup) unchanged.

Changes:

  • New OverwriteFiles class (src/iceberg/update/overwrite_files.{h,cc}) and Table::NewOverwrite() / Transaction::NewOverwrite() entry points.
  • Builder surface: AddFile, DeleteFile, bulk DeleteFiles, OverwriteByRowFilter, ValidateFromSnapshot, ConflictDetectionFilter, ValidateNoConflictingData, ValidateNoConflictingDeletes, ValidateAddedFilesMatchOverwriteFilter, WithCaseSensitivity.
  • Validate(): conflict-filter resolution, concurrent add/delete conflict checks, and strict added-file range validation (projection + StrictMetricsEvaluator).
  • Tests (overwrite_files_test.cc, 45 cases) and CMake/meson wiring.
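
To illustrate what "strict" means in the added-file range validation, here is a minimal, self-contained sketch. It does not use iceberg-cpp types: `ColumnBounds`, `RangeFilter`, `StrictlyMatches`, and `MightMatch` are hypothetical names invented for this example, and the filter is assumed to be a half-open range on a single integer column. The idea is that a strict check passes only when a file's column bounds prove that every row satisfies the overwrite filter, whereas an inclusive check passes when some row might.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for per-file column metrics (not an iceberg-cpp type).
struct ColumnBounds {
  int64_t lower;  // smallest value of the column in the file
  int64_t upper;  // largest value of the column in the file
};

// Hypothetical overwrite filter: lo <= x < hi on the same column.
struct RangeFilter {
  int64_t lo;
  int64_t hi;
};

// Strict evaluation: true only if the bounds prove that ALL rows match.
bool StrictlyMatches(const ColumnBounds& b, const RangeFilter& f) {
  assert(b.lower <= b.upper);
  return b.lower >= f.lo && b.upper < f.hi;
}

// Inclusive evaluation: true if the bounds allow that SOME row matches.
bool MightMatch(const ColumnBounds& b, const RangeFilter& f) {
  assert(b.lower <= b.upper);
  return b.upper >= f.lo && b.lower < f.hi;
}
```

Under these assumptions, a file with bounds [10, 20] overlaps the range 10 <= x < 20 but does not strictly match it, because a row with value 20 would fall outside the overwritten range. A validation of this kind would reject such a file.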

Behavior alignment with Java:

  • operation() returns append, delete, or overwrite, depending on what the builder adds and removes.
  • Conflict-filter resolution mirrors BaseOverwriteFiles (explicit -> row filter -> AlwaysTrue); replaced-file delete checks honor ConflictDetectionFilter.
  • Strict added-file validation uses a single DataSpec(), rejecting multi-spec and empty added-file sets.
  • Deviations: public WithCaseSensitivity (vs caseSensitive) to avoid a protected-name clash; ValidateFromSnapshot rejects negative ids early.
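
The first two alignment points can be sketched in isolation. The code below is not taken from the PR: `Operation`, `SelectOperation`, and `ResolveConflictFilter` are hypothetical names, filters are modeled as plain strings rather than expression objects, and the selection rule is an assumption (delete when only files are removed, append when only files are added, overwrite otherwise). The real builder may apply extra conditions before falling back to the row filter.

```cpp
#include <optional>
#include <string>

// Hypothetical operation kinds; the real code reports these as snapshot
// operation names.
enum class Operation { kAppend, kDelete, kOverwrite };

// Assumed rule: pick the operation from what the builder holds.
Operation SelectOperation(bool adds_files, bool deletes_files) {
  if (deletes_files && !adds_files) return Operation::kDelete;
  if (adds_files && !deletes_files) return Operation::kAppend;
  return Operation::kOverwrite;  // both, or neither
}

// Fallback chain from the description: explicit -> row filter -> AlwaysTrue.
// Filters are plain strings here purely for illustration.
std::string ResolveConflictFilter(
    const std::optional<std::string>& explicit_filter,
    const std::optional<std::string>& row_filter) {
  if (explicit_filter) return *explicit_filter;
  if (row_filter) return *row_filter;
  return "true";  // stands in for AlwaysTrue
}
```

With no explicit conflict filter and no row filter, the sketch falls back to a filter that matches everything, so any concurrent change counts as a potential conflict.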

