Skip to content

Comments

Timx 528 merge append deltas#163

Merged
jonavellecuerdo merged 2 commits intoepic-TIMX-515from
TIMX-528-merge-append-deltas
Aug 15, 2025
Merged

Timx 528 merge append deltas#163
jonavellecuerdo merged 2 commits intoepic-TIMX-515from
TIMX-528-merge-append-deltas

Conversation

@jonavellecuerdo
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo commented Aug 14, 2025

Purpose and background context

TDA requires a method to support regular merging of append deltas into the static metadata.duckdb database file as new rows and then deletes (the append deltas) once merged. This work is encapsulated in the second commit. The updates in the first commit was a prerequisite for the merging method.

How can a reviewer manually see the effects of these changes?

For review purposes, the added unit test provides a detailed overview of the method's functionality.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review August 14, 2025 15:57
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks excellent! Nice work here. And thanks for the discussions leading up to the PR, I think they were helpful for scoping this work and even catching some other rough edges.

My request is for a bit more testing. While I think a pure count check is good, this merging is pretty important. I think we might benefit from a test that unambiguously confirms that records X,Y,Z from the append deltas are now in the stastic DB after the merge.

Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Nice work all around.

Why these changes are being introduced:
* The method for merging append deltas into the static metadata database file
needs the filenames of append deltas to easily identify which files
to delete (once merged). Prior to this change, the append deltas view
only had a 'filename' column, which referred to the path or S3 URI for
the TIMDEXDataset parquet file.

How this addresses that need:
* Set filename='append_delta_filename' when creating metadata.append_deltas view
* Explicitly select metadata column names when creating metadata.records view

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-528
Why these changes are being introduced:
* TDA requires a method to support regular merging of append
deltas into the static metadata.duckdb database file as new rows
and then deletes (the append deltas) once merged.

How this addresses that need:
* Add 'merge_append_deltas' method
* Add unit test

Side effects of this change:
* It's worth noting that this method, when run, will delete the append
delta parquet files that existed in the directory at the time of
execution.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-528
@jonavellecuerdo jonavellecuerdo force-pushed the TIMX-528-merge-append-deltas branch from 90acfa7 to 3a30eaa Compare August 15, 2025 17:44
@coveralls
Copy link

Pull Request Test Coverage Report for Build 16995837471

Details

  • 25 of 30 (83.33%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.6%) to 93.071%

Changes Missing Coverage Covered Lines Changed/Added Lines %
timdex_dataset_api/metadata.py 20 25 80.0%
Totals Coverage Status
Change from base Build 16976537842: -0.6%
Covered Lines: 497
Relevant Lines: 534

💛 - Coveralls

@jonavellecuerdo jonavellecuerdo merged commit 1fba058 into epic-TIMX-515 Aug 15, 2025
2 checks passed
@jonavellecuerdo jonavellecuerdo deleted the TIMX-528-merge-append-deltas branch August 15, 2025 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants