fix(athena): support partition transform expressions with mode=overwrite_partitions (#2845) #3312
Draft
MukundaKatta wants to merge 2 commits into aws:main
…ite_partitions

athena.to_iceberg(mode="overwrite_partitions") previously failed when partition_cols contained Iceberg partition transform expressions such as day(ts), hour(ts), or truncate(10, col_name). The implementation reused delete_from_iceberg_table with partition_cols passed as merge_cols, which both projects df[merge_cols] (KeyError on transform expressions) and emits target."day(ts)" = source."day(ts)", which is invalid SQL because no literal column with that name exists.

Route the overwrite_partitions branch through a dedicated helper _delete_partitions_from_iceberg that:
- extracts the underlying raw column names from partition expressions and writes only those to the staging table;
- builds the MERGE ON clause by re-applying each partition transform on both target and source sides, e.g. day(target."ts") = day(source."ts");
- raises InvalidArgumentValue with a clear message when the underlying column referenced by a transform is missing from the DataFrame.

Adds 10 unit tests covering the new helpers and the fix path (plain cols, day, hour, truncate, bucket, multi-transform combinations, and the missing-underlying-column guard).

Fixes aws#2845.
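To make the failure described in the commit message concrete, here is a simplified reconstruction of the two problems. It is not the actual delete_from_iceberg_table code: naive_merge_conditions and missing_literal_columns are hypothetical names, and a plain list of column names stands in for the DataFrame.

```python
def naive_merge_conditions(merge_cols):
    """Quote every entry as if it were a literal column name (the old behaviour, simplified)."""
    return " AND ".join(f'target."{col}" = source."{col}"' for col in merge_cols)


def missing_literal_columns(merge_cols, df_columns):
    """Entries that a df[merge_cols] projection would fail to find (KeyError in pandas)."""
    return [col for col in merge_cols if col not in df_columns]


partition_cols = ["region", "day(ts)"]
df_columns = ["region", "ts", "value"]  # stand-in for df.columns

# "day(ts)" is a transform expression, not a column of the DataFrame.
print(missing_literal_columns(partition_cols, df_columns))

# The transform ends up inside the quoted identifier, so the engine looks for
# a column literally named day(ts), which does not exist.
print(naive_merge_conditions(partition_cols))
```

Both symptoms come from treating a partition expression as a column name, which is why the fix separates the underlying column from the transform applied to it.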
Contributor
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Summary
Fixes #2845.
athena.to_iceberg(mode="overwrite_partitions") previously failed when partition_cols contained partition transform expressions like day(ts), hour(ts), or truncate(10, col_name).

Root cause
In awswrangler/athena/_write_iceberg.py, the mode == "overwrite_partitions" branch reused delete_from_iceberg_table and passed partition_cols as merge_cols. That function did df[merge_cols] (which raises KeyError on transform expressions) and emitted target."day(ts)" = source."day(ts)" (which Athena rejects, since no literal column with that name exists).

Fix
Added a dedicated _delete_partitions_from_iceberg helper plus three composable utilities (_apply_partition_transform_to_side, _build_partition_merge_conditions, _build_partition_delete_sql). The helper extracts underlying raw column names, writes only those to staging, and builds the MERGE ON clause by re-applying each transform on both sides, e.g. day(target."ts") = day(source."ts"). It raises InvalidArgumentValue with a clear message when the underlying column is missing from the DataFrame.

Test plan
Unit tests cover day, hour, truncate, bucket, mixed multi-transform partition lists, and the missing-underlying-column guard.
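The approach described under Fix can be sketched in a few lines of string handling. This is a minimal illustration under stated assumptions, not the code in this PR: the function names are hypothetical, the underlying column is assumed to be the last argument of each transform, and ValueError stands in for awswrangler's InvalidArgumentValue.

```python
import re

# transform(arg, ..., column); assumption: the column is the last argument.
_TRANSFORM_RE = re.compile(r"^\s*(?P<func>\w+)\s*\((?P<args>.*)\)\s*$")


def split_partition_expr(expr):
    """Return (func, leading_args, column); func is None for a plain column."""
    match = _TRANSFORM_RE.match(expr)
    if match is None:
        return None, [], expr.strip()
    args = [arg.strip() for arg in match.group("args").split(",")]
    return match.group("func"), args[:-1], args[-1]


def apply_transform_to_side(expr, side):
    """Re-apply the transform to the column qualified with 'target' or 'source'."""
    func, leading, column = split_partition_expr(expr)
    qualified = f'{side}."{column}"'
    if func is None:
        return qualified
    return f"{func}({', '.join(leading + [qualified])})"


def build_merge_conditions(partition_cols):
    """Join one equality per partition expression into a MERGE ON clause body."""
    return " AND ".join(
        f"{apply_transform_to_side(expr, 'target')} = {apply_transform_to_side(expr, 'source')}"
        for expr in partition_cols
    )


def underlying_columns(partition_cols, df_columns):
    """Raw columns to write to staging, de-duplicated, in first-seen order."""
    columns = []
    for expr in partition_cols:
        _, _, column = split_partition_expr(expr)
        if column not in df_columns:
            # ValueError stands in for InvalidArgumentValue in this sketch.
            raise ValueError(
                f"Partition expression {expr!r} references column {column!r}, "
                "which is not present in the DataFrame."
            )
        if column not in columns:
            columns.append(column)
    return columns


print(build_merge_conditions(["region", "day(ts)", "truncate(10, col_name)"]))
print(underlying_columns(["day(ts)", "hour(ts)"], ["ts", "value"]))
```

Keeping the parsing in one function means the staging projection and the ON clause cannot disagree about which column a transform refers to. The sketch only assembles strings; it does not check that a given transform is callable as a function in an Athena MERGE condition.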