
fix(athena): support partition transform expressions with mode=overwrite_partitions (#2845)#3312

Draft
MukundaKatta wants to merge 2 commits into aws:main from MukundaKatta:fix/overwrite-partitions-special-chars

@MukundaKatta

Summary

Fixes #2845. athena.to_iceberg(mode="overwrite_partitions") previously failed when partition_cols contained partition transform expressions like day(ts), hour(ts), or truncate(10, col_name).

Root cause

In awswrangler/athena/_write_iceberg.py, the mode == "overwrite_partitions" branch reused delete_from_iceberg_table and passed partition_cols as merge_cols. That function selected df[merge_cols], which raises a KeyError on transform expressions, and emitted target."day(ts)" = source."day(ts)", which Athena rejects because no column with that literal name exists.
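Both failure modes can be reproduced without AWS. The sketch below is a hypothetical reconstruction based only on the description above, not the actual awswrangler code: `naive_merge_condition` mimics the pre-fix ON-clause construction, and `missing_columns` stands in for the `df[merge_cols]` lookup (pandas raises `KeyError` for any label that is not a real column of the DataFrame).

```python
from typing import Iterable, List


def naive_merge_condition(merge_cols: List[str]) -> str:
    # Hypothetical pre-fix behaviour: every partition entry is quoted as if
    # it were a literal column name, so "day(ts)" becomes target."day(ts)".
    return " AND ".join(f'target."{c}" = source."{c}"' for c in merge_cols)


def missing_columns(merge_cols: List[str], df_columns: Iterable[str]) -> List[str]:
    # Stand-in for df[merge_cols]: any entry that is not an actual column
    # label would make pandas raise KeyError.
    available = set(df_columns)
    return [c for c in merge_cols if c not in available]


if __name__ == "__main__":
    partition_cols = ["day(ts)", "region"]
    print(naive_merge_condition(partition_cols))
    # target."day(ts)" = source."day(ts)" AND target."region" = source."region"
    print(missing_columns(partition_cols, ["ts", "region", "amount"]))
    # ['day(ts)']
```

Plain columns such as `region` survive both steps; only the transform expression breaks, which matches the issue report.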

Fix

Added a dedicated _delete_partitions_from_iceberg helper plus three composable utilities (_apply_partition_transform_to_side, _build_partition_merge_conditions, _build_partition_delete_sql). The helper extracts the underlying raw column names, writes only those to the staging table, and builds the MERGE ON clause by re-applying each transform on both sides, e.g. day(target."ts") = day(source."ts"). It raises InvalidArgumentValue with a clear message when an underlying column is missing from the DataFrame.
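A minimal sketch of the ON-clause idea, assuming Iceberg-style expressions in which the source column is the last argument (`day(ts)`, `truncate(10, col_name)`, `bucket(16, id)`) and a bare name means an identity partition. The names `PartitionSpec`, `parse_partition`, `apply_to_side` and `build_merge_conditions` are hypothetical and are not claimed to match the PR's `_apply_partition_transform_to_side` or `_build_partition_merge_conditions`. As in the PR description, it re-applies each transform textually on both sides; it does not check whether the Athena SQL function of the same name has the same semantics as the Iceberg transform.

```python
import re
from typing import List, NamedTuple, Optional


class PartitionSpec(NamedTuple):
    transform: Optional[str]  # None for a plain (identity) partition column
    params: List[str]         # literal leading arguments, e.g. ["10"] for truncate(10, col)
    column: str               # underlying raw column written to staging


_TRANSFORM_RE = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$")


def parse_partition(expr: str) -> PartitionSpec:
    match = _TRANSFORM_RE.match(expr)
    if match is None:
        return PartitionSpec(None, [], expr.strip())
    args = [a.strip().strip('"') for a in match.group(2).split(",")]
    # Assumption: the source column is always the last argument.
    return PartitionSpec(match.group(1).lower(), args[:-1], args[-1])


def apply_to_side(spec: PartitionSpec, side: str) -> str:
    # Qualify the raw column with the MERGE alias, then wrap it in the
    # transform again: ("day", "ts") + "target" -> day(target."ts").
    ref = f'{side}."{spec.column}"'
    if spec.transform is None:
        return ref
    return f"{spec.transform}({', '.join(spec.params + [ref])})"


def build_merge_conditions(partition_cols: List[str]) -> str:
    specs = [parse_partition(p) for p in partition_cols]
    return " AND ".join(
        f"{apply_to_side(s, 'target')} = {apply_to_side(s, 'source')}" for s in specs
    )


if __name__ == "__main__":
    print(build_merge_conditions(["day(ts)", "region"]))
    # day(target."ts") = day(source."ts") AND target."region" = source."region"
```

Keeping the parsed form separate from the SQL rendering lets the same `PartitionSpec` drive both the staging projection (via `column`) and the ON clause, which is roughly the split the PR's three utilities suggest.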

Test plan

  • 17/17 partition-related unit tests pass (10 new + 7 existing).
  • New tests cover plain cols, day, hour, truncate, bucket, mixed multi-transform partition lists, and the missing-underlying-column guard.
  • Integration tests not run (require AWS).


athena.to_iceberg(mode="overwrite_partitions") previously failed when
partition_cols contained Iceberg partition transform expressions such as
day(ts), hour(ts), or truncate(10, col_name). The implementation reused
delete_from_iceberg_table with partition_cols passed as merge_cols, which
both projects df[merge_cols] (KeyError on transform expressions) and emits
target."day(ts)" = source."day(ts)", which Athena rejects because no column
with that literal name exists.

Route the overwrite_partitions branch through a dedicated helper
_delete_partitions_from_iceberg that:
- extracts the underlying raw column names from partition expressions and
  writes only those to the staging table;
- builds the MERGE ON clause by re-applying each partition transform on
  both target and source sides, e.g. day(target."ts") = day(source."ts");
- raises InvalidArgumentValue with a clear message when the underlying
  column referenced by a transform is missing from the DataFrame.
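The staging-side behaviour in the first and last bullets (write only the raw columns, and fail early when one is absent) can be sketched as follows. `underlying_column`, `staging_columns` and the local `InvalidArgumentValue` class are illustrative stand-ins rather than the PR's code, and the error message is invented for the example.

```python
from typing import Iterable, List


class InvalidArgumentValue(ValueError):
    """Local stand-in for the InvalidArgumentValue exception named above."""


def underlying_column(expr: str) -> str:
    # "day(ts)" -> "ts", "truncate(10, col)" -> "col", "region" -> "region".
    # Assumes the source column is the last argument of a transform.
    expr = expr.strip()
    if expr.endswith(")") and "(" in expr:
        inner = expr[expr.index("(") + 1 : -1]
        return inner.split(",")[-1].strip().strip('"')
    return expr


def staging_columns(partition_cols: Iterable[str], df_columns: Iterable[str]) -> List[str]:
    # Returns the raw columns to write to the staging table, in first-seen
    # order and without duplicates (day(ts) and hour(ts) both map to "ts").
    available = set(df_columns)
    cols: List[str] = []
    for expr in partition_cols:
        col = underlying_column(expr)
        if col not in available:
            raise InvalidArgumentValue(
                f"Partition expression {expr!r} references column {col!r}, "
                "which is not present in the DataFrame."
            )
        if col not in cols:
            cols.append(col)
    return cols


if __name__ == "__main__":
    print(staging_columns(["day(ts)", "hour(ts)", "truncate(10, name)"], ["ts", "name", "v"]))
    # ['ts', 'name']
    try:
        staging_columns(["day(event_time)"], ["ts"])
    except InvalidArgumentValue as exc:
        print(exc)
```

Validating before anything is written means a bad partition list fails with a descriptive error instead of a KeyError deep inside the staging write.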

Adds 10 unit tests covering the new helpers and the fix path (plain cols,
day, hour, truncate, bucket, multi-transform combinations, and the
missing-underlying-column guard).

Fixes aws#2845.
@kukushking
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: c60b86b
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@kukushking
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: dd30656
  • Result: FAILED
  • Build Logs (available for 30 days)

@kukushking
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: c60b86b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@kukushking
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: dd30656
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


Labels: None yet

Projects: None yet

Development

Successfully merging this pull request may close these issues.

athena.to_parquet fails when mode=overwrite_partitions and partition_cols contains something like hour(timestamp_col).

2 participants