Add support for Trino query ID in commit metadata application ID #442

Open
srawat98-dev wants to merge 3 commits into linkedin:main from srawat98-dev:srawat/AddTrinoQueryIdInCommitMetadataAppId

srawat98-dev commented Feb 1, 2026

Summary

Add support for Trino query IDs in commit metadata collection to ensure proper tracking of commits made via Trino queries, in addition to existing Spark application tracking.

Previously, commitAppId was populated only from spark.app.id in the commit summary, and commitAppName only from spark.app.name. Commits made via Trino record their query ID under trino_query_id instead, so both fields were null for Trino-based commits. This PR adds fallback logic that captures the Trino query ID in commitAppId and sets commitAppName to "trino" for Trino-based commits, so commits are tracked regardless of execution engine.
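
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `CommitAppInfoSketch` and the helper methods are hypothetical, the snapshot summary is modeled as a plain `Map<String, String>`, and the query ID in the example is made up. Only the keys spark.app.id, spark.app.name, and trino_query_id and the literal "trino" come from the description. How commitAppName should behave when spark.app.name is absent but spark.app.id is present is an assumption here.

```java
import java.util.HashMap;
import java.util.Map;

public class CommitAppInfoSketch {
  // Snapshot summary keys named in the PR description.
  static final String SPARK_APP_ID = "spark.app.id";
  static final String SPARK_APP_NAME = "spark.app.name";
  static final String TRINO_QUERY_ID = "trino_query_id";

  /** Returns the first argument if it is non-null, otherwise the second. */
  static String coalesce(String first, String second) {
    return first != null ? first : second;
  }

  /** Spark application ID if present, else the Trino query ID, else null. */
  static String resolveCommitAppId(Map<String, String> summary) {
    return coalesce(summary.get(SPARK_APP_ID), summary.get(TRINO_QUERY_ID));
  }

  /**
   * Spark application name if present; otherwise "trino" when a Trino query ID
   * exists; otherwise null. (Assumption: the name falls back independently of the ID.)
   */
  static String resolveCommitAppName(Map<String, String> summary) {
    String trinoName = summary.get(TRINO_QUERY_ID) != null ? "trino" : null;
    return coalesce(summary.get(SPARK_APP_NAME), trinoName);
  }

  public static void main(String[] args) {
    // Hypothetical summary of a commit made through Trino.
    Map<String, String> trinoCommit = new HashMap<>();
    trinoCommit.put(TRINO_QUERY_ID, "example_query_id");

    System.out.println(resolveCommitAppId(trinoCommit));   // prints "example_query_id"
    System.out.println(resolveCommitAppName(trinoCommit)); // prints "trino"
  }
}
```

Checking the Spark keys first keeps existing Spark commits unchanged and means Spark takes precedence when both sets of keys are present.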

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

  • Manually tested on local Docker setup. Please include commands run, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

@srawat98-dev srawat98-dev marked this pull request as ready for review February 1, 2026 08:45
Add proper integration tests using Iceberg Table API to validate the
coalesce logic for commitAppId and commitAppName in TableStatsCollectorUtil.
Tests cover all four scenarios requested in PR review:
1. Both spark.app.id and trino_query_id null
2. Only trino_query_id present (Trino commit)
3. Only spark.app.id present (Spark commit)
4. Both present (Spark takes precedence)

The tests use table.newAppend().set() to control snapshot summary
properties, bypassing Spark SQL's automatic spark.app.id injection.

Co-Authored-By: Claude Opus 4.5 <[email protected]>