Add support for Trino query ID in commit metadata application ID #442
Open
srawat98-dev wants to merge 3 commits into linkedin:main from
cbb330 reviewed Feb 2, 2026
apps/spark/src/main/java/com/linkedin/openhouse/jobs/util/TableStatsCollectorUtil.java
Add proper integration tests using Iceberg Table API to validate the coalesce logic for commitAppId and commitAppName in TableStatsCollectorUtil.

Tests cover all four scenarios requested in PR review:
1. Both spark.app.id and trino_query_id null
2. Only trino_query_id present (Trino commit)
3. Only spark.app.id present (Spark commit)
4. Both present (Spark takes precedence)

The tests use table.newAppend().set() to control snapshot summary properties, bypassing Spark SQL's automatic spark.app.id injection.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Summary
Add support for Trino query IDs in commit metadata collection to ensure proper tracking of commits made via Trino queries, in addition to existing Spark application tracking.
Previously, the commitAppId field only captured Spark application IDs from spark.app.id in the commit summary, and commitAppName only captured spark.app.name. Tables updated via Trino queries store their query IDs under trino_query_id instead, resulting in null values for both fields in Trino-based commits. This PR adds fallback logic to capture Trino query IDs in commitAppId and sets commitAppName to "trino" for Trino-based commits, enabling complete tracking regardless of execution engine.
Changes
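The fallback described in the summary can be sketched in plain Java over a snapshot summary map. This is a minimal illustration, not the code in TableStatsCollectorUtil: the class name CommitAppFallbackSketch and the helpers resolveCommitAppId and resolveCommitAppName are hypothetical. It also assumes that commitAppName becomes "trino" only when spark.app.name is absent and trino_query_id is present, which the PR description does not spell out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the coalesce logic; names are illustrative only.
class CommitAppFallbackSketch {
    static final String SPARK_APP_ID = "spark.app.id";
    static final String SPARK_APP_NAME = "spark.app.name";
    static final String TRINO_QUERY_ID = "trino_query_id";
    static final String TRINO_APP_NAME = "trino";

    // Prefer the Spark application ID; fall back to the Trino query ID.
    // Returns null when neither key is present in the summary.
    static String resolveCommitAppId(Map<String, String> summary) {
        String sparkAppId = summary.get(SPARK_APP_ID);
        return sparkAppId != null ? sparkAppId : summary.get(TRINO_QUERY_ID);
    }

    // Prefer the Spark application name; otherwise report "trino" when a
    // Trino query ID is present (assumed behavior), else null.
    static String resolveCommitAppName(Map<String, String> summary) {
        String sparkAppName = summary.get(SPARK_APP_NAME);
        if (sparkAppName != null) {
            return sparkAppName;
        }
        return summary.get(TRINO_QUERY_ID) != null ? TRINO_APP_NAME : null;
    }

    public static void main(String[] args) {
        // A Trino-only commit summary; the query ID value is made up.
        Map<String, String> trinoCommit = new HashMap<>();
        trinoCommit.put(TRINO_QUERY_ID, "example_trino_query_id");

        System.out.println(resolveCommitAppId(trinoCommit));
        System.out.println(resolveCommitAppName(trinoCommit));
    }
}
```

With this ordering, a summary that carries both spark.app.id and trino_query_id resolves to the Spark values, matching the "Spark takes precedence" scenario listed in the review.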
For all the boxes checked, please include additional details of the changes made in this pull request.
Testing Done
For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.
Additional Information
For all the boxes checked, include additional details of the changes made in this pull request.