Identify Group by columns by Laukkala · Pull Request #832 · teragrep/pth_10

Laukkala · 2026-02-27T09:28:03Z

resolves #826

Description

This PR adds metadata to the output columns of ChartStep, TimeChartStep, StatsStep and EventStatsStep. The metadata identifies the columns that were used in a groupBy operation when the step was executed.
To read this information, obtain the Dataset's schema and check each field's metadata for the existence of a key called "dpl_internal_isGroupByColumn".
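
As a rough illustration of the lookup described above, here is a minimal plain-Java sketch. It does not use Spark: the schema is modelled as a hypothetical map from column name to that column's metadata entries, and the class name, method name and the "host" and "count" columns are all assumptions made for this example. Against a real Dataset the same check would presumably go through the schema's StructField entries and their metadata() (for example Metadata.contains(key)), but that wiring is not shown in this PR description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the schema lookup; this is NOT the Spark API.
final class GroupByColumnLookup {

    // Metadata key named in the PR description.
    static final String GROUP_BY_KEY = "dpl_internal_isGroupByColumn";

    // Hypothetical schema model: column name -> metadata entries of that column.
    // Returns the names of all columns whose metadata contains the key.
    static List<String> groupByColumns(final Map<String, Map<String, Object>> schema) {
        final List<String> names = new ArrayList<>();
        for (final Map.Entry<String, Map<String, Object>> field : schema.entrySet()) {
            // Only the existence of the key is checked, as the description states.
            if (field.getValue().containsKey(GROUP_BY_KEY)) {
                names.add(field.getKey());
            }
        }
        return names;
    }

    public static void main(final String[] args) {
        final Map<String, Map<String, Object>> schema = new LinkedHashMap<>();
        final Map<String, Object> marked = new LinkedHashMap<>();
        marked.put(GROUP_BY_KEY, Boolean.TRUE);
        schema.put("host", marked); // hypothetical grouping column
        schema.put("count", new LinkedHashMap<String, Object>()); // hypothetical aggregate column, no metadata
        System.out.println(groupByColumns(schema)); // prints [host]
    }
}
```

The sketch checks only for the presence of the key, not its value, because the description above speaks of checking for the key's existence.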

Metadata is used instead of inspecting the LogicalPlan of the resulting dataset because Dataset.writeStream().foreachBatch() overrides the LogicalPlan for each batch. Metadata, on the other hand, is not overwritten.

This functionality is required for zep_01#283, where data formatting needs access to the names of the columns that were used for grouping data.

Checklists

Testing

General

I have checked that my test files and functions have meaningful names.
I have checked that each test tests only a single behavior.
I have done happy tests.
I have tested only my own code.
I have tested at least all public methods.

Assertions

I have checked that my tests use assertions and not runtime overhead.
I have checked that my tests end in assertions.
I have checked that there are no comparison statements in assertions.
I have checked that assertions are in tests and not in helper functions.
I have checked that assertions for iterables are outside of for loops and both sides of the iteration blocks.
I have checked that assertions are not tested inside consumers.

Testing Data

I have tested algorithms and anything else with the possibility of unbound growth.
I have checked that all testing data is local and fully replaceable or reproducible or both.
I have checked that all test files are standalone.
I have checked that all test-specific fake objects and classes are in the test directory.
I have checked that my tests do not contain anything related to customers, infrastructure or users.
I have checked that my tests do not contain non-generic information.
I have checked that my tests do not do external requests and are not privately or publicly routable.

Statements

I have checked that my tests do not use throws for exceptions.
I have checked that my tests do not use try-catch statements.
I have checked that my tests do not use if-else statements.

Java

I have checked that my tests for Java use the JUnit library.
I have checked that my tests for Java use JUnit utilities for parameters.

Other

I have only tested public behavior and not private implementation details.
I have checked that my tests are not (partially) commented out.
I have checked that hand-crafted variables in assertions are used accordingly.
I have tested Object Equality.
I have checked that I do not have any manual tests or I have a valid reason for them and I have explained it in the PR description.

Code Quality

I have checked that my code follows metrics set in Procedure: Class Metrics.
I have checked that my code follows metrics set in Procedure: Method Metrics.
I have checked that my code follows metrics set in Procedure: Object Quality.
I have checked that my code does not have any NULL values.
I have checked that my code does not contain FIXME or TODO comments.

Laukkala added 17 commits February 24, 2026 13:56

Added metadata to columns in ChartStep

1505f8c

Added metadata to columns in EventStatsStep

5ffc7ea

Added metadata to columns in StatsStep

acd91c0

Added metadata to columns in TimeChartStep

fba0824

spotless

2695c0d

Fixed metadata being applied to a renamed column, causing errors.

daa349e

Fixed Datasets using raw parameters instead of Rows

spotless

e597eb7

Fixed TimechartStep's schema().contains() to compare fieldnames instead of whole StructField object

59f75cf

Fixed metadata from batchDF being dropped during joining in timechart queries

3297e6a

Added metadata to PredictTransformationTest expected schema

a0dfd0e

spotless

e0eb1a1

Fixed Dataset.na() clearing metadata from columns.

9d842d2

Fixed eventStats clearing metadata from columns.

bd0ca5d

Formatting changes, comments

aa1cda7

Added schema checks that includes verification of the new metadata to statsTransformationStreamingTest

eeea0c3

Cleaned up some imports

6b04e9c

Made variables final and applied spotless

3ddd6ee

Laukkala self-assigned this Feb 27, 2026

Laukkala requested a review from eemhu February 27, 2026 09:28

eemhu reviewed Mar 3, 2026

Comment thread src/main/java/com/teragrep/pth_10/steps/timechart/TimechartStep.java Outdated

eemhu previously approved these changes Mar 3, 2026

kortemik added the review label Mar 6, 2026

Laukkala requested a review from kortemik March 6, 2026 11:56

Removed unnecessary check for "_time" value which is guaranteed to exist from TimeChartStep

125f0fd

Laukkala dismissed eemhu’s stale review via 125f0fd March 6, 2026 11:58

eemhu previously approved these changes Mar 6, 2026

Laukkala added 2 commits April 16, 2026 12:20

Merge branch 'teragrep:main' into issue_826_add_isGroupByColumn_to_metadata

6e9984b

Run spotless

41e003f

Laukkala dismissed eemhu’s stale review via 41e003f April 16, 2026 09:39

Laukkala wants to merge 20 commits into teragrep:main from Laukkala:issue_826_add_isGroupByColumn_to_metadata

3 participants
