
Identify Group by columns #832

Open
Laukkala wants to merge 20 commits into teragrep:main from Laukkala:issue_826_add_isGroupByColumn_to_metadata

@Laukkala
Contributor

resolves #826

Description

This PR adds metadata to the output columns of ChartStep, TimechartStep, StatsStep and EventStatsStep that identifies which columns were used in the step's groupBy operation.
This information can be accessed by reading the Dataset's schema and checking whether a column's metadata contains the key "dpl_internal_isGroupByColumn".
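A consumer, such as the data formatting in zep_01, could read the markers from the schema roughly as follows. This is a minimal sketch, assuming the marker lives in each output column's StructField metadata and that only the presence of the key matters, not its value. The class GroupByColumns and its method namesOf are hypothetical and not part of pth_10.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper for illustration only; not part of pth_10.
public final class GroupByColumns {

    // Metadata key described in this PR.
    public static final String KEY = "dpl_internal_isGroupByColumn";

    private GroupByColumns() {
    }

    // Returns, in schema order, the names of the columns whose metadata
    // contains KEY. Assumes the key's presence alone marks a group-by column.
    public static List<String> namesOf(final StructType schema) {
        final List<String> names = new ArrayList<>();
        for (final StructField field : schema.fields()) {
            if (field.metadata().contains(KEY)) {
                names.add(field.name());
            }
        }
        return names;
    }
}
```

Usage would be GroupByColumns.namesOf(dataset.schema()) on the output of one of the steps listed above.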

Metadata is used instead of inspecting the LogicalPlan of the resulting Dataset, because Dataset.writeStream().foreachBatch() replaces the LogicalPlan for each batch, whereas column metadata is not overwritten.
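Because each micro-batch handed to foreachBatch is its own Dataset, a consumer would read the group-by markers from that batch's schema rather than from the original query plan. The following is a minimal sketch under that assumption; GroupByColumnsPerBatch is a hypothetical name, and the key is the one described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.StructField;

// Hypothetical batch handler; foreachBatch invokes it on the driver once per micro-batch.
public final class GroupByColumnsPerBatch implements VoidFunction2<Dataset<Row>, Long> {

    private static final long serialVersionUID = 1L;
    private static final String KEY = "dpl_internal_isGroupByColumn";

    // Group-by column names found in the most recent batch.
    private final List<String> lastSeen = new ArrayList<>();

    @Override
    public void call(final Dataset<Row> batch, final Long batchId) {
        lastSeen.clear();
        // The batch has a different logical plan than the original query,
        // but the column metadata in its schema is still available.
        for (final StructField field : batch.schema().fields()) {
            if (field.metadata().contains(KEY)) {
                lastSeen.add(field.name());
            }
        }
    }

    public List<String> lastSeen() {
        return Collections.unmodifiableList(lastSeen);
    }
}
```

It would be registered with dataset.writeStream().foreachBatch(new GroupByColumnsPerBatch()). Passing a named VoidFunction2 instance rather than a lambda avoids overload ambiguity between the Java and Scala variants of foreachBatch.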

This functionality is required for zep_01#283, where data formatting needs access to the names of the columns that were used for grouping.

Checklists

Testing

General

  • I have checked that my test files and functions have meaningful names.
  • I have checked that each test tests only a single behavior.
  • I have done happy tests.
  • I have tested only my own code.
  • I have tested at least all public methods.

Assertions

  • I have checked that my tests use assertions and not runtime overhead.
  • I have checked that my tests end in assertions.
  • I have checked that there are no comparison statements in assertions.
  • I have checked that assertions are in tests and not in helper functions.
  • I have checked that assertions for iterables are outside of for loops and both sides of the iteration blocks.
  • I have checked that assertions are not tested inside consumers.

Testing Data

  • I have tested algorithms and anything else with the possibility of unbound growth.
  • I have checked that all testing data is local and fully replaceable or reproducible or both.
  • I have checked that all test files are standalone.
  • I have checked that all test-specific fake objects and classes are in the test directory.
  • I have checked that my tests do not contain anything related to customers, infrastructure or users.
  • I have checked that my tests do not contain non-generic information.
  • I have checked that my tests do not do external requests and are not privately or publicly routable.

Statements

  • I have checked that my tests do not use throws for exceptions.
  • I have checked that my tests do not use try-catch statements.
  • I have checked that my tests do not use if-else statements.

Java

  • I have checked that my tests for Java use the JUnit library.
  • I have checked that my tests for Java use JUnit utilities for parameters.

Other

  • I have only tested public behavior and not private implementation details.
  • I have checked that my tests are not (partially) commented out.
  • I have checked that hand-crafted variables in assertions are used accordingly.
  • I have tested Object Equality.
  • I have checked that I do not have any manual tests or I have a valid reason for them and I have explained it in the PR description.

Code Quality

  • I have checked that my code follows metrics set in Procedure: Class Metrics.
  • I have checked that my code follows metrics set in Procedure: Method Metrics.
  • I have checked that my code follows metrics set in Procedure: Object Quality.
  • I have checked that my code does not have any NULL values.
  • I have checked that my code does not contain FIXME or TODO comments.

@Laukkala Laukkala self-assigned this Feb 27, 2026
@Laukkala Laukkala requested a review from eemhu February 27, 2026 09:28
Comment thread src/main/java/com/teragrep/pth_10/steps/timechart/TimechartStep.java Outdated
eemhu previously approved these changes Mar 3, 2026
@kortemik kortemik added the review label Mar 6, 2026
@Laukkala Laukkala requested a review from kortemik March 6, 2026 11:56
eemhu previously approved these changes Mar 6, 2026

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add isGroupByColumn to Column metadata whenever using groupBy expressions

3 participants