Skip to content

[SPARK-57566][CONNECT][TESTS] Add test coverage for the TIME data type#56797

Closed
vboo123 wants to merge 2 commits into
apache:masterfrom
vboo123:SPARK-57566
Closed

[SPARK-57566][CONNECT][TESTS] Add test coverage for the TIME data type#56797
vboo123 wants to merge 2 commits into
apache:masterfrom
vboo123:SPARK-57566

Conversation

@vboo123

@vboo123 vboo123 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This PR adds end-to-end, golden-file, and proto round-trip test coverage for the TIME data type (TimeType) over Spark Connect. This is a test-only change with no engine or logic modifications. SPARK-57566 is a sub-task of the umbrella SPARK-57550 (extend TIME type support across Spark).

ClientE2ETestSuite (Scala client, end-to-end against a real in-process Connect server):

  • make_time builtin returns a TIME value over Connect
  • hour, minute, second extract fields from a TIME value over Connect
  • current_time returns a TIME-typed column over Connect
  • A TIME column round-trips via createDataFrame over Connect
  • A TIME column round-trips through a parquet datasource over Connect

The parquet datasource round-trip test verifies that a TIME column survives a real write-then-read through a file datasource over Connect, not just in-memory. This exercises parquet's TIME logical-type handling end-to-end and was added to satisfy the ticket's "createDataFrame / datasource round-trip" scope.

PlanGenerationTestSuite (golden plan-generation):

  • Added make_time, current_time, and current_time(precision) function tests
  • Added the missing LocalTime case to the function lit array literal test (it already covered every other temporal type)
  • New and updated golden files generated via SPARK_GENERATE_GOLDEN_FILES=1 and manually reviewed

LiteralExpressionProtoConverterSuite (server-side proto/catalyst round-trip):

  • Added TIME literal round-trip coverage, including non-default precision propagation

Python note: No new Python tests were added because Connect Python TIME coverage already exists through the parity suites. The classic PySpark tests test_make_time, test_current_time, and test_create_dataframe_from_datetime_time are re-run over a Connect session via test_parity_functions and test_parity_creation. The Python side is intentionally covered by existing parity, not omitted.

Why are the changes needed?

The TIME data type was added by SPIP SPARK-51162 and shipped in Spark 4.1.0, but Spark Connect previously had only minimal TIME coverage: a single SELECT TIME '12:13:14' literal test in ClientE2ETestSuite, LocalTime literals in PlanGenerationTestSuite, and schema handling in SparkConnectPlannerSuite. The client-to-protobuf-to-server round-trip and the TIME builtin functions over Connect were unverified. These tests close that gap.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This is a test-only change. All scoped suites were run locally and passed:

  • connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite — 727 tests passed (golden files regenerated and manually reviewed)
  • connect/testOnly org.apache.spark.sql.connect.planner.LiteralExpressionProtoConverterSuite — 44 tests passed
  • connect-client-jvm/testOnly org.apache.spark.sql.connect.ClientE2ETestSuite -- -z "SPARK-57566" — 5 tests passed

Was this patch authored or co-authored using generative AI tooling?

Yes. Generated using Kiro (Claude Opus 4.8)

Vaibhav Garg added 2 commits June 26, 2026 00:48
### What changes were proposed in this pull request?

Extends Spark Connect test coverage for the TIME (TimeType) data type, which
previously had only minimal coverage (a single SELECT TIME literal in
ClientE2ETestSuite, LocalTime literals in PlanGenerationTestSuite, and schema
handling in SparkConnectPlannerSuite).

- ClientE2ETestSuite: end-to-end tests over the Connect client for the TIME
  builtins (make_time, hour/minute/second on a TIME value, current_time), plus
  createDataFrame and parquet datasource round-trips of a TIME column.
- PlanGenerationTestSuite: golden plan-generation coverage for make_time,
  current_time and current_time(precision); adds the missing LocalTime case to
  the function lit array literal test. New golden files generated accordingly.
- LiteralExpressionProtoConverterSuite: proto/catalyst round-trip coverage for
  TIME literals, including non-default precision propagation.

No engine or logic changes; this is test-only coverage. Sub-task of SPARK-57550.

### Why are the changes needed?

The TIME type (SPIP SPARK-51162, shipped in 4.1.0) lacked Connect test coverage
matching DATE/TIMESTAMP, leaving the client -> protobuf -> server round-trip and
TIME builtins over Connect unverified.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

New tests only. Ran locally (scoped suites):
- connect-client-jvm/testOnly ...PlanGenerationTestSuite (727 passed; goldens regenerated and reviewed)
- connect/testOnly ...LiteralExpressionProtoConverterSuite (44 passed)
- connect-client-jvm/testOnly ...ClientE2ETestSuite -- -z "SPARK-57566" (5 passed)

Generated using Kiro (Claude Opus 4.8)
… scalafmt

Fix CI failures from the initial commit:
- Add the ProtoToParsedPlanTestSuite explain-results goldens for the new
  make_time, current_time and current_time(precision) plans, and update
  function_lit_array.explain for the added LocalTime element. These were
  missing because only PlanGenerationTestSuite goldens were regenerated.
- Apply scalafmt formatting to the changed connect test sources (the
  sql/connect modules enforce scalafmt in CI).

Generated using Kiro (Claude Opus 4.8)

@uros-b uros-b left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uros-b uros-b requested a review from MaxGekk June 26, 2026 06:18

@MaxGekk MaxGekk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 blocking, 0 non-blocking, 0 nits. LGTM — clean, well-constructed test-only PR adding Spark Connect TIME coverage (e2e, golden plans, proto round-trip).

Verification

Checked the assertions are meaningful (real checkAnswer rows; make_time also asserts the TimeType schema; current_time asserts type only — correct for a non-deterministic value; the proto tests assert both precision and value round-trip) and the golden files are faithful to the generating code: make_time -> DateTimeUtils.makeTime(...), current_time(6, Some(<zone>)) / current_time(3, ...), and the lit-array TIME element captured losslessly as TIME '23:59:59.999999999' (86399999999999 nanos), consistent with the proto round-trip confirming a 9-nanos value survives a TimeType() literal. The createDataFrame / parquet round-trips deliberately use an exact-microsecond value so checkAnswer round-trips without precision-loss flakiness, and suite placement is correct for each layer.

@MaxGekk

MaxGekk commented Jun 26, 2026

Copy link
Copy Markdown
Member

+1, LGTM. Merging to master/4.x.
Thank you, @vboo123 and @uros-b for review.

@MaxGekk MaxGekk closed this in 958e7ce Jun 26, 2026
MaxGekk pushed a commit that referenced this pull request Jun 26, 2026
### What changes were proposed in this pull request?

This PR adds end-to-end, golden-file, and proto round-trip test coverage for the TIME data type (`TimeType`) over Spark Connect. This is a test-only change with no engine or logic modifications. SPARK-57566 is a sub-task of the umbrella SPARK-57550 (extend TIME type support across Spark).

**ClientE2ETestSuite** (Scala client, end-to-end against a real in-process Connect server):
- `make_time` builtin returns a TIME value over Connect
- `hour`, `minute`, `second` extract fields from a TIME value over Connect
- `current_time` returns a TIME-typed column over Connect
- A TIME column round-trips via `createDataFrame` over Connect
- A TIME column round-trips through a parquet datasource over Connect

The parquet datasource round-trip test verifies that a TIME column survives a real write-then-read through a file datasource over Connect, not just in-memory. This exercises parquet's TIME logical-type handling end-to-end and was added to satisfy the ticket's "createDataFrame / datasource round-trip" scope.

**PlanGenerationTestSuite** (golden plan-generation):
- Added `make_time`, `current_time`, and `current_time(precision)` function tests
- Added the missing `LocalTime` case to the `function lit array` literal test (it already covered every other temporal type)
- New and updated golden files generated via `SPARK_GENERATE_GOLDEN_FILES=1` and manually reviewed

**LiteralExpressionProtoConverterSuite** (server-side proto/catalyst round-trip):
- Added TIME literal round-trip coverage, including non-default precision propagation

**Python note:** No new Python tests were added because Connect Python TIME coverage already exists through the parity suites. The classic PySpark tests `test_make_time`, `test_current_time`, and `test_create_dataframe_from_datetime_time` are re-run over a Connect session via `test_parity_functions` and `test_parity_creation`. The Python side is intentionally covered by existing parity, not omitted.

### Why are the changes needed?

The TIME data type was added by SPIP SPARK-51162 and shipped in Spark 4.1.0, but Spark Connect previously had only minimal TIME coverage: a single `SELECT TIME '12:13:14'` literal test in `ClientE2ETestSuite`, `LocalTime` literals in `PlanGenerationTestSuite`, and schema handling in `SparkConnectPlannerSuite`. The client-to-protobuf-to-server round-trip and the TIME builtin functions over Connect were unverified. These tests close that gap.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This is a test-only change. All scoped suites were run locally and passed:

- `connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite` — 727 tests passed (golden files regenerated and manually reviewed)
- `connect/testOnly org.apache.spark.sql.connect.planner.LiteralExpressionProtoConverterSuite` — 44 tests passed
- `connect-client-jvm/testOnly org.apache.spark.sql.connect.ClientE2ETestSuite -- -z "SPARK-57566"` — 5 tests passed

### Was this patch authored or co-authored using generative AI tooling?

Yes. Generated using Kiro (Claude Opus 4.8)

Closes #56797 from vboo123/SPARK-57566.

Lead-authored-by: Vaibhav Garg <89175545+vboo123@users.noreply.github.com>
Co-authored-by: Vaibhav Garg <gargvboo@amazon.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 958e7ce)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants