[SPARK-57566][CONNECT][TESTS] Add test coverage for the TIME data type#56797
[SPARK-57566][CONNECT][TESTS] Add test coverage for the TIME data type#56797vboo123 wants to merge 2 commits into
Conversation
### What changes were proposed in this pull request? Extends Spark Connect test coverage for the TIME (TimeType) data type, which previously had only minimal coverage (a single SELECT TIME literal in ClientE2ETestSuite, LocalTime literals in PlanGenerationTestSuite, and schema handling in SparkConnectPlannerSuite). - ClientE2ETestSuite: end-to-end tests over the Connect client for the TIME builtins (make_time, hour/minute/second on a TIME value, current_time), plus createDataFrame and parquet datasource round-trips of a TIME column. - PlanGenerationTestSuite: golden plan-generation coverage for make_time, current_time and current_time(precision); adds the missing LocalTime case to the function lit array literal test. New golden files generated accordingly. - LiteralExpressionProtoConverterSuite: proto/catalyst round-trip coverage for TIME literals, including non-default precision propagation. No engine or logic changes; this is test-only coverage. Sub-task of SPARK-57550. ### Why are the changes needed? The TIME type (SPIP SPARK-51162, shipped in 4.1.0) lacked Connect test coverage matching DATE/TIMESTAMP, leaving the client -> protobuf -> server round-trip and TIME builtins over Connect unverified. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? New tests only. Ran locally (scoped suites): - connect-client-jvm/testOnly ...PlanGenerationTestSuite (727 passed; goldens regenerated and reviewed) - connect/testOnly ...LiteralExpressionProtoConverterSuite (44 passed) - connect-client-jvm/testOnly ...ClientE2ETestSuite -- -z "SPARK-57566" (5 passed) Generated using Kiro (Claude Opus 4.8)
… scalafmt Fix CI failures from the initial commit: - Add the ProtoToParsedPlanTestSuite explain-results goldens for the new make_time, current_time and current_time(precision) plans, and update function_lit_array.explain for the added LocalTime element. These were missing because only PlanGenerationTestSuite goldens were regenerated. - Apply scalafmt formatting to the changed connect test sources (the sql/connect modules enforce scalafmt in CI). Generated using Kiro (Claude Opus 4.8)
MaxGekk
left a comment
There was a problem hiding this comment.
0 blocking, 0 non-blocking, 0 nits. LGTM — clean, well-constructed test-only PR adding Spark Connect TIME coverage (e2e, golden plans, proto round-trip).
Verification
Checked the assertions are meaningful (real checkAnswer rows; make_time also asserts the TimeType schema; current_time asserts type only — correct for a non-deterministic value; the proto tests assert both precision and value round-trip) and the golden files are faithful to the generating code: make_time -> DateTimeUtils.makeTime(...), current_time(6, Some(<zone>)) / current_time(3, ...), and the lit-array TIME element captured losslessly as TIME '23:59:59.999999999' (86399999999999 nanos), consistent with the proto round-trip confirming a 9-nanos value survives a TimeType() literal. The createDataFrame / parquet round-trips deliberately use an exact-microsecond value so checkAnswer round-trips without precision-loss flakiness, and suite placement is correct for each layer.
### What changes were proposed in this pull request? This PR adds end-to-end, golden-file, and proto round-trip test coverage for the TIME data type (`TimeType`) over Spark Connect. This is a test-only change with no engine or logic modifications. SPARK-57566 is a sub-task of the umbrella SPARK-57550 (extend TIME type support across Spark). **ClientE2ETestSuite** (Scala client, end-to-end against a real in-process Connect server): - `make_time` builtin returns a TIME value over Connect - `hour`, `minute`, `second` extract fields from a TIME value over Connect - `current_time` returns a TIME-typed column over Connect - A TIME column round-trips via `createDataFrame` over Connect - A TIME column round-trips through a parquet datasource over Connect The parquet datasource round-trip test verifies that a TIME column survives a real write-then-read through a file datasource over Connect, not just in-memory. This exercises parquet's TIME logical-type handling end-to-end and was added to satisfy the ticket's "createDataFrame / datasource round-trip" scope. **PlanGenerationTestSuite** (golden plan-generation): - Added `make_time`, `current_time`, and `current_time(precision)` function tests - Added the missing `LocalTime` case to the `function lit array` literal test (it already covered every other temporal type) - New and updated golden files generated via `SPARK_GENERATE_GOLDEN_FILES=1` and manually reviewed **LiteralExpressionProtoConverterSuite** (server-side proto/catalyst round-trip): - Added TIME literal round-trip coverage, including non-default precision propagation **Python note:** No new Python tests were added because Connect Python TIME coverage already exists through the parity suites. The classic PySpark tests `test_make_time`, `test_current_time`, and `test_create_dataframe_from_datetime_time` are re-run over a Connect session via `test_parity_functions` and `test_parity_creation`. The Python side is intentionally covered by existing parity, not omitted. ### Why are the changes needed? The TIME data type was added by SPIP SPARK-51162 and shipped in Spark 4.1.0, but Spark Connect previously had only minimal TIME coverage: a single `SELECT TIME '12:13:14'` literal test in `ClientE2ETestSuite`, `LocalTime` literals in `PlanGenerationTestSuite`, and schema handling in `SparkConnectPlannerSuite`. The client-to-protobuf-to-server round-trip and the TIME builtin functions over Connect were unverified. These tests close that gap. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only change. All scoped suites were run locally and passed: - `connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite` — 727 tests passed (golden files regenerated and manually reviewed) - `connect/testOnly org.apache.spark.sql.connect.planner.LiteralExpressionProtoConverterSuite` — 44 tests passed - `connect-client-jvm/testOnly org.apache.spark.sql.connect.ClientE2ETestSuite -- -z "SPARK-57566"` — 5 tests passed ### Was this patch authored or co-authored using generative AI tooling? Yes. Generated using Kiro (Claude Opus 4.8) Closes #56797 from vboo123/SPARK-57566. Lead-authored-by: Vaibhav Garg <89175545+vboo123@users.noreply.github.com> Co-authored-by: Vaibhav Garg <gargvboo@amazon.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit 958e7ce) Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
This PR adds end-to-end, golden-file, and proto round-trip test coverage for the TIME data type (
TimeType) over Spark Connect. This is a test-only change with no engine or logic modifications. SPARK-57566 is a sub-task of the umbrella SPARK-57550 (extend TIME type support across Spark).ClientE2ETestSuite (Scala client, end-to-end against a real in-process Connect server):
make_timebuiltin returns a TIME value over Connecthour,minute,secondextract fields from a TIME value over Connectcurrent_timereturns a TIME-typed column over ConnectcreateDataFrameover ConnectThe parquet datasource round-trip test verifies that a TIME column survives a real write-then-read through a file datasource over Connect, not just in-memory. This exercises parquet's TIME logical-type handling end-to-end and was added to satisfy the ticket's "createDataFrame / datasource round-trip" scope.
PlanGenerationTestSuite (golden plan-generation):
make_time,current_time, andcurrent_time(precision)function testsLocalTimecase to thefunction lit arrayliteral test (it already covered every other temporal type)SPARK_GENERATE_GOLDEN_FILES=1and manually reviewedLiteralExpressionProtoConverterSuite (server-side proto/catalyst round-trip):
Python note: No new Python tests were added because Connect Python TIME coverage already exists through the parity suites. The classic PySpark tests
test_make_time,test_current_time, andtest_create_dataframe_from_datetime_timeare re-run over a Connect session viatest_parity_functionsandtest_parity_creation. The Python side is intentionally covered by existing parity, not omitted.Why are the changes needed?
The TIME data type was added by SPIP SPARK-51162 and shipped in Spark 4.1.0, but Spark Connect previously had only minimal TIME coverage: a single
SELECT TIME '12:13:14'literal test inClientE2ETestSuite,LocalTimeliterals inPlanGenerationTestSuite, and schema handling inSparkConnectPlannerSuite. The client-to-protobuf-to-server round-trip and the TIME builtin functions over Connect were unverified. These tests close that gap.Does this PR introduce any user-facing change?
No.
How was this patch tested?
This is a test-only change. All scoped suites were run locally and passed:
connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite— 727 tests passed (golden files regenerated and manually reviewed)connect/testOnly org.apache.spark.sql.connect.planner.LiteralExpressionProtoConverterSuite— 44 tests passedconnect-client-jvm/testOnly org.apache.spark.sql.connect.ClientE2ETestSuite -- -z "SPARK-57566"— 5 tests passedWas this patch authored or co-authored using generative AI tooling?
Yes. Generated using Kiro (Claude Opus 4.8)