
[SPARK-57030][SPARK-57031][DOCS] Document the TIME data type, functions, and ANSI behavior in the SQL reference #56771

Closed
MaxGekk wants to merge 5 commits into apache:master from MaxGekk:time-docs

Conversation

MaxGekk commented Jun 25, 2026

Member

What changes were proposed in this pull request?

This PR completes the user-facing SQL reference documentation for the TIME data type:

  • docs/sql-ref-datatypes.md: add the missing TimeType rows to the Python (datetime.time / TimeType()), R (Not supported), and SQL type-name (TIME, TIME(p)) tables, and update the TimeType(precision) description to reflect the supported precision range (0 to 9, default 6).
  • docs/sql-ref-literals.md: extend the TIME literal syntax to allow up to 9 fractional-second digits and add a nanosecond-precision example.
  • docs/sql-ref-ansi-compliance.md: document that TIME does not promote to other types, the least common type of TIME(n)/TIME(m) is TIME(max(n, m)), and Spark's deviations from the SQL standard (default precision 6 vs ANSI 0; TIME WITH TIME ZONE not supported).
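To make the "up to 9 fractional-second digits" limit concrete, here is a minimal Python sketch that turns the text of a TIME literal into nanoseconds since midnight. The helper `time_text_to_nanos` and the exact accepted shape (`HH:MM:SS` with an optional fraction) are assumptions made for illustration; this is not Spark's parser.

```python
import re

# Hypothetical helper, not part of Spark: models the text inside TIME '...'.
# Assumed shape: H or HH, then MM and SS, then an optional 1 to 9 digit fraction.
_TIME_TEXT = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})(?:\.(\d{1,9}))?")

NANOS_PER_SECOND = 1_000_000_000


def time_text_to_nanos(text: str) -> int:
    """Return nanoseconds since midnight for text such as '12:34:56.123456789'."""
    match = _TIME_TEXT.fullmatch(text)
    if match is None:
        raise ValueError(f"not a TIME text with at most 9 fractional digits: {text!r}")
    hour, minute, second = (int(match.group(i)) for i in (1, 2, 3))
    if hour > 23 or minute > 59 or second > 59:
        raise ValueError(f"field out of range: {text!r}")
    fraction = match.group(4) or ""
    # Right-pad the fraction to 9 digits so that '.5' means 500_000_000 ns.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return (hour * 3600 + minute * 60 + second) * NANOS_PER_SECOND + nanos
```

Integer nanoseconds are used because Python's `datetime.time` stores microseconds only, so it cannot hold all nine fractional digits.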

The TIME-related functions and operators (current_time, make_time, to_time, try_to_time, time_trunc, time_diff, time_from_*, time_to_*, hour/minute/second) are already covered by the auto-generated SQL function reference, which is built from the @ExpressionDescription annotations on the corresponding expressions and registered in FunctionRegistry.

This PR addresses both SPARK-57030 (data-type reference page) and SPARK-57031 (functions/operators and ANSI compliance page).

Why are the changes needed?

To finish documenting the TIME data type and its functions/operators and ANSI behavior in the SQL reference.

Does this PR introduce any user-facing change?

No. Documentation-only changes.

How was this patch tested?

Reviewed the rendered Markdown tables and verified the claims against the implementation (TimeType, DataTypeAstBuilder, the TIME literal parser in AstBuilder, and FunctionRegistry).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

MaxGekk added 3 commits June 25, 2026 13:07
…ns, and ANSI behavior in the SQL reference

…at/date_part, function section, datetime patterns, and DESCRIBE AS JSON

### What changes were proposed in this pull request?

Follow-up that extends the TIME documentation to surfaces that still described only date/timestamp:

- `date_format` and `date_part` `@ExpressionDescription`: document that they accept `TIME` and add a `TIME` example each. Both expressions already support `TIME` (`date_format` via `TypeCollection(TimestampType, AnyTimeType)`; `date_part` via the shared `Extract` machinery). The first example of each is left unchanged so the generated `sql-expression-schema.md` golden file is unaffected.
- `docs/sql-ref-functions-builtin.md` / `docs/sql-ref-functions.md`: rename the "Date and Timestamp Functions" section (and its link/anchor and category text) to "Date, Time and Timestamp Functions"; the auto-generated table under it already lists the TIME builtins.
- `docs/sql-ref-datetime-pattern.md`: note that the patterns also apply to `TIME` parsing/formatting and add `to_time` to the function list.
- `docs/sql-ref-syntax-aux-describe-table.md`: add the `TimeType` row to the `DESCRIBE ... AS JSON` type table (`{ "name" : "time(p)" }`, matching the current serializer output).

The `+`/`-` operator descriptions are intentionally generic and do not enumerate operand types (not even DATE/TIMESTAMP/INTERVAL), so they are left unchanged.

### Why are the changes needed?

To complete the user-facing documentation of the TIME data type's functions and behavior (SPARK-57030 / SPARK-57031).

### Does this PR introduce _any_ user-facing change?

No. Documentation-only changes (the `@ExpressionDescription` updates only affect generated docs).

### How was this patch tested?

Verified the expressions accept `TIME` in the source, and that the appended examples mirror the proven `extract` example outputs. The examples are executed and checked by `ExpressionInfoSuite`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)
…SV, JSON, XML, and Avro data sources

### What changes were proposed in this pull request?

Document the data sources that support the TIME data type:

- CSV / JSON / XML: add the `timeFormat` option (default `HH:mm:ss`, read/write), which controls parsing/formatting of `TIME` values. These sources support `TIME` inference, reading with an explicit schema, and writing (`CSVOptions`/`JSONOptions`/`XmlOptions` `timeFormat`, plus the infer/parse/generate paths). For JSON, also note that `inferTimestamp` additionally enables inference of `TimeType`.
- Avro: add the `time-micros` <-> `TimeType` rows to both the "Avro -> Spark SQL" and "Spark SQL -> Avro" type-mapping tables (`SchemaConverters`/`AvroSerializer`/`AvroDeserializer`).
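As a rough illustration of the two mappings above, the following Python sketch parses text in the default `HH:mm:ss` shape and converts it to and from the Avro `time-micros` representation, which the Avro specification defines as a long holding microseconds after midnight. The function names are hypothetical, and `strptime` with `%H:%M:%S` only stands in for the `timeFormat` pattern; none of this is Spark's code.

```python
from datetime import datetime, time

MICROS_PER_SECOND = 1_000_000
MICROS_PER_DAY = 24 * 3600 * MICROS_PER_SECOND


def parse_hh_mm_ss(text: str) -> time:
    """Parse 'HH:mm:ss' text; '%H:%M:%S' is an assumed stand-in for that pattern."""
    return datetime.strptime(text, "%H:%M:%S").time()


def to_time_micros(value: time) -> int:
    """Encode a time of day as microseconds after midnight (Avro time-micros)."""
    seconds = value.hour * 3600 + value.minute * 60 + value.second
    return seconds * MICROS_PER_SECOND + value.microsecond


def from_time_micros(micros: int) -> time:
    """Decode microseconds after midnight back into a time of day."""
    if not 0 <= micros < MICROS_PER_DAY:
        raise ValueError(f"time-micros value out of range: {micros}")
    seconds, microsecond = divmod(micros, MICROS_PER_SECOND)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, microsecond)
```

A value read as `13:07:00` from CSV would, under these assumptions, be stored as `to_time_micros(parse_hh_mm_ss("13:07:00"))` when written to Avro.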

JDBC, Parquet, and ORC also support `TIME`, but their docs are intentionally left unchanged here: the JDBC per-database mapping tables describe the default behavior, where SQL `TIME` maps to `TimestampType`/`TimestampNTZType` unless the TIME type is enabled; Parquet/ORC have no existing type-mapping table that TIME fits into cleanly.

### Why are the changes needed?

To document where the TIME data type can be read from and written to (SPARK-57030 / SPARK-57031).

### Does this PR introduce _any_ user-facing change?

No. Documentation-only changes.

### How was this patch tested?

Verified each documented option/mapping against the source: `CSVOptions`/`JSONOptions`/`XmlOptions` (`timeFormat`), the CSV/JSON/XML infer/parse/generate paths, `JsonInferSchema` (TIME inference under `inferTimestamp`), and the Avro `SchemaConverters`/serializer/deserializer (`time-micros`).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)
Comment thread docs/sql-ref-ansi-compliance.md Outdated

\*\*\* For a complex type, the precedence rule applies recursively to its component elements.

The `TIME` type does not promote to any other type. The least common type of `TIME(n)` and `TIME(m)` is `TIME(max(n, m))`. Note that Spark's `TIME` type deviates from the SQL standard in two ways: the default fractional-seconds precision is `6` (the ANSI default is `0`), and `TIME WITH TIME ZONE` is not supported.

Member

Should this go under "Least Common Type Resolution", instead of "Type Promotion and Precedence"?

Member Author

Good point, thanks. Moved the least-common-type statement (TIME(n)/TIME(m) -> TIME(max(n, m))) into the "Least Common Type Resolution" subsection, next to the analogous decimal rule. The "Type Promotion and Precedence" subsection keeps only the note that TIME does not promote to other types plus the SQL-standard deviations.
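For reference, the rule under discussion can be sketched in a few lines of Python. This models only the stated rule, `TIME(n)`/`TIME(m)` -> `TIME(max(n, m))`; a later commit in this PR notes that the rule is not yet implemented and is pending SPARK-57585, so the function name and its validation are illustrative assumptions, not Spark behavior.

```python
MIN_PRECISION = 0
MAX_PRECISION = 9


def least_common_time_precision(n: int, m: int) -> int:
    """Return the precision of the least common type of TIME(n) and TIME(m).

    Hypothetical model of the proposed rule: the wider precision wins.
    """
    for precision in (n, m):
        if not MIN_PRECISION <= precision <= MAX_PRECISION:
            raise ValueError(f"TIME precision must be 0 to 9, got {precision}")
    return max(n, m)
```

Taking the maximum means neither input loses fractional-second digits when both are widened to the common type.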

Comment thread docs/sql-ref-datatypes.md
time-zone.
- `TimeType(precision)`: Represents values comprising values of fields hour, minute and second with the number of decimal digits `precision` following the decimal point in the seconds field, without a time-zone.
- The range of values is from `00:00:00` to `23:59:59` for min precision `0`, and to `23:59:59.999999` for max precision `6`.
+ The range of values is from `00:00:00` to `23:59:59` for min precision `0`, and to `23:59:59.999999999` for max precision `9`. The default precision is `6`.

Member

Is MAX_PRECISION still MICROS, or NANOS?

Member Author

MAX_PRECISION is NANOS: the max is 9, and the default is 6.

MaxGekk added 2 commits June 25, 2026 15:17
…e rule under Least Common Type Resolution

Address review feedback: move the "least common type of TIME(n) and TIME(m) is TIME(max(n, m))" statement from "Type Promotion and Precedence" into the "Least Common Type Resolution" subsection (next to the analogous decimal rule). The "Type Promotion and Precedence" subsection keeps the note that TIME does not promote to other types and the SQL-standard deviations.
…ding SPARK-57585

The least-common-type rule TIME(n)/TIME(m) -> TIME(max(n, m)) is not yet implemented (findWiderDateTimeType returns None for TIME pairings); it will land with SPARK-57585. Remove the doc sentence so the SQL reference matches current behavior.

Co-authored-by: Isaac

MaxGekk commented Jun 25, 2026

Member Author

@srielau Could you review the PR, please?

HyukjinKwon left a comment

Member

0 blocking, 0 non-blocking, 0 nits.
Accurate, in-scope TIME documentation.

Verification

Checked the precision claim against code: docs say TIME precision 0-9 (max 23:59:59.999999999, default 6), and TimeType defines MIN_PRECISION=0, MAX_PRECISION=NANOS_PRECISION=9, DEFAULT_PRECISION=MICROS_PRECISION=6 — accurate, and resolves the earlier MICROS-vs-NANOS question. All 11 docs files add only TIME-related content (no scope creep), and the date_format/date_part example additions run as ExpressionInfo example tests in CI.
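The range claim verified above can be restated as a small Python sketch. The constant names and values mirror those quoted from `TimeType` (0, 9, and 6); the function `max_time_text` is a hypothetical helper for illustration and does not exist in Spark.

```python
MIN_PRECISION = 0
MAX_PRECISION = 9      # nanoseconds
DEFAULT_PRECISION = 6  # microseconds


def max_time_text(precision: int = DEFAULT_PRECISION) -> str:
    """Return the largest TIME(precision) value as text."""
    if not MIN_PRECISION <= precision <= MAX_PRECISION:
        raise ValueError(f"TIME precision must be 0 to 9, got {precision}")
    if precision == 0:
        # No fractional part at the minimum precision.
        return "23:59:59"
    return "23:59:59." + "9" * precision
```

With these definitions, `max_time_text(9)` yields `23:59:59.999999999`, matching the upper bound stated in the updated `TimeType(precision)` description.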


MaxGekk commented Jun 26, 2026

Member Author

Merging to master/4.x. Thank you, @HyukjinKwon and @uros-b, for the review.

@MaxGekk MaxGekk closed this in 24434c3 Jun 26, 2026
MaxGekk added a commit that referenced this pull request Jun 26, 2026
…s, and ANSI behavior in the SQL reference


Closes #56771 from MaxGekk/time-docs.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 24434c3)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
3 participants