[SPARK-57030][SPARK-57031][DOCS] Document the TIME data type, functions, and ANSI behavior in the SQL reference#56771
MaxGekk wants to merge 5 commits into
…ns, and ANSI behavior in the SQL reference
### What changes were proposed in this pull request?
This PR completes the user-facing SQL reference documentation for the `TIME` data type:
- `docs/sql-ref-datatypes.md`: add the missing `TimeType` rows to the Python (`datetime.time` / `TimeType()`), R (`Not supported`), and SQL type-name (`TIME, TIME(p)`) tables, and update the `TimeType(precision)` description to reflect the supported precision range (0 to 9, default 6).
- `docs/sql-ref-literals.md`: extend the `TIME` literal syntax to allow up to 9 fractional-second digits and add a nanosecond-precision example.
- `docs/sql-ref-ansi-compliance.md`: document that `TIME` does not promote to other types, the least common type of `TIME(n)`/`TIME(m)` is `TIME(max(n, m))`, and Spark's deviations from the SQL standard (default precision 6 vs ANSI 0; `TIME WITH TIME ZONE` not supported).
The TIME-related functions and operators are already covered by the auto-generated SQL function reference, which is built from the `@ExpressionDescription` annotations on the corresponding expressions.
### Why are the changes needed?
To finish documenting the `TIME` data type (SPARK-57030) and its functions/operators and ANSI behavior (SPARK-57031) in the SQL reference.
### Does this PR introduce _any_ user-facing change?
No. Documentation-only changes.
### How was this patch tested?
Reviewed the rendered Markdown tables and verified the claims against the implementation (`TimeType`, `DataTypeAstBuilder`, the TIME literal parser, and `FunctionRegistry`).
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)
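For readers who want a concrete picture of "up to 9 fractional-second digits", here is a minimal Python sketch. It is not Spark's parser: the accepted shape (`HH:MM:SS` with an optional fraction of 1 to 9 digits) and the helper name `parse_time_literal` are assumptions made for illustration only.

```python
import re

# Assumed, simplified shape of a TIME literal body: HH:MM:SS[.f{1,9}].
# This models the "up to 9 fractional digits" rule only; Spark's real
# grammar may accept other forms.
_TIME_BODY = re.compile(r"(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,9}))?")


def parse_time_literal(text: str) -> tuple[int, int]:
    """Return (nanoseconds since midnight, number of fractional digits)."""
    m = _TIME_BODY.fullmatch(text)
    if m is None:
        raise ValueError(f"not a valid time literal: {text!r}")
    hour, minute, second = (int(m.group(i)) for i in (1, 2, 3))
    if hour > 23 or minute > 59 or second > 59:
        raise ValueError(f"field out of range in {text!r}")
    frac = m.group(4) or ""
    # Right-pad the fraction to 9 digits so it can be read as nanoseconds.
    nanos = int(frac.ljust(9, "0")) if frac else 0
    total = (hour * 3600 + minute * 60 + second) * 1_000_000_000 + nanos
    return total, len(frac)
```

Under these assumptions, `parse_time_literal("23:59:59.999999999")` reports 9 fractional digits, while a 10-digit fraction is rejected.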
…at/date_part, function section, datetime patterns, and DESCRIBE AS JSON
### What changes were proposed in this pull request?
Follow-up that extends the TIME documentation to surfaces that still described only date/timestamp:
- `date_format` and `date_part` `@ExpressionDescription`: document that they accept `TIME` and add a `TIME` example each. Both expressions already support `TIME` (`date_format` via `TypeCollection(TimestampType, AnyTimeType)`; `date_part` via the shared `Extract` machinery). The first example of each is left unchanged so the generated `sql-expression-schema.md` golden file is unaffected.
- `docs/sql-ref-functions-builtin.md` / `docs/sql-ref-functions.md`: rename the "Date and Timestamp Functions" section (and its link/anchor and category text) to "Date, Time and Timestamp Functions"; the auto-generated table under it already lists the TIME builtins.
- `docs/sql-ref-datetime-pattern.md`: note that the patterns also apply to `TIME` parsing/formatting and add `to_time` to the function list.
- `docs/sql-ref-syntax-aux-describe-table.md`: add the `TimeType` row to the `DESCRIBE ... AS JSON` type table (`{ "name" : "time(p)" }`, matching the current serializer output).
The `+`/`-` operator descriptions are intentionally generic and do not enumerate operand types (not even DATE/TIMESTAMP/INTERVAL), so they are left unchanged.
### Why are the changes needed?
To complete the user-facing documentation of the TIME data type's functions and behavior (SPARK-57030 / SPARK-57031).
### Does this PR introduce _any_ user-facing change?
No. Documentation-only changes (the `@ExpressionDescription` updates only affect generated docs).
### How was this patch tested?
Verified the expressions accept `TIME` in the source, and that the appended examples mirror the proven `extract` example outputs. The examples are executed and checked by `ExpressionInfoSuite`.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)
…SV, JSON, XML, and Avro data sources
### What changes were proposed in this pull request?
Document the data sources that support the TIME data type:
- CSV / JSON / XML: add the `timeFormat` option (default `HH:mm:ss`, read/write), which controls parsing/formatting of `TIME` values. These sources support `TIME` inference, reading with an explicit schema, and writing (`CSVOptions`/`JSONOptions`/`XmlOptions` `timeFormat`, plus the infer/parse/generate paths). For JSON, also note that `inferTimestamp` additionally enables inference of `TimeType`.
- Avro: add the `time-micros` <-> `TimeType` rows to both the "Avro -> Spark SQL" and "Spark SQL -> Avro" type-mapping tables (`SchemaConverters`/`AvroSerializer`/`AvroDeserializer`).
JDBC, Parquet, and ORC also support `TIME`, but their docs are intentionally left unchanged here: the JDBC per-database mapping tables describe the default behavior, where SQL `TIME` maps to `TimestampType`/`TimestampNTZType` unless the TIME type is enabled; Parquet/ORC have no existing type-mapping table that TIME fits into cleanly.
### Why are the changes needed?
To document where the TIME data type can be read from and written to (SPARK-57030 / SPARK-57031).
### Does this PR introduce _any_ user-facing change?
No. Documentation-only changes.
### How was this patch tested?
Verified each documented option/mapping against the source: `CSVOptions`/`JSONOptions`/`XmlOptions` (`timeFormat`), the CSV/JSON/XML infer/parse/generate paths, `JsonInferSchema` (TIME inference under `inferTimestamp`), and the Avro `SchemaConverters`/serializer/deserializer (`time-micros`).
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)
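As a rough illustration of the two mappings described above, the following Python sketch parses a text value with the analogue of the default `HH:mm:ss` pattern and converts it to the representation used by Avro's `time-micros` logical type (a long holding microseconds after midnight). It does not call Spark or an Avro library; the function names are hypothetical, and `%H:%M:%S` is Python's `strptime` counterpart of the pattern, not Spark's pattern syntax.

```python
from datetime import datetime, time

MICROS_PER_SECOND = 1_000_000


def parse_time(text: str, fmt: str = "%H:%M:%S") -> time:
    """Parse a time-of-day string; the default mirrors HH:mm:ss."""
    return datetime.strptime(text, fmt).time()


def to_time_micros(t: time) -> int:
    """Encode a time of day as microseconds after midnight."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds * MICROS_PER_SECOND + t.microsecond


def from_time_micros(micros: int) -> time:
    """Decode microseconds after midnight back into a time of day."""
    seconds, micro = divmod(micros, MICROS_PER_SECOND)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, micro)
```

Note that `datetime.time` carries at most microsecond resolution, which is why this sketch stops at `time-micros` and does not model nanosecond values.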
\*\*\* For a complex type, the precedence rule applies recursively to its component elements.
The `TIME` type does not promote to any other type. The least common type of `TIME(n)` and `TIME(m)` is `TIME(max(n, m))`. Note that Spark's `TIME` type deviates from the SQL standard in two ways: the default fractional-seconds precision is `6` (the ANSI default is `0`), and `TIME WITH TIME ZONE` is not supported.
Should this go under "Least Common Type Resolution", instead of "Type Promotion and Precedence"?
Good point, thanks. Moved the least-common-type statement (TIME(n)/TIME(m) -> TIME(max(n, m))) into the "Least Common Type Resolution" subsection, next to the analogous decimal rule. The "Type Promotion and Precedence" subsection keeps only the note that TIME does not promote to other types plus the SQL-standard deviations.
time-zone.
- `TimeType(precision)`: Represents values comprising values of fields hour, minute and second with the number of decimal digits `precision` following the decimal point in the seconds field, without a time-zone.
The range of values is from `00:00:00` to `23:59:59` for min precision `0`, and to `23:59:59.999999` for max precision `6`.
The range of values is from `00:00:00` to `23:59:59` for min precision `0`, and to `23:59:59.999999999` for max precision `9`. The default precision is `6`.
Is MAX_PRECISION still MICROS, or NANOS?
max is 9, default is 6 by the sql standard.
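To make the precision range in this thread concrete, here is a small Python sketch that derives the upper bound of the value range for a given precision. The constants reuse the values quoted in this PR (0, 9, and 6); the function `max_time_for_precision` is a hypothetical helper, not part of Spark.

```python
# Values as stated in this PR for TimeType; the Python names are
# illustrative only.
MIN_PRECISION = 0
MAX_PRECISION = 9
DEFAULT_PRECISION = 6


def max_time_for_precision(precision: int = DEFAULT_PRECISION) -> str:
    """Return the largest time-of-day string for the given precision."""
    if not MIN_PRECISION <= precision <= MAX_PRECISION:
        raise ValueError(f"precision must be in [0, 9], got {precision}")
    if precision == 0:
        return "23:59:59"
    # One '9' per fractional-second digit.
    return "23:59:59." + "9" * precision
```

With precision `9` this yields `23:59:59.999999999`, matching the updated doc line; with the default precision it yields `23:59:59.999999`.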
…e rule under Least Common Type Resolution
Address review feedback: move the "least common type of TIME(n) and TIME(m) is TIME(max(n, m))" statement from "Type Promotion and Precedence" into the "Least Common Type Resolution" subsection (next to the analogous decimal rule). The "Type Promotion and Precedence" subsection keeps the note that TIME does not promote to other types and the SQL-standard deviations.
…ding SPARK-57585
The least-common-type rule TIME(n)/TIME(m) -> TIME(max(n, m)) is not yet implemented (findWiderDateTimeType returns None for TIME pairings); it will land with SPARK-57585. Remove the doc sentence so the SQL reference matches current behavior.
Co-authored-by: Isaac
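For reference, the rule discussed in this thread, TIME(n)/TIME(m) -> TIME(max(n, m)), can be sketched in a few lines of Python. This models only the rule as written in the thread, which the commit above describes as not yet implemented in Spark; `TimeTypeModel` and `least_common_time_type` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeTypeModel:
    """Stand-in for a TIME(p) type; not Spark's TimeType class."""
    precision: int = 6

    def __post_init__(self) -> None:
        if not 0 <= self.precision <= 9:
            raise ValueError(f"precision out of range: {self.precision}")


def least_common_time_type(a: TimeTypeModel, b: TimeTypeModel) -> TimeTypeModel:
    """Pick the wider of two TIME types: TIME(max(n, m))."""
    return TimeTypeModel(max(a.precision, b.precision))
```

Choosing the larger precision means neither operand loses fractional-second digits when the two are brought to a common type.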
@srielau Could you review the PR, please?
HyukjinKwon left a comment
0 blocking, 0 non-blocking, 0 nits.
Accurate, in-scope TIME documentation.
Verification
Checked the precision claim against code: docs say TIME precision 0-9 (max 23:59:59.999999999, default 6), and TimeType defines MIN_PRECISION=0, MAX_PRECISION=NANOS_PRECISION=9, DEFAULT_PRECISION=MICROS_PRECISION=6 — accurate, and resolves the earlier MICROS-vs-NANOS question. All 11 docs files add only TIME-related content (no scope creep), and the date_format/date_part example additions run as ExpressionInfo example tests in CI.
Merging to master/4.x. Thank you, @HyukjinKwon and @uros-b, for the review.
…s, and ANSI behavior in the SQL reference
### What changes were proposed in this pull request?
This PR completes the user-facing SQL reference documentation for the `TIME` data type:
- `docs/sql-ref-datatypes.md`: add the missing `TimeType` rows to the Python (`datetime.time` / `TimeType()`), R (`Not supported`), and SQL type-name (`TIME, TIME(p)`) tables, and update the `TimeType(precision)` description to reflect the supported precision range (`0` to `9`, default `6`).
- `docs/sql-ref-literals.md`: extend the `TIME` literal syntax to allow up to 9 fractional-second digits and add a nanosecond-precision example.
- `docs/sql-ref-ansi-compliance.md`: document that `TIME` does not promote to other types, the least common type of `TIME(n)`/`TIME(m)` is `TIME(max(n, m))`, and Spark's deviations from the SQL standard (default precision `6` vs ANSI `0`; `TIME WITH TIME ZONE` not supported).
The TIME-related functions and operators (`current_time`, `make_time`, `to_time`, `try_to_time`, `time_trunc`, `time_diff`, `time_from_*`, `time_to_*`, `hour`/`minute`/`second`) are already covered by the auto-generated SQL function reference, which is built from the `ExpressionDescription` annotations on the corresponding expressions and registered in `FunctionRegistry`.
This PR addresses both [SPARK-57030](https://issues.apache.org/jira/browse/SPARK-57030) (data-type reference page) and [SPARK-57031](https://issues.apache.org/jira/browse/SPARK-57031) (functions/operators and ANSI compliance page).
### Why are the changes needed?
To finish documenting the `TIME` data type and its functions/operators and ANSI behavior in the SQL reference.
### Does this PR introduce _any_ user-facing change?
No. Documentation-only changes.
### How was this patch tested?
Reviewed the rendered Markdown tables and verified the claims against the implementation (`TimeType`, `DataTypeAstBuilder`, the `TIME` literal parser in `AstBuilder`, and `FunctionRegistry`).
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)
Closes #56771 from MaxGekk/time-docs.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 24434c3)
Signed-off-by: Max Gekk <max.gekk@gmail.com>