[SPARK-55444][SQL] Types Framework - Phase 3a - Storage Formats (Parquet) by davidm-db · Pull Request #55326 · apache/spark

davidm-db · 2026-04-13T17:35:46Z

What changes were proposed in this pull request?

This PR implements Phase 3a (Storage Formats - Parquet) of the Spark Types Framework (SPARK-55444, parent: SPARK-53504). It adds a new optional ParquetTypeOps trait that enables framework-managed types to participate in Parquet read/write paths with zero per-type changes to Parquet infrastructure files.

Scope of this PR: the schema-conversion, write-path, and row-based read-path integration sites — 6 Scala files modified plus 2 added. Vectorized-read (ParquetVectorUpdaterFactory, VectorizedColumnReader) and filter-pushdown (ParquetFilters) integration are deferred — see "Follow-ups" below.

New trait: ParquetTypeOps in sql/core (package o.a.s.sql.execution.datasources.parquet.types.ops) following the Phase 1c pattern (ConnectArrowTypeOps — separate module, separate factory). The trait surface implemented in this PR:

Schema conversion (Spark DataType → Parquet schema type, write path)
Value write (writing values to the Parquet RecordConsumer)
Row-based read (creating Parquet converters for reading into InternalRow)
Type gates (supportDataType, plus the isBatchReadSupported capability flag)
Schema clipping (parquetStructSchema for column pruning of struct-backed types)

Not yet on the trait (deferred to Follow-ups): vectorized-read batch updaters and filter-pushdown predicates. Today the trait only exposes the isBatchReadSupported gate for the vectorized path and has no filter surface.

Reference implementation: TimeTypeParquetOps validates the schema, write, row-read, and type-gate paths for a primitive INT64-backed type.

Dispatch pattern: Framework FIRST at all integration sites, with the entire original code extracted to *Default methods as fallback — the same Ops(dt).map(_.method).getOrElse(methodDefault(dt)) pattern established in Phase 1a (PR #54223) and Phase 1c (PR #54905). TimeType is always tested through the framework path when the feature flag is ON.

Integration sites (in this PR):

File	Dispatch
`ParquetSchemaConverter`	Write-path schema: framework-first `convertField` → `convertToParquetType`, original code extracted to `convertFieldDefault`. (The Parquet→Spark read/inference direction is unchanged apart from a cosmetic `case _ => illegalType()` line merge; `TimeType` inference still uses the hardcoded `TimeLogicalTypeAnnotation` case.)
`ParquetWriteSupport`	Framework-first split of `makeWriter` into `makeWriter` + `makeWriterDefault`. (No companion-utility extraction — `consumeGroup`/`writeFields` are untouched.)
`ParquetRowConverter`	Framework-first `newConverter` with method overloading (simple for primitive, extended for struct-backed)
`ParquetFileFormat`	Framework-first `supportDataType`
`ParquetUtils`	Framework-first `isBatchReadSupported`
`ParquetReadSupport`	Framework-first `clipParquetType` for struct-backed types via `parquetStructSchema`

Integration sites (follow-up, not in this PR):

File	Dispatch
`ParquetVectorUpdaterFactory` (Java)	Framework-first `getUpdater` via Java-friendly `getVectorUpdaterOrNull`
`VectorizedColumnReader` (Java)	Framework-first `isLazyDecodingSupported` via Java-friendly `isLazyDecodingSupportedFor`
`ParquetFilters`	`FrameworkFilterOps` custom extractor + `orElse` on 7 PartialFunctions + framework-first `valueCanMakeFilterOn`

Design decisions:

ParquetTypeOps is a separate trait in sql/core (not on TypeOps in sql/catalyst) because Parquet types (RecordConsumer, ParquetVectorUpdater, etc.) live in sql/core and catalyst cannot reference them.
rowRepresentationType (Phase 1b) is NOT used for Parquet — it is scoped to row infrastructure only. Using it would erase type identity in Parquet value paths, create dispatch asymmetry between struct-backed and primitive types, and extend it beyond its designed scope.
parquetStructSchema is independent of PhysicalDataType — Parquet storage representation may differ from internal row representation.
recordConsumer is passed as () => RecordConsumer (lazy supplier) because makeWriter is called during init() when recordConsumer is still null (set later in prepareForWrite()).
Filter dispatch (deferred to the follow-up) will use a FrameworkFilterOps custom extractor inside ParquetFilters because ParquetSchemaType is a private inner class that cannot be referenced from outside.

Follow-ups (not in this PR):

Vectorized-read integration — ParquetVectorUpdaterFactory and VectorizedColumnReader (both Java). Will use Java-friendly methods (getVectorUpdaterOrNull, isLazyDecodingSupportedFor) to dispatch from Java code paths into the Scala ParquetTypeOps.
Filter-pushdown integration — ParquetFilters via a FrameworkFilterOps custom extractor added to 7 PartialFunctions plus framework-first valueCanMakeFilterOn. Requires a custom extractor because ParquetSchemaType is a private inner class that cannot be referenced from outside ParquetFilters.
Second reference implementation (struct-backed) — TimeType is INT64-backed and exercises only the primitive paths. A struct-backed reference type would harden parquetStructSchema, clipParquetType framework-first dispatch, and the extended newConverter overload before downstream consumers (e.g., extended-range nanosecond timestamps, DECFLOAT) land.
Read-path guard ON/OFF reconciliation (SPARK-57416) — reconcile the TimeType read-path guard divergence described under "user-facing change" below: either relax the framework guard (TimeTypeParquetOps.requireCompatibleParquetType) to mirror the legacy ParquetRowConverter guard, or apply the same tightening to the *Default path and add a test documenting the intended change.

Why are the changes needed?

Adding a new data type to Spark currently requires modifying 8+ Parquet files with scattered, type-specific logic. This PR enables the framework to handle all Parquet concerns for new types — a new type implements ParquetTypeOps and registers in the companion's apply(), and the Parquet infrastructure files dispatch through it automatically.

This is the first storage format integration (Phase 3a). TimeType serves as the reference implementation validating the paths wired in this PR.

Does this PR introduce any user-facing change?

No. This is a refactoring that adds framework dispatch to Parquet infrastructure. When the framework flag (spark.sql.types.framework.enabled) is ON (default in tests), TimeType's Parquet handling goes through the framework; when OFF, the original *Default code paths execute unchanged.

Behavior is identical between ON and OFF for the schema-conversion, write, and row-based read paths wired here, with one known, documented exception: the framework read-path guard TimeTypeParquetOps.requireCompatibleParquetType is stricter than the legacy ParquetRowConverter guard it replaces. For an INT64 TIME(MICROS, isAdjustedToUTC=true) column read as TimeType, framework ON rejects it (cannotCreateParquetConverterForDataTypeError / FAILED_READ_FILE) while OFF succeeds. This is reachable only via an explicit read schema (schema inference already maps isAdjustedToUTC=true to illegalType()), is orthogonal to the framework wiring in this PR, and its reconciliation is tracked as a follow-up in SPARK-57416.

How was this patch tested?

All existing Parquet test suites pass with the framework enabled (default in tests):

ParquetSchemaSuite: 131 tests passed
ParquetIOSuite: 88 tests passed (including "Read TimeType for the logical TIME type")
ParquetVectorizedSuite: 25 tests passed
ParquetV1FilterSuite + ParquetV2FilterSuite: 101 tests passed (including "SPARK-51687: filter pushdown - time")

Framework ON/OFF equivalence: all tests pass identically with spark.sql.types.framework.enabled = true and false.

New unit test TimeTypeParquetOpsSuite pins the accept/reject set of requireCompatibleParquetType, including the intentional isAdjustedToUTC=true rejection noted above (cross-referenced to SPARK-57416).

Note: ParquetVectorizedSuite and the V1/V2 filter suites exercise the existing (non-framework) code paths for TimeType, since vectorized-read and filter-pushdown integration are in the follow-ups above. Framework ON/OFF equivalence is verified for the schema-conversion, write, and row-based read paths actually wired in this PR.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.6)

stevomitric · 2026-06-09T12:01:07Z

cc @MaxGekk please review.

stevomitric · 2026-06-09T13:33:05Z

cc @dejankrak-db please review.

…rces/parquet/types/ops/ParquetTypeOps.scala Co-authored-by: Maxim Gekk <max.gekk@gmail.com>

…rces/parquet/types/ops/TimeTypeParquetOps.scala Co-authored-by: Maxim Gekk <max.gekk@gmail.com>

MaxGekk · 2026-06-09T16:15:19Z

Read-path guard is stricter than the original → ON/OFF behavior difference

TimeTypeParquetOps.requireCompatibleParquetType requires INT64 && unit == MICROS && !isAdjustedToUTC. The original ParquetRowConverter guard it replaces only checked TimeLogicalTypeAnnotation && unit == MICROS — it did not check isAdjustedToUTC. So the new guard is strictly tighter, even though the code comment states it "Mirrors the inline guard that existed in ParquetRowConverter before the framework dispatch".

This produces a difference between framework ON and OFF when reading a TIME(MICROS, isAdjustedToUTC=true) column as TimeType (reachable via an explicit read schema, since schema inference already maps isAdjustedToUTC=true to illegalType()). I verified it empirically:

framework OFF → succeeds (returns 23:59:59.123456)
framework ON → fails with FAILED_READ_FILE / cannotCreateParquetConverterForDataTypeError

This contradicts the "behavior is identical in both cases" statement in the PR description.

Since this is an edge case (only via explicit read schema) and orthogonal to the framework wiring in this PR, I'd suggest handling it in a separate follow-up rather than blocking this one. The follow-up should decide intent and either:

mirror the original guard exactly (drop the !isAdjustedToUTC / INT64 checks) to preserve ON/OFF equivalence, or
if the stricter check is intentional, apply the same tightening to the *Default path too and add a test documenting the (now intended) change, plus update the comment.

MaxGekk · 2026-06-09T16:21:14Z

+  def newConverter(
+      parquetType: org.apache.parquet.schema.Type,
+      updater: ParentContainerUpdater): org.apache.parquet.io.api.Converter
+    with HasParentContainerUpdater


Minor design suggestion: the abstract method here is the simple overload, while the extended overload (below) has a default that delegates to it. As the scaladoc warns, struct-backed types must override the extended overload, yet they are still forced to provide an implementation of this simple one that the docs say is wrong for them ("will compile but produce incorrect behavior at runtime").

Consider making the extended overload the abstract one (and giving the simple overload a default that throws / delegates), so future struct-backed implementations aren't required to ship a misleading simple impl. Not blocking for this PR since the only reference type (TimeType) is primitive and correctly uses the simple version - just something to tidy before a struct-backed type lands.

Will followup with the change.

MaxGekk · 2026-06-09T16:23:13Z

+  def newConverter(
+      parquetType: org.apache.parquet.schema.Type,
+      updater: ParentContainerUpdater,
+      schemaConverter: org.apache.spark.sql.execution.datasources.parquet
+        .ParquetToSparkSchemaConverter,
+      convertTz: Option[java.time.ZoneId],
+      datetimeRebaseSpec: RebaseSpec,
+      int96RebaseSpec: RebaseSpec): org.apache.parquet.io.api.Converter
+    with HasParentContainerUpdater =


Nit (style): org.apache.parquet.schema.Type is already imported at the top of the file (and used unqualified in convertToParquetType), but here both newConverter overloads spell out fully-qualified names: org.apache.parquet.schema.Type, org.apache.parquet.io.api.Converter, org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter, and java.time.ZoneId. Mixing imported and fully-qualified references for the same Type is inconsistent and a bit noisy.

Suggest adding imports for Converter, ParquetToSparkSchemaConverter, and ZoneId (the latter two are in this package / java.time) and using the unqualified names throughout for readability.

MaxGekk · 2026-06-09T16:25:56Z

PR description is out of sync with the diff (doc-only)

A few things in the description don't match the actual changes; worth trimming so reviewers aren't looking for code that isn't there:

"ParquetWriteSupport | Companion utility extraction (consumeGroup, writeFields)" - not in the diff. The only change in ParquetWriteSupport is the framework-first split of makeWriter into makeWriter + makeWriterDefault; consumeGroup/writeFields are untouched.
"ParquetSchemaConverter | ... + read-path reverse lookup" - the only change in the Parquet->Spark direction is a cosmetic line merge (case _ => illegalType()). No framework dispatch is added on the read/inference path; TimeType inference still uses the hardcoded TimeLogicalTypeAnnotation case.
The ParquetTypeOps scaladoc says the trait "covers all Parquet concerns: ... Vectorized read: creating batch updaters ... Filter pushdown: creating Parquet filter predicates". The trait actually has neither (they're correctly deferred to the follow-ups) - it only exposes the isBatchReadSupported gate. Suggest rewording the scaladoc and the description's trait-surface bullet list to reflect what's implemented now vs. deferred.

None of these affect correctness; just aligning the narrative with the code.

MaxGekk

0 blocking, 2 non-blocking, 0 nits.
Clean wiring of the framework dispatch into the six integration sites. Two non-blocking notes on API extensibility and test coverage.

Design / architecture (1)

types/ops/ParquetTypeOps.scala:78: convertToParquetType drops inShredded — see inline

Suggestions (1)

types/ops/TimeTypeParquetOps.scala:105: no unit test for requireCompatibleParquetType reject paths — see inline

MaxGekk · 2026-06-09T21:53:44Z

+   * @param repetition REQUIRED, OPTIONAL, or REPEATED
+   * @return the Parquet Type for this DataType
+   */
+  def convertToParquetType(fieldName: String, repetition: Repetition): Type


SparkToParquetSchemaConverter.convertField calls convertToParquetType(field.name, repetition) but drops inShredded. convertFieldDefault uses that flag at line 726 for timestamp types inside shredded Variant schemas. No current framework type needs it, but a future struct-backed type might differ in Parquet schema when written inside a shredded context.

Adding a default parameter now is backward-compatible — existing impls ignore it, future ones opt in:

Suggested change

def convertToParquetType(fieldName: String, repetition: Repetition): Type

def convertToParquetType(fieldName: String, repetition: Repetition, inShredded: Boolean = false): Type

And the call site in convertField would forward it: .map(_.convertToParquetType(field.name, repetition, inShredded)). Avoids a breaking API change when the first struct-backed type lands.

Done, added inShredded: Boolean = false

MaxGekk · 2026-06-09T21:53:45Z

+   * the legacy ParquetRowConverter path so reads fail loudly instead of
+   * silently misinterpreting bytes.
+   */
+  private[ops] def requireCompatibleParquetType(


requireCompatibleParquetType is only exercised through integration tests for the accepted encoding. The reject paths — raw INT64 (no annotation), INT64 TIME(NANOS), INT32 TIME(MILLIS), INT64 TIME(MICROS, isAdjustedToUTC=true) — are not directly tested. A small unit test (e.g. a TimeTypeParquetOpsTest, or cases in ParquetSchemaSuite) would:

Document exactly which encodings are rejected and why.

Provide a regression hook for the isAdjustedToUTC=true ON/OFF behavior difference flagged in the conversation thread — whichever resolution is chosen (mirror the original guard, or tighten both paths), the test pins the intended behavior.

added TimeTypeParquetOpsSuite covering all four reject paths

- Add inShredded: Boolean = false default param to ParquetTypeOps.convertToParquetType and forward from SparkToParquetSchemaConverter.convertField call site, so future struct-backed types can branch on shredded Variant context without an API break. Existing TimeTypeParquetOps override accepts the param (ignored). - Replace fully-qualified type names in newConverter overloads with imports for ZoneId, Converter, and ParquetToSparkSchemaConverter so the trait surface reads consistently. - Add TimeTypeParquetOpsSuite with 9 unit tests pinning the requireCompatibleParquetType reject set (raw INT64, INT64 TIME(NANOS), INT32 TIME(MILLIS), INT64 TIME(MICROS, isAdjustedToUTC=true), plus TIMESTAMP/DECIMAL/group/BINARY safety cases). The isAdjustedToUTC=true case serves as the ON/OFF divergence regression hook. Co-authored-by: Isaac

stevomitric · 2026-06-10T14:00:09Z

Will update description once we reach the final version. cc @MaxGekk

The parquet-mr Types builder itself rejects this combination with IllegalStateException("TIME(MICROS,false) can only annotate INT64") at schema-construction time, so the wrong-primitive branch of requireCompatibleParquetType is unreachable for the TIME annotation. The raw-INT64 / TIMESTAMP / DECIMAL / group tests already cover the !isPrimitive and "non-TIME annotation" branches. Verified locally: TimeTypeParquetOpsSuite now 8/8 pass (sql/testOnly). Co-authored-by: Isaac

dejankrak-db · 2026-06-11T01:29:35Z

+  // ==================== Row-Based Read ====================
+
+  override def newConverter(
+      parquetType: org.apache.parquet.schema.Type,


The same FQN cleanup applied to ParquetTypeOps after Max's note was missed here: Type is already imported, and Converter could be.
Import Converter and drop the org.apache.parquet... prefixes on both, for consistency with the rest of the file.

@stevomitric Could you address this one as well? Same cleanup as in ParquetTypeOps after my earlier note — import Converter and use the already-imported Type unqualified.

dejankrak-db · 2026-06-11T01:31:46Z

+ * by 1000x).
+ *
+ * These tests document the exact reject set and pin the intended behavior of
+ * the read-path guard. They also serve as a regression hook for the


This scaladoc (and the test comment at line 80) describe the behavior in terms of the PR review history - "whichever resolution lands ... update this test". Once merged, that framing reads as provisional and leaves a future reader only a PR link for context.
Suggest stating the current invariant as intended behavior, and pointing to a durable follow-up reference (a JIRA ticket) for the pending isAdjustedToUTC resolution.

@stevomitric Please address this one too: reword the scaladoc and the comment at line 80 to state the current invariant as the intended behavior, and point to a JIRA ticket for the pending isAdjustedToUTC resolution instead of the PR-review framing.

Max please create the jira.

https://issues.apache.org/jira/browse/SPARK-57416

dejankrak-db · 2026-06-11T01:32:11Z

LGTM, left minor comments.

MaxGekk

2 addressed, 0 remaining, 1 new. (newly introduced)
Both prior findings addressed cleanly — inShredded is forwarded at the convertField call site, and the new TimeTypeParquetOpsSuite pins the guard's accept/reject set. I also verified the removed BINARY+TIME(MICROS) test's rationale: parquet-column's Types builder does reject that combination ("%s can only annotate INT64").

Nits: 1 minor item (see inline comment).

MaxGekk · 2026-06-11T20:22:53Z

+   * For struct-backed types: returns a GroupType with sub-fields and a logical type annotation.
+   *
+   * @param fieldName the column/field name in the Parquet schema
+   * @param repetition REQUIRED, OPTIONAL, or REPEATED


The new inShredded parameter isn't documented, and its meaning (shredded Variant context) isn't obvious from the name:

Suggested change

* @param repetition REQUIRED, OPTIONAL, or REPEATED

* @param repetition REQUIRED, OPTIONAL, or REPEATED

* @param inShredded whether the field is nested within a shredded Variant schema

Added docs.

MaxGekk

1 addressed, 0 remaining, 0 new.
Clean polish round — the round-2 nit (@param inShredded doc) and both external reviewer comments (FQN cleanup in TimeTypeParquetOps; durable-reference rewording → SPARK-57416) are resolved with no logic change. No new findings.

Two items remain open but are already on the record and deferred by the author, so I'm not re-posting them: the ParquetTypeOps trait scaladoc still lists "Vectorized read" / "Filter pushdown" among concerns it covers, but the trait implements neither yet; and a couple of PR-description bullets (ParquetWriteSupport companion-utility extraction, ParquetSchemaConverter read-path reverse lookup) don't match the diff. Both are from the 2026-06-09 thread — worth folding into the description pass before merge.

cloud-fan

0 blocking, 2 non-blocking, 1 nit.
Clean, well-scoped refactoring that faithfully follows the established TypeOps framework-first pattern; flag ON/OFF behavioral equivalence verified at all 6 wired integration sites. The non-blocking findings are all extension-surface ergonomics that affect the next (struct-backed) type rather than this PR — a natural fit for follow-up #3.

Design / architecture (1)

ParquetTypeOps.scala:81: struct-backed extension surface is asymmetric — convertToParquetType lacks the sub-field recursion hook that makeWriter (callback) and the extended newConverter (schemaConverter) carry, and the abstract newConverter is the unsafe simple overload — see inline

Suggestions (1)

ParquetTypeOps.scala:43: trait scaladoc overstates the implemented surface (lists vectorized/filter methods that don't exist); isBatchReadSupported precondition — see inline

Nits: 1 minor item (see inline comments).

Verification

Verified flag ON vs OFF behavioral equivalence for TimeType at all 6 integration sites by tracing the framework path against each extracted *Default: schema conversion, value write, row read, supportDataType, isBatchReadSupported, and schema clipping all produce identical results, and each *Default recurses for nested fields through the framework-first public method so nested dispatch stays consistent. The one intended difference is the read-path guard (INT64 TIME(MICROS, isAdjustedToUTC=true) rejected ON, accepted OFF) — documented and tracked by SPARK-57416.

cloud-fan · 2026-06-16T22:21:22Z

+   * @param inShredded whether the field is nested within a shredded Variant schema
+   * @return the Parquet Type for this DataType
+   */
+  def convertToParquetType(


The struct-backed paths pass framework-recursion context inconsistently: makeWriter gets a makeFieldWriter: DataType => ValueWriter callback and the extended newConverter gets schemaConverter, but convertToParquetType gets only (fieldName, repetition, inShredded). The legacy struct case recurses via convertField(f, ...), which carries SparkToParquetSchemaConverter config (timestamp output type, legacy format, field-id) — but this method is called on the ops instance, which has no converter reference, so a struct-backed type would have to hand-build its sub-schema and lose that config. Suggest a minimal StructField => Type callback here (matching makeWriter's style) for symmetry.

Relatedly, since the abstract newConverter is the simple 2-arg overload that the scaladoc says is wrong for struct-backed types, making the extended overload the sole abstract method (per @MaxGekk's thread above) is the same shape of fix — both worth settling alongside the struct-backed reference type in follow-up #3.

Agreed on both - the StructField => Type recursion callback on convertToParquetType (to carry SparkToParquetSchemaConverter config for sub-fields, matching makeWriter's style) and making the extended newConverter the sole abstract method. As you noted, both are best settled against a real struct-backed reference type so the callback signature is validated by an actual consumer rather than added speculatively, so I'm deferring them to follow-up #3 (the struct-backed reference type) alongside Max's same point. Left the current primitive surface as-is for this PR.

cloud-fan · 2026-06-16T22:21:22Z

+ *   - Schema conversion: Spark DataType <-> Parquet schema type
+ *   - Value write: writing values to Parquet RecordConsumer
+ *   - Row-based read: creating Parquet converters for reading into InternalRow
+ *   - Vectorized read: creating batch updaters for columnar reading


The class scaladoc says the trait "covers all Parquet concerns" and lists "Vectorized read: creating batch updaters" and "Filter pushdown: creating Parquet filter predicates", but the trait exposes neither — only the isBatchReadSupported capability gate. Since the PR description is explicit that those are deferred, consider rewording to mark them as not-yet-on-the-trait.

Relatedly, isBatchReadSupported returning true only works for types the legacy Java vectorized path already handles (there is no framework vectorized hook yet) — worth a one-line note in its scaladoc so a future type doesn't return true and route into a factory that doesn't know it. TimeType is safe for exactly that reason.

Done. Reworded the class scaladoc to drop the "covers all Parquet concerns" framing and the Vectorized-read Filter-pushdown bullets

cloud-fan · 2026-06-16T22:21:22Z

+private[parquet] trait ParquetTypeOps extends Serializable {
+
+  /** The DataType this Ops instance handles. */
+  def dataType: DataType


This abstract member doesn't appear to have a consumer — dispatch keys off ParquetTypeOps(field.dataType), not the ops instance's dataType. It's mirrored from TypeOps, but there the reference impl gets it from a base class; here every implementer must write override def dataType for no reader. Consider dropping it (or giving it a default) to keep the required surface minimal.

Done - removed the abstract dataType from the trait and the override def dataType from TimeTypeParquetOps.

Resolves a silent semantic conflict from SPARK-57372 (removal of the Types Framework feature flag): master deleted spark.sql.types.framework.enabled, so the gate in ParquetTypeOps.apply (if (!SQLConf.get.typesFrameworkEnabled) return None) is removed and the framework now dispatches unconditionally, matching how SPARK-57372 adapted TypeOps/TypeApiOps/ConnectTypeOps. SQLConf import retained (used as a parameter type). Co-authored-by: Isaac

davidm-db force-pushed the davidm-db/types_framework_3a branch from 2d1d82b to e788aef Compare April 13, 2026 17:39

initial commit - phase 3a

eb12469

davidm-db force-pushed the davidm-db/types_framework_3a branch from e788aef to eb12469 Compare April 13, 2026 18:40

base changes

039c6e9

davidm-db marked this pull request as ready for review June 9, 2026 10:45

fix time type

e7335bb

MaxGekk reviewed Jun 9, 2026

View reviewed changes

Comment thread ...main/scala/org/apache/spark/sql/execution/datasources/parquet/types/ops/ParquetTypeOps.scala Outdated

Comment thread .../scala/org/apache/spark/sql/execution/datasources/parquet/types/ops/TimeTypeParquetOps.scala Outdated

stevomitric and others added 2 commits June 9, 2026 16:38

Update sql/core/src/main/scala/org/apache/spark/sql/execution/datasou…

e69cd17

…rces/parquet/types/ops/ParquetTypeOps.scala Co-authored-by: Maxim Gekk <max.gekk@gmail.com>

Update sql/core/src/main/scala/org/apache/spark/sql/execution/datasou…

93de268

…rces/parquet/types/ops/TimeTypeParquetOps.scala Co-authored-by: Maxim Gekk <max.gekk@gmail.com>

stevomitric approved these changes Jun 9, 2026

View reviewed changes

MaxGekk reviewed Jun 9, 2026

View reviewed changes

dejankrak-db reviewed Jun 11, 2026

View reviewed changes

dejankrak-db approved these changes Jun 11, 2026

View reviewed changes

MaxGekk reviewed Jun 11, 2026

View reviewed changes

stevomitric added 2 commits June 12, 2026 09:20

resolve comments

94a0629

add jira

c0c7a0d

MaxGekk reviewed Jun 15, 2026

View reviewed changes

cloud-fan reviewed Jun 16, 2026

View reviewed changes

resolve comments

f55fa6e

	def convertToParquetType(fieldName: String, repetition: Repetition): Type
	def convertToParquetType(fieldName: String, repetition: Repetition, inShredded: Boolean = false): Type

	* @param repetition REQUIRED, OPTIONAL, or REPEATED
	* @param repetition REQUIRED, OPTIONAL, or REPEATED
	* @param inShredded whether the field is nested within a shredded Variant schema

Conversation

davidm-db commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

stevomitric commented Jun 9, 2026

Uh oh!

stevomitric commented Jun 9, 2026

Uh oh!

Uh oh!

Uh oh!

MaxGekk commented Jun 9, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MaxGekk commented Jun 9, 2026

Uh oh!

MaxGekk left a comment

Choose a reason for hiding this comment

Design / architecture (1)

Suggestions (1)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stevomitric commented Jun 10, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dejankrak-db commented Jun 11, 2026

Uh oh!

MaxGekk left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MaxGekk left a comment

Choose a reason for hiding this comment

Uh oh!

cloud-fan left a comment

Choose a reason for hiding this comment

Design / architecture (1)

Suggestions (1)

Verification

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidm-db commented Apr 13, 2026 •

edited

Loading