fix(schema): permit opt-in timestamp-precision evolution#19029
Conversation
2528a56 to
30810d2
Compare
Adds a write config `hoodie.write.schema.allow.timestamp.precision.evolution` (default false) that, when true, lets the internal-schema reconcile path correct the logical type of a column between timestamp-millis and timestamp-micros (and between the local-timestamp variants), and attach a missing local-timestamp logical type on top of a bare long. Default false preserves the existing strict rejection so no caller sees a behavior change. The non-reconcile write path was already lenient via Avro reader/writer compatibility (both logical types share the same Avro long primitive). The internal-schema reconcile path, triggered when `hoodie.write.set.null.for.missing.columns=true`, instead rejected the logical-type correction through `SchemaChangeUtils.isTypeUpdateAllow`. This closes the parity gap and enables forward-fixing tables that earlier versions persisted with a timestamp-micros logical type but timestamp-millis values, or that dropped the local-timestamp logical type entirely and stored the column as bare long. Threaded through SchemaChangeUtils -> TableChanges.ColumnUpdateChange -> AvroSchemaEvolutionUtils.reconcileSchema, with HoodieSchemaUtils, BaseHoodieWriteClient, HoodieMergeHelper, and FileGroupReaderBasedMergeHandle reading the config from the write properties. Tests: - TestAvroSchemaEvolutionUtils.testReconcileSchemaTimestampPrecisionEvolution covers default-strict reject and opt-in permit for all three shapes (timestamp precision swap, local-timestamp precision swap, long -> local-timestamp logical-type attach). - testCOWLogicalRepair / testMORLogicalRepair parameterize on both setNullForMissingColumns and allowTimestampPrecisionEvolution; positive variants exercise the gated repair path on v6/v8/CURRENT fixtures; a negative variant asserts SchemaCompatibilityException when the reconcile path is on with the gate closed.
30810d2 to
dc16a7a
Compare
…s/SchemaChangeUtils.java
| .defaultValue(false) | ||
| .markAdvanced() | ||
| .sinceVersion("1.3.0") | ||
| .withDocumentation("Controls whether schema evolution may change a column between timestamp-millis and " |
There was a problem hiding this comment.
This omits the third behavior the flag gates: attaching a logical type to a bare long (long -> local-timestamp-millis/micros), the logical-type-loss repair the PR description calls a primary motivation. A user whose 0.x table stored the column as bare long would not learn from "precision-only evolution between these logical types" that this flag applies. Suggest documenting the long -> local-timestamp attach case explicitly.
There was a problem hiding this comment.
long -> local-timestamp is backward compatible for readers. The logical type local-timestamp is now supported on master, so we can keep this part out for simplicity.
|
Took a pass. Logic looks correct: the gated transitions are exactly the same-zone precision relabels plus long-to-local-timestamp, cross-zone stays rejected, and
Left a couple smaller things as inline notes on the diff. |
voonhous
left a comment
There was a problem hiding this comment.
Inlined the specifics from my summary comment so they're easy to find in the diff.
| .defaultValue(false) | ||
| .markAdvanced() | ||
| .sinceVersion("1.3.0") | ||
| .withDocumentation("Controls whether schema evolution may change a column between timestamp-millis and " |
There was a problem hiding this comment.
This relabels the logical type but never rescales the stored longs. Worth spelling that out in the doc: values aren't converted, it's a one-time migration aid, and enabling it on a healthy timestamp-micros table where an incoming batch declares millis will silently read every value 1000x off. Also worth noting correct reads of existing files depend on the #14161 read repair.
| return isDecimalFixedUpdateAllowInternal(src, dst); | ||
| case STRING: | ||
| return dst == Types.DateType.get() || dst.typeId() == Type.TypeID.DECIMAL || dst.typeId() == Type.TypeID.DECIMAL_FIXED || dst == Types.BinaryType.get(); | ||
| case TIMESTAMP: |
There was a problem hiding this comment.
The gate is bidirectional, but the read-side repair only recovers toward millis (and long-to-local). The reverse (toward micros) has no read-side recovery, so old files keep the stale label and you get mixed-precision reads. Since the corruption is always "values are really millis," could we drop the micros directions and only allow micros-to-millis, local-micros-to-local-millis, and long-to-local? A one-line "relabel only, longs not rescaled" comment here would help too.
| } | ||
|
|
||
| @Test | ||
| public void testReconcileSchemaTimestampPrecisionEvolution() { |
There was a problem hiding this comment.
Could we add a gate-open assertThrows that cross-zone (UTC vs local) is still rejected, e.g. timestamp-micros vs local-timestamp-micros? That's the key invariant of the feature and nothing pins it right now.
|
|
||
| public static ColumnUpdateChange get(InternalSchema schema, boolean caseSensitive) { | ||
| return new ColumnUpdateChange(schema, caseSensitive); | ||
| public static ColumnUpdateChange get(InternalSchema schema, boolean caseSensitive, boolean allowTimestampPrecisionEvolution) { |
There was a problem hiding this comment.
Two small things here: (a) the description says pre-existing overloads were kept as delegates, but this get(schema, caseSensitive, ...), isTypeUpdateAllow, and reconcileSchema actually changed signature in place - no real breakage since they're all internal.schema.* with no external callers, just worth fixing the wording. (b) the 1-arg get(schema) just below stays hardcoded false, so explicit ALTER TABLE ... ALTER COLUMN TYPE won't honor the flag. Intentional? Fine either way, just undocumented.
|
Two more correctness things after a closer look, both about the fix being narrower/less durable than it first looks:
Lower priority, didn't verify: column stats / data-skipping bounds for the relabeled column were computed under the old logical type - worth a sanity check that pruning stays correct after the relabel. |
Describe the issue this Pull Request addresses
Earlier Hudi versions mishandled long-backed timestamp logical types in
AvroInternalSchemaConverter:timestamp-millisandtimestamp-microsboth collapsed into a single internalTimestampTypeand were always re-emitted astimestampMicros()on serialize. A source schema declaringtimestamp-millisgot persisted in the table with the wrongtimestamp-microslogical type, while the underlyinglongvalues written to parquet remained epoch-millis. Pure logical-type drift.local-timestamp-millisandlocal-timestamp-microshad no branch at all. They fell through to the bareLongType, and the logical type was dropped from the table schema entirely. Logical-type loss.In both cases the parquet values are correct; only the logical type on the field is wrong. Current converters recognize all four logical types as distinct, so the writer schema now declares the correct logical type. On every subsequent write the reconcile path compares writer schema against the persisted table schema, finds the logical-type mismatch, and rejects it.
With
hoodie.write.set.null.for.missing.columns=falsethe table schema already self-repairs on the next commit:HoodieSchemaUtils.deduceWriterSchemaskipsreconcileSchemaentirely and letsAvroSchemaCompatibility.checkReaderWriterCompatibilityvalidate. That check is logical-type-blind (both timestamps arelongunderneath), so it accepts the corrected logical type from the writer schema and the next commit rewrites the table schema's logical type accordingly. No change is needed for this path.With
hoodie.write.set.null.for.missing.columns=truethe repair is blocked.HoodieSchemaUtils.deduceWriterSchemainstead callsAvroSchemaEvolutionUtils.reconcileSchema, which goes throughTableChanges.ColumnUpdateChange.updateColumnTypeandSchemaChangeUtils.isTypeUpdateAllow. That switch had no case forTIMESTAMPorTIMESTAMP_MILLIS, so any logical-type change fell intodefault: return falseand threwSchemaCompatibilityException. The reconcile path was strictly stricter than the non-reconcile path for the same scenario; this PR fixes only that one path.Complements the read-side repair from #14161, which handles parquet files carrying the wrong logical type transparently. This PR closes the write-side gap so the table schema itself can be brought into agreement with the writer schema even when
set.null.for.missing.columns=true.Summary and Changelog
Users gain a per-write opt-in to forward-fix tables whose persisted schema carries a wrong or missing timestamp logical type, by allowing the internal-schema reconcile path to update the column's logical type to match the writer schema. The non-reconcile path (
set.null.for.missing.columns=false) already repaired the logical type implicitly via the Avro reader/writer compatibility check; this PR brings the reconcile path to parity. Default behavior is unchanged.Changes:
hoodie.write.schema.allow.timestamp.precision.evolutiononHoodieCommonConfig(defaultfalse,sinceVersion("1.3.0")).SchemaChangeUtils.isTypeUpdateAllowgains aboolean allowTimestampPrecisionEvolutionparameter. Whentrue, the switch permits:timestamp-millis ↔ timestamp-micros(logical-type drift case, both directions)local-timestamp-millis ↔ local-timestamp-micros(precision swap among the recognized variants)long → local-timestamp-millis/long → local-timestamp-micros(logical-type loss case, attach the missing logical type)AvroSchemaEvolutionUtils.reconcileSchemagains an overload that threads the flag through toTableChanges.ColumnUpdateChange, which stores it on the change and passes it toisTypeUpdateAllow. Pre-existingreconcileSchema/ColumnUpdateChange.getoverloads kept as delegates.HoodieSchemaUtils.scala,BaseHoodieWriteClient,HoodieMergeHelper,FileGroupReaderBasedMergeHandle) read the config from the write properties and pass it through toreconcileSchema.TestAvroSchemaEvolutionUtils.testReconcileSchemaTimestampPrecisionEvolutionasserts default-strict reject and opt-in permit for all three permitted shapes.testCOWLogicalRepair/testMORLogicalRepairparameterize on bothsetNullForMissingColumnsandallowTimestampPrecisionEvolution; positive variants exercise the gated repair path on the v6/v8/CURRENT logical-repair fixtures from fix(ingest): Repair affected logical timestamp milli tables #14161; a negative variant assertsSchemaCompatibilityExceptionwhen the reconcile path is on with the gate closed.Impact
AvroSchemaEvolutionUtils.reconcileSchema, one new factory onTableChanges.ColumnUpdateChange.get. Pre-existing overloads kept as delegates, so no public-API breakage.Risk Level
low
The gate defaults off, so existing writers are unaffected. Test coverage adds positive variants on the existing logical-repair fixtures and a negative variant locking in the default-strict behavior.
Documentation Update
New config documented inline on
HoodieCommonConfig.ALLOW_TIMESTAMP_PRECISION_EVOLUTIONwithsinceVersion("1.3.0"). No website update needed.Contributor's checklist