feat: Add PR review skill for Comet expression reviews #3711

Merged
mbutrovich merged 4 commits into apache:main from andygrove:add-pr-review-skill
Mar 17, 2026

Conversation

@andygrove (Member) commented Mar 16, 2026

Which issue does this PR close?

N/A

Rationale for this change

We have a large backlog of PRs. Some are from months ago. I would like to make it easier for maintainers to review these PRs.

What changes are included in this PR?

Add a Claude Code skill that provides structured guidance for reviewing Comet pull requests. Note that it does not automate reviews. It provides guidance for a human reviewer.

The skill checks Spark compatibility by reading Spark source, verifies performance benchmarks, validates tests and documentation, and reviews implementation against project guidelines.

How are these changes tested?

I have been using a version of this skill for a while now and I have found that it saves me a lot of time.

Here is an example:

/review-comet-pr 3710
  Here's my review:

  ---
  PR Summary

  This PR fixes a bug where casting a string to TimestampNTZ incorrectly produced an Arrow array with UTC timezone metadata. The fix splits the Timestamp(_, _) catch-all pattern in the Rust cast code
  into separate Timestamp(_, Some(_)) and Timestamp(_, None) arms, and extends the cast_utf8_to_timestamp! macro to conditionally apply timezone metadata. The Scala side adds TimestampNTZType support
  levels in CometCast.
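The match-arm split described above can be sketched with a toy enum mirroring Arrow's `DataType::Timestamp(TimeUnit, Option<tz>)`. This is illustrative only, not Comet's actual cast code; the names and strings are made up for the sketch:

```rust
// Toy mirror of Arrow's DataType::Timestamp(TimeUnit, Option<tz>),
// simplified to illustrate the match-arm split from the PR summary.
#[derive(Debug, PartialEq)]
enum TimeUnit {
    Microsecond,
}

#[derive(Debug, PartialEq)]
enum DataType {
    Timestamp(TimeUnit, Option<String>),
}

// Before the fix, a single `Timestamp(_, _)` arm handled both cases; the PR
// splits it so the NTZ arm never attaches timezone metadata to the output.
fn describe_cast(to: &DataType) -> &'static str {
    match to {
        DataType::Timestamp(_, Some(_)) => "parse, then tag array with timezone metadata",
        DataType::Timestamp(_, None) => "parse as naive; leave timezone metadata off",
    }
}

fn main() {
    let ntz = DataType::Timestamp(TimeUnit::Microsecond, None);
    println!("{}", describe_cast(&ntz));
}
```

The point of the split is that the `None` arm can skip both the timezone lookup and the metadata tagging entirely, which is what keeps UTC metadata off the TimestampNTZ output array.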

  CI Status

  All CI checks pass across all Spark versions (3.4, 3.5, 4.0) and all test categories (expressions, exec, fuzz, sql, etc.).

  Findings

  Spark Compatibility

  The approach is semantically correct. Spark's stringToTimestampWithoutTimeZone parses the string into local date/time components and converts to epoch micros using
  LocalDateTime.toEpochSecond(ZoneOffset.UTC). The PR achieves the same thing by passing &Utc to timestamp_parser, which constructs the datetime in UTC and computes epoch micros. Since UTC has zero
  offset, the results are identical.
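A quick sanity check of that equivalence (a std-only sketch, not Comet's code; chrono does the equivalent arithmetic internally): interpreting local date/time fields at offset zero is just civil-date arithmetic, so epoch micros come out the same as Spark's `LocalDateTime.toEpochSecond(ZoneOffset.UTC)` path.

```rust
// Days since 1970-01-01 for a civil date (days-from-civil algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146097 + doe - 719468
}

// Epoch microseconds of local date/time fields interpreted at offset zero.
// This is what "parse with Utc as the timezone" amounts to for NTZ values.
fn naive_to_epoch_micros(y: i64, mo: i64, d: i64, h: i64, mi: i64, s: i64) -> i64 {
    (days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + s) * 1_000_000
}

fn main() {
    // "2020-01-01 12:34:56" as TimestampNTZ
    assert_eq!(naive_to_epoch_micros(2020, 1, 1, 12, 34, 56), 1_577_882_096_000_000);
}
```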

  One nuance worth noting: Spark's stringToTimestampWithoutTimeZone can parse strings with explicit timezone offsets (like "2020-01-01T12:34:56+05:00") and simply ignores the timezone component. Comet's
   timestamp_parser regex patterns don't match timezone-offset strings at all, so those strings return null instead. This is a pre-existing limitation (not introduced by this PR) and is covered by the
  Incompatible("Not all valid formats are supported") support level.

  Rust Implementation

  The macro change is clean. Adding the $with_tz parameter and conditionally calling .with_timezone() is the right approach. Moving the timezone::Tz::from_str(timezone_str).unwrap() inside the Some(_)
  arm avoids unnecessary parsing for the NTZ path.

  One thing to double check: the &Utc timezone passed for NTZ parsing. chrono::Utc implements TimeZone and has zero offset, so tz.with_ymd_and_hms(...) will create a UTC datetime. The epoch microseconds
   from .timestamp_micros() will then represent the naive datetime values. This is correct.

  Scala Side

  The CometCast.scala changes correctly add TimestampNTZType with appropriate support levels. The ANSI mode case is properly handled separately.

  One thing I noticed: for TimestampType, there's a non-UTC timezone guard (case DataTypes.TimestampType if timeZoneId.exists(tz => tz != "UTC") -> Incompatible). The NTZ path doesn't have this guard,
  which is correct since NTZ semantics are timezone-independent.

  Tests

  The Rust unit tests are solid. They verify:
  - The output data type has no timezone metadata (Timestamp(Microsecond, None))
  - Actual parsed values are correct
  - Invalid strings produce null
  - Both direct macro invocation and the cast_array path are tested

  However, there are no end-to-end SQL file tests. I see you already asked for this in your comment.

  Related Code

  There's an interesting comment in datetime.scala:296-298:
  // Note: TimestampNTZType is not supported because Comet incorrectly applies
  // timezone conversion to TimestampNTZ values.
  This PR fixes that underlying bug. CometUnixTimestamp could potentially be updated in a follow-up to support NTZ inputs now that the metadata issue is resolved.

  Suggested Review Comments

  1. native/spark-expr/src/conversion_funcs/string.rs (NTZ arm, around line 686):
  "Small nit: the comment says 'reuse timestamp_parser with Utc (identity timezone)' but it might be worth expanding this slightly to explain why Utc gives the right NTZ semantics. Something like: 'Use
  Utc as the timezone for parsing so the resulting epoch microseconds represent the naive local datetime values (no offset applied).' This makes it clearer for someone unfamiliar with the NTZ vs TZ
  distinction."
  2. spark/src/main/scala/org/apache/comet/expressions/CometCast.scala (around line 225):
  "Should these two new cases share the same Incompatible reason text as TimestampType? The reason \"Not all valid formats are supported\" is accurate, but it might also be worth mentioning the specific
   NTZ limitation, like \"Not all valid formats are supported. Timezone offsets in input strings are not handled.\" That said, keeping it consistent with TimestampType is also reasonable."
  3. End-to-end test (already requested): Your existing comment about adding a SQL file test and a test with explicit timezone offset in the input string covers the most important gap. The Rust unit
  tests are good but the end-to-end path through serde is what catches integration issues.
  4. Follow-up opportunity: The CometUnixTimestamp serde in datetime.scala:296 explicitly blocks TimestampNTZType with a comment referencing this exact bug. It might be worth filing a follow-up issue
  (or noting it in the PR description) so that CometUnixTimestamp can be updated to accept NTZ inputs now.

@wForget (Member) left a comment

Thanks @andygrove

@mbutrovich mbutrovich merged commit 1fdc450 into apache:main Mar 17, 2026
61 checks passed
@andygrove andygrove deleted the add-pr-review-skill branch March 18, 2026 12:19