Conversation
| | DateDiff | `datediff` | Yes | | |
| | DateFormat | `date_format` | Yes | Partial support. Only specific format patterns are supported. | |
| | DateSub | `date_sub` | Yes | | |
| | Days | `days` | Yes | V2 partition transform. Supports DateType and TimestampType inputs. | |

Days should appear after DatePart.
import org.apache.spark.sql.Column
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.expressions.{Days, Literal}
import org.apache.spark.sql.functions.col

nit: it is unusual to have imports within the test
checkSparkAnswerAndOperator("SELECT unix_date(NULL)")
}

test("days") {

nit: there are many different tests within this one test function. I wonder if it would make sense to split them out into separate tests.
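One possible split is along input types, reusing the suite's existing helper. This is only an illustrative sketch: the test names, SQL literals, and the `new Column(Days(...))` construction are assumptions, not taken from the PR.

```scala
// Hypothetical split of the single "days" test into per-input-type tests.
// Assumes the suite exposes `spark` and `checkSparkAnswerAndOperator`.
test("days with DateType input") {
  val df = spark.sql("SELECT CAST('2024-01-01' AS DATE) AS d")
  checkSparkAnswerAndOperator(df.select(new Column(Days(col("d").expr))))
}

test("days with TimestampType input") {
  val df = spark.sql("SELECT CAST('2024-01-01 12:00:00' AS TIMESTAMP) AS t")
  checkSparkAnswerAndOperator(df.select(new Column(Days(col("t").expr))))
}
```

Smaller tests would also make it easier to see at a glance which input type regressed when a run fails.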

Thanks @0lai0. I took a first pass through and this is looking good so far. Could you take a look at the Spark 4 compilation issues?

Thanks @andygrove for the review! All three comments have been addressed, and the Spark 4.0 compilation failures are fixed. Both Spark 3.5 (tests pass) and Spark 4.0 (compilation verified) work locally.
Which issue does this PR close?
Closes #3124
Rationale for this change
Comet previously did not support the Spark Days expression. It is used internally by Spark for daily partitioning in Iceberg/Delta tables. Since Comet did not recognize this expression, queries involving Days would fall back to JVM execution.
What changes are included in this PR?
This change adds a Serde handler for Days that reuses the existing Cast protobuf and Rust implementation, requiring no new native code paths.
- `DateType` input: dates are internally stored as days since epoch, so a simple Cast(Date → Int) suffices (same approach as `CometUnixDate`).
- `TimestampType` input: uses a timezone-aware Cast(Timestamp → Date) via `CometCast.castToProto` (which respects the session timezone for correct date-boundary determination), followed by Cast(Date → Int).

The Date → Int cast is constructed directly as protobuf (bypassing `CometCast.castToProto`) because Spark's `isAlwaysCastToNull` intercepts Date → Int in LEGACY mode and would incorrectly return null.

- `datetime.scala`: Added the CometDays handler
- `QueryPlanSerde.scala`: Registered the handler in the temporalExpressions map
- `CometTemporalExpressionSuite.scala`: Added the days test

How are these changes tested?
./mvnw test -pl spark -Dsuites="org.apache.comet.CometTemporalExpressionSuite" -Dtest="none" -DfailIfNoTests=false
./mvnw scalastyle:check -pl spark
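As a side note on the `TimestampType` path: why the Timestamp → Date cast must be timezone-aware can be illustrated with plain `java.time`, outside of Spark. This is a minimal sketch with hypothetical names, not Comet code.

```scala
import java.time.{Instant, LocalDate, ZoneId}

// Sketch of the two Days code paths described above (names are made up).
object DaysSketch {
  // DateType path: Spark already stores a date as days since 1970-01-01,
  // so Days(date) is just a reinterpretation, i.e. Cast(Date -> Int).
  def daysFromDate(date: LocalDate): Long = date.toEpochDay

  // TimestampType path: resolve the instant to a calendar date in the
  // session timezone first (Cast(Timestamp -> Date)), then count days
  // since epoch (Cast(Date -> Int)).
  def daysFromTimestamp(ts: Instant, sessionTz: ZoneId): Long =
    ts.atZone(sessionTz).toLocalDate.toEpochDay

  def main(args: Array[String]): Unit = {
    // 2024-01-02T01:00Z is still 2024-01-01 in Los Angeles, so the session
    // timezone decides which daily partition the row falls into.
    val ts = Instant.parse("2024-01-02T01:00:00Z")
    val utcDays = daysFromTimestamp(ts, ZoneId.of("UTC"))
    val laDays = daysFromTimestamp(ts, ZoneId.of("America/Los_Angeles"))
    println(s"UTC day: $utcDays, LA day: $laDays") // differ by one day
  }
}
```

The one-day difference at the boundary is exactly why the serde routes the Timestamp → Date step through the timezone-respecting `CometCast.castToProto`.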