feat: Support Spark expression days #3746

Open

0lai0 wants to merge 3 commits into apache:main from 0lai0:support_spark_days

Conversation

0lai0 (Contributor) commented Mar 20, 2026

Which issue does this PR close?

Closes #3124

Rationale for this change

Comet previously did not support Spark's `Days` expression, which Spark uses internally for daily partition transforms in Iceberg/Delta tables. Because Comet did not recognize the expression, queries involving `Days` fell back to JVM execution.

What changes are included in this PR?

This change adds a Serde handler for Days that reuses the existing Cast protobuf and Rust implementation, requiring no new native code paths.

  • For DateType input: dates are internally stored as days since epoch, so a simple Cast(Date → Int) suffices (same approach as CometUnixDate).
  • For TimestampType input: uses a timezone-aware Cast(Timestamp → Date) via CometCast.castToProto (which respects the session timezone for correct date boundary determination), followed by Cast(Date → Int).

The Date → Int cast is constructed directly as protobuf (bypassing CometCast.castToProto) because Spark's isAlwaysCastToNull intercepts Date → Int in LEGACY mode and would incorrectly return null.
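The two cast paths described above can be sketched with plain `java.time` (a minimal illustration of the intended semantics, not Comet code; the object and method names here are hypothetical):

```scala
import java.time.{Instant, LocalDate, ZoneId}

object DaysSemantics {
  // DateType path: a date is already stored as days since the Unix epoch,
  // so Cast(Date -> Int) is simply the epoch-day count.
  def daysFromDate(d: LocalDate): Long = d.toEpochDay

  // TimestampType path: the session time zone decides which calendar day
  // the instant falls on *before* the epoch-day count is taken,
  // mirroring the timezone-aware Cast(Timestamp -> Date) step.
  def daysFromTimestamp(ts: Instant, sessionTz: ZoneId): Long =
    ts.atZone(sessionTz).toLocalDate.toEpochDay

  def main(args: Array[String]): Unit = {
    println(daysFromDate(LocalDate.of(1970, 1, 2))) // 1

    // Same instant, different session time zones, different day buckets:
    val t = Instant.parse("1970-01-01T23:30:00Z")
    println(daysFromTimestamp(t, ZoneId.of("UTC")))        // 0
    println(daysFromTimestamp(t, ZoneId.of("Asia/Tokyo"))) // 1
  }
}
```

The last two lines show why the timestamp path must respect the session timezone: the same instant can land in different day buckets depending on the zone used to resolve the date boundary.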

  • datetime.scala: Added the CometDays handler
  • QueryPlanSerde.scala: Registered the handler in the temporalExpressions map
  • CometTemporalExpressionSuite.scala: Added a days test

How are these changes tested?

./mvnw test -pl spark -Dsuites="org.apache.comet.CometTemporalExpressionSuite" -Dtest="none" -DfailIfNoTests=false
./mvnw scalastyle:check -pl spark

| DateDiff | `datediff` | Yes | |
| DateFormat | `date_format` | Yes | Partial support. Only specific format patterns are supported. |
| DateSub | `date_sub` | Yes | |
| Days | `days` | Yes | V2 partition transform. Supports DateType and TimestampType inputs. |
Member commented:

Days should appear after DatePart

Comment on lines +400 to +403
import org.apache.spark.sql.Column
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.expressions.{Days, Literal}
import org.apache.spark.sql.functions.col
Member commented:

nit: it is unusual to have imports within the test

checkSparkAnswerAndOperator("SELECT unix_date(NULL)")
}

test("days") {
Member commented:

nit: there are many different tests within this one test function. I wonder if it would make sense to split them out into separate tests.

andygrove (Member) commented:

Thanks @0lai0. I took a first pass through and this is looking good so far. Could you take a look at the Spark 4 compilation issues?

0lai0 (Contributor, Author) commented Mar 22, 2026

Thanks @andygrove for the review! All three comments have been addressed, and the Spark 4.0 compilation failures are fixed.
The root cause was two removed APIs in Spark 4:

  • Column(Expression) constructor → replaced with getColumnFromExpression() shim
  • Column.expr property → replaced with UnresolvedAttribute to construct column references directly in test code

Both Spark 3.5 (tests pass) and Spark 4.0 (compilation verified) were checked locally:
./mvnw test -pl spark \
  -Dsuites="org.apache.comet.CometTemporalExpressionSuite" \
  -Dtest="none" -DfailIfNoTests=false
./mvnw test-compile -Pspark-4.0

Successfully merging this pull request may close these issues.

[Feature] Support Spark expression: days