feat(risingwave): support tumble and hop window agg#11970
feat(risingwave): support tumble and hop window agg#11970litaxc wants to merge 4 commits intoibis-project:mainfrom
Conversation
|
Can you add tests for this similar to Alternatively, it could also be an option to centralize the window tests in |
|
I added tests for the |
deepyaman
left a comment
There was a problem hiding this comment.
Looks good overall, very much aligned to the Flink implementation (which is great). @cpcloud / @gforsyth / @NickCrews could one of you kick off the CI workflow?
| t = alltypes | ||
| expr = ( | ||
| t.window_by(t.timestamp_col) | ||
| .hop(size=ibis.interval(days=10), slide=ibis.interval(days=10)) |
There was a problem hiding this comment.
Maybe should use a different slide so that it's not essentially a hopping window? :) I'm sure it works, but even reviewing at a glance I just noticed the results are the same.
There was a problem hiding this comment.
I tried using size=20 days, but I couldn't quite understand the resulting window_start/end produced by RisingWave. So I changed back to use the same size and slide to ensure I would't introduce wrong results.
|
Thanks for your work here @litaxc! Looks like a good implementation. I would prefer these tests to be backend-agnostic so they run on all backends and we can ensure that all backends produce the same result. I see we already have tests in the flink and pyspark backend tests (grep for Can you please:
I would also like to have better visibility that the result is actually correct. Just testing the shape feels both inadequate and hard to grok. Either I would like to
I think either of these should be enough to give us enough visibility and confidence around #11970 (comment) |
|
@NickCrews I agree, but I'm happy to take this myself in a follow-up (unless you particularly want to @litaxc):
I was planning to take a look at the result shape, when get a chance, since that's a bit more concerning; sorry for the delay on that. |
|
I'm fine with the cleanup being in a followup as long as it's not me that has to do it :) that would be awesome if you would be willing to take it on @deepyaman!! If we get some more visibility/confidence around the pending inline comment so we are confident it is correct then I am happy to merge. Thanks |
|
One problem with merging all the temporal window tests into one file is that the same query produces different results for Flink, PySpark, and RisingWave.
(BTW the test data length is 7300) (EDIT: I verify the result with DuckDB using |
I agree this isn't ideal. If possible, it would be great to eventually smooth over these inconsistencies so that ibis's promise of portability actually is true. But for this first step, I just want the tests to be co-located so that it is obvious that they give different results. Later we can get around to making them consistent in a followup PR. In the meantime we can parameterize the expected_rows so that the tests pass. eg _nrows_expected_by_backend = {
"duckdb": 7300,
# etc
}
def test_tumble_window_by_grouped_agg(con):
expected_shape = (_nrows_expected_by_backend(con.name), 4)
# etcPS, I'm familiar at all with these streaming backends, if you have any insight on WHY they give different results, and ideas on how to normalize them, that would be awesome to jot down notes here for someone in the future. |
Description of changes
Implement the visit_WindowAggregate function as it is done in Flink.
Issues closed
Resolves #11969