Update limit behavior to use incrementing index (fix last_by_index aggregate) #2964
Conversation
Signed-off-by: Timothy Bess <timbessmail@gmail.com>
…freeing it with "free", which I believe worked in our case but is undefined behavior. Signed-off-by: Timothy Bess <timbessmail@gmail.com>
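For context, a minimal sketch of the mismatch described above (the struct and values here are hypothetical, not the actual perspective code): memory allocated with `new` must be released with `delete`; pairing it with `free()` is undefined behavior even when it appears to work.

```cpp
#include <cstdlib>

// Hypothetical example type, standing in for whatever was allocated.
struct t_example {
    int m_value;
};

int demo() {
    t_example* p = new t_example{42};
    // Wrong (UB): std::free(p); -- free() does not run the destructor and
    // may not match the allocator behind `new`.
    int v = p->m_value;
    delete p; // Correct: `delete` pairs with the `new` above.
    return v;
}
```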
texodus
left a comment
Thanks for the PR! Looks good!
You forgot to mention this PR also fixes CI (much appreciated!), which has been failing due to the GHA CI machines running out of disk space during the Python build.
Test coverage looks great and covers last_by_index aggregate behavior specifically.
Benchmarks for the feature change to limit behavior w.r.t. Table.update() are great too. This change does facially increase the allocation size with the new psp_old_key column (I think; a lot has changed, including the flatten behavior, so I may be wrong here), but it's great to see there is no observable impact within the bounds of our CPU-time benchmarking. It is also a good reminder, IMO, that we should implement heap benchmarks!
```cpp
        break;

    switch (op) { FOREACH_T_OP(X) }
#undef X
```
```cpp
flattened_old_pkey_col->set_scalar(flattened_idx, old);
flattened_old_pkey_col->set_valid(flattened_idx, true);
LOG_DEBUG("DELETING OLD PKEY: " << old);
// erase(old);
```
```diff
- t_uindex m_bidx;
- t_uindex m_eidx;
+ t_uindex m_begin_idx;
+ t_uindex m_edge_idx;
```
```cpp
// | pkey-1 | pkey-1 | pkey-2 | pkey-2 |
// ^ Is this an edge? I think so.
// |-----------------| <- This is a span between those edges.
```
```diff
  if (delete_encountered) {
-     d_pkey_col->push_back(
+     d_pkey_col->set_nth(
+         store_idx % limit,
```
It seems we're applying the limit to the flattened input port table. Is this because the size of this table is later assumed to be a lower bound of the new master table size (or similar)? I think this is fine (especially compared to the current state of the limit feature), but it reminded me that we likely have some other limit-related abnormalities in specific aggregates, e.g. AGGTYPE_HIGH_WATER_MARK, which never decreases and thus would calculate differently if a single input update step were larger than the limit and the overwritten range removed the new high water mark value (I think).
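To illustrate the abnormality described above, here is a toy sketch (assumed structure, not perspective internals): a limit table retains only the most recent `limit` rows via a modular write index, so an aggregate like AGGTYPE_HIGH_WATER_MARK that only ever increases can report a value that the overwrite has already evicted.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy limit table: rows are written at store_idx % limit, so older rows
// are overwritten once the buffer wraps.
struct t_limit_table {
    explicit t_limit_table(std::size_t limit) : m_rows(limit), m_store_idx(0) {}

    void update(double value) {
        m_rows[m_store_idx % m_rows.size()] = value;
        ++m_store_idx;
        // High-water-mark aggregate: never decreases, even when the row
        // holding the maximum is later overwritten.
        m_high_water = std::max(m_high_water, value);
    }

    // The maximum actually present in the retained window.
    double retained_max() const {
        std::size_t n = std::min(m_store_idx, m_rows.size());
        return *std::max_element(m_rows.begin(), m_rows.begin() + n);
    }

    std::vector<double> m_rows;
    std::size_t m_store_idx;
    double m_high_water = 0.0;
};
```

With `limit = 2` and updates `10, 3, 1`, the `10` is overwritten, so `retained_max()` is `3.0` while `m_high_water` still reports `10.0`, which is the divergence the comment is pointing at.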
```cpp
template <class FUNCTION>
void
parallel_for(int num_tasks, FUNCTION&& func) {
#undef PSP_PARALLEL_FOR
```
```rust
dst.define("CMAKE_BUILD_TYPE", profile.as_str());
dst.define("ARROW_BUILD_EXAMPLES", "OFF");
dst.define("RAPIDJSON_BUILD_EXAMPLES", "OFF");
dst.define("ARROW_CXX_FLAGS_DEBUG", "-Wno-error");
```
This is great! IMO we should move this into the CMakeLists.txt though so it applies to the wasm builds also (assuming it doesn't already).
```javascript
if (!check_version_gte(metadata.version, "3.4.3")) {
    // Bug with old versions of perspective: segfault when you delete
    // a table with pending updates.
    await table.size();
```
Nice catch, this is good to fix (fix mentioned in the PR summary)
```cpp
return cmp(cmpval, 0);
} // namespace perspective
}
;
```
Update limit behavior to use incrementing index (fix last_by_index aggregate)
Signed-off-by: Andrew Stein <steinlink@gmail.com>
This PR decouples primary keys from the limit functionality of perspective tables, allowing us to handle aggregates like "last by index" a bit better when users have a limit and want to rely on insert order to window their data.
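A rough sketch of the idea (assumed shapes and names, not the actual implementation): rows in a limit table are placed by a monotonically incrementing insert index rather than by primary key, so "last by index" resolves to the most recently inserted row even after the buffer wraps.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row: primary key is carried along but no longer determines
// placement; the insert index does.
struct t_row {
    int pkey;
    double value;
    std::size_t insert_idx;
};

struct t_limit_store {
    explicit t_limit_store(std::size_t limit) : m_rows(limit), m_next_idx(0) {}

    void insert(int pkey, double value) {
        // Placement by incrementing index, modulo the limit.
        m_rows[m_next_idx % m_rows.size()] = {pkey, value, m_next_idx};
        ++m_next_idx;
    }

    // "last by index": the value of the row with the greatest insert index,
    // i.e. the most recently inserted surviving row.
    double last_by_index() const {
        std::size_t n =
            m_next_idx < m_rows.size() ? m_next_idx : m_rows.size();
        const t_row* last = &m_rows[0];
        for (std::size_t i = 1; i < n; ++i) {
            if (m_rows[i].insert_idx > last->insert_idx) {
                last = &m_rows[i];
            }
        }
        return last->value;
    }

    std::vector<t_row> m_rows;
    std::size_t m_next_idx;
};
```

With a limit of 3 and four inserts, the fourth insert overwrites slot 0, and `last_by_index()` still returns the fourth value because its insert index is the greatest.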
Along the way I also fixed some memory safety issues where we had called `new` to allocate some memory and freed it with `free` rather than `delete`. This wasn't breaking anything at the moment, but may have been accidentally relying on undefined behavior.

The new benchmarks also triggered a bug in our polling implementation: if a table was dirtied and deleted before the `poll` call, the `poll` call would segfault because we hadn't removed the table from the dirty map in the server, so I fixed that as well.
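A minimal sketch of the polling fix (hypothetical names; the real server tracks considerably more state): deleting a table must also drop it from the server's dirty set, otherwise the next `poll` dereferences a table that no longer exists.

```cpp
#include <string>
#include <unordered_set>

struct t_server {
    void mark_dirty(const std::string& table) { m_dirty.insert(table); }

    void delete_table(const std::string& table) {
        m_tables.erase(table);
        // The fix: forget any pending work for this table so poll()
        // never touches a deleted table.
        m_dirty.erase(table);
    }

    // Stand-in for the poll path: would poll() try to process this table?
    bool poll_would_touch(const std::string& table) const {
        return m_dirty.count(table) > 0;
    }

    std::unordered_set<std::string> m_tables;
    std::unordered_set<std::string> m_dirty;
};
```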