FIX-#7405: Internally sort indices for loc/iloc set #7440
sfc-gh-joshi merged 6 commits into modin-project:main from
…oc sets Signed-off-by: Jonathan Shi <jhshi07@gmail.com>
fcbcf7f to ef752c2
@anmyachev would you be able to take a look, since you did most of the implementation for
Hi @noloerino, you write that the problem is that sorting occurs internally, yet as I understand it you are adding another internal sort. Why does this solve the problem? Could it be that the external and internal indices simply do not match (you would need to inspect the internal partitions manually to check)? As far as I remember, this was a common problem caused by optimizations.
In testing, the problem I identified is that the index is sorted before determining which indices belong to which partitions. For example, if you were to do
Removing the index sort would be difficult because it seems like the partition-finding logic in
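To make the failure mode in this comment concrete, here is a toy sketch in plain Python. It is not Modin's code: the function names and the list-based "column" are hypothetical, and it only assumes that positions get sorted while the items keep their original order.

```python
def write_with_presorted_positions(column, positions, items):
    """Hypothetical buggy write: positions are sorted, items are not."""
    result = list(column)
    # Sorting only the positions breaks the original (position, item) pairing.
    for pos, item in zip(sorted(positions), items):
        result[pos] = item
    return result


def write_in_given_order(column, positions, items):
    """Reference behaviour: each item goes to the position it was paired with."""
    result = list(column)
    for pos, item in zip(positions, items):
        result[pos] = item
    return result


column = [0, 0, 0]
positions = [2, 0]  # deliberately unsorted
items = [10, 20]    # 10 is meant for position 2, 20 for position 0

mismatched = write_with_presorted_positions(column, positions, items)
expected = write_in_given_order(column, positions, items)
```

With unsorted positions the two results differ, which is the kind of mismatch this PR addresses.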
I see.
It's true. It seems to me that it would be more performant to return the previous order in
What do these changes do?
Refactors internal logic for the PandasQueryCompiler write_items function to sort the items array and row/column indexers before passing them to partitions. This fixes the linked bug, which occurs when the internal DataFrame representation sorts indices before performing the write operation.
I considered removing the sort in PandasDataFrame, but this created difficulties in determining which data went to which partitions. Furthermore, it was difficult to cleanly ensure that both the items and index respected the sort order at that layer.
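The idea of sorting the items together with the row/column indexers can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual write_items implementation: the helper name is hypothetical, and plain lists stand in for the positional indexers and the 2D items array.

```python
def sort_indexers_with_items(row_positions, col_positions, items):
    """Sort both indexers and permute the 2D items to match.

    Hypothetical helper: items[i][j] is the value destined for
    (row_positions[i], col_positions[j]).
    """
    # Argsort each indexer: the order in which to visit the original entries.
    row_order = sorted(range(len(row_positions)), key=row_positions.__getitem__)
    col_order = sorted(range(len(col_positions)), key=col_positions.__getitem__)

    sorted_rows = [row_positions[i] for i in row_order]
    sorted_cols = [col_positions[j] for j in col_order]
    # Apply the same permutations to items so each value keeps its target cell.
    sorted_items = [[items[i][j] for j in col_order] for i in row_order]
    return sorted_rows, sorted_cols, sorted_items


rows, cols, values = sort_indexers_with_items(
    [2, 0],                # unsorted row positions
    [1, 0],                # unsorted column positions
    [[10, 20], [30, 40]],  # 10 -> (2, 1), 20 -> (2, 0), 30 -> (0, 1), 40 -> (0, 0)
)
```

After this step the indexers are ascending and every value is still paired with its original target cell, so a later stage that expects sorted indices can consume them unchanged.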
This change will cause overhead in some indexing cases, though this should be minimal under the assumption that len(items) << len(df). The code also includes logic to avoid sorting the items array if the index is an item like a range/RangeIndex/slice with step > 0 that is known to already be sorted.
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst is up-to-date
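The shortcut for indexers that are already known to be sorted, mentioned in the description, might look roughly like the check below. The function name is hypothetical and this is not the code from the PR. It covers only the built-in range and slice types, and a pandas RangeIndex would need an analogous step check.

```python
def is_known_sorted(indexer):
    """Return True only when the indexer is ascending by construction.

    Hypothetical helper: anything else returns False, meaning the caller
    should fall back to sorting the indexer and the items.
    """
    if isinstance(indexer, range):
        # A range step is never zero, so a positive step means ascending.
        return indexer.step > 0
    if isinstance(indexer, slice):
        # A missing slice step defaults to 1.
        return indexer.step is None or indexer.step > 0
    return False
```

Returning False for arbitrary lists is the conservative choice here: it costs an extra sort but never skips one that is needed.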