Docs updates - multithreading/process recommendations + strikethrough of disk serialization by codeananda · Pull Request #307 · Toblerity/rtree

codeananda · 2024-02-13T10:06:13Z

Given the multithreading/processing issues with rtree and libspatialindex, I thought it made sense to add this info to the README (and also the online docs).

Related discussions that inspired my updates:

hobu · 2024-02-13T14:28:14Z

I don't understand why we're adding docs to say "Disk serialization is broken" with a pointer to a PR that updates something that shouldn't break it instead of just fixing the thing.

codeananda · 2024-02-14T09:24:56Z

Is your issue that I've linked to the wrong PR/issue? Or more that we're updating the docs instead of fixing the problem? I can help with the former but not sure I know how to fix the latter.

hobu · 2024-02-14T14:45:04Z

> Or more that we're updating the docs instead of fixing the problem?

Correct. Is there a ticket that clearly demonstrates the issue? I don't think we can categorically say disk serialization doesn't work. Maybe it doesn't work for you and your situation, but we would be flooded with tickets if it was broken more generally.

adamjstewart · 2024-02-14T15:48:19Z

I know pickling doesn't work (which means multiprocessing doesn't work on macOS/Windows): #87
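
For context, the `spawn` start method (the default on macOS and Windows) pickles whatever is sent to a worker process, so an index that does not pickle cleanly cannot be handed over directly. One possible way around this, shown here only as a sketch and not necessarily the recommendation made in this PR, is to send the plain bounding boxes and rebuild the index on the receiving side. The helper `build_index` and the sample boxes are hypothetical, and the pickle round-trip merely stands in for what a spawned worker would receive.

```python
import pickle

from rtree import index


def build_index(boxes):
    """Build a fresh in-memory index from (id, bounding box) pairs.

    Hypothetical helper for illustration; it is not part of rtree.
    """
    idx = index.Index()
    for item_id, bbox in boxes:
        idx.insert(item_id, bbox)
    return idx


# Plain tuples pickle without trouble, unlike the index object itself.
boxes = [
    (1, (0.0, 0.0, 1.0, 1.0)),
    (2, (3.0, 3.0, 4.0, 4.0)),
]

# Stand-in for the copy a spawned worker would receive.
received = pickle.loads(pickle.dumps(boxes))

# Each worker rebuilds its own private index from the raw boxes.
worker_idx = build_index(received)
hits = list(worker_idx.intersection((2.5, 2.5, 3.5, 3.5)))
print(hits)
```

The trade-off is that every worker pays the cost of rebuilding the index, which may or may not be acceptable depending on its size.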

hobu · 2024-02-14T18:46:39Z

> I know pickling doesn't work

pickling isn't the same as "supports disk serialization". It's always been possible to use Rtree to store whatever data you wanted to with an index entry. That it's not conveniently pickled python objects is orthogonal.

codeananda · 2024-02-15T09:51:37Z

> pickling isn't the same as "supports disk serialization"

I didn't know this.

The docs give the impression that pickling is synonymous with disk serialisation. Indeed, so does the code.

- The "Serializing your index to a file" section doesn't actually tell you how to serialize to a file, only how to read an already created file from disk.
- The only section in performance relating to disk serialization recommends using cpickle.
- The Index class has dumps and loads methods, both one-liners that use pickle.loads/dumps.
- Index._serialize calls self.dumps on the first line.

I genuinely thought the only way to serialize rtree objects to disk was pickle.

Or am I not understanding the meaning of "disk serialization"?

At the very least, we need to add a snippet under Serializing your index to a file that shows how you write to disk.
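
A minimal sketch of what such a snippet might look like, assuming the usual `index.Index(filename)` constructor, which creates a disk-backed index stored in `filename.dat` and `filename.idx`. The temporary directory, the base name `example_index`, and the sample boxes are made up for illustration.

```python
import os
import tempfile

from rtree import index

# Hypothetical location for the index files; rtree appends .dat and .idx.
workdir = tempfile.mkdtemp()
basename = os.path.join(workdir, "example_index")

# Passing a filename (rather than no arguments) makes the index disk-backed.
idx = index.Index(basename)
idx.insert(1, (0.0, 0.0, 1.0, 1.0))
idx.insert(2, (5.0, 5.0, 6.0, 6.0))
idx.close()  # flush to disk; the index object is unusable afterwards

# Reopen the same files later (for example in another run) and query them.
reopened = index.Index(basename)
hits = sorted(reopened.intersection((0.5, 0.5, 2.0, 2.0)))
reopened.close()
print(hits)
```

No pickling is involved here: only integer ids and bounding boxes are written to the files.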

codeananda · 2024-03-05T10:32:55Z

@hobu @mwtoews any update?

FreddieWitherden · 2024-11-22T13:29:32Z

> At the very least, we need to add a snippet under Serializing your index to a file that shows how you write to disk.

I believe you can pass None for the data and still have an R-tree stored on disk. This is probably what most people think of in terms of disk serialization. Of course, here, you are on your own in terms of associating Python objects with each entry (just going by their integer IDs). This, I believe, is a better option and likely more efficient (and closer to what databases do: indices store pointers to the data rather than the data itself). It keeps your index small, lets you change your data more freely, and allows multiple pointers to the same data block.
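
The approach described above could be sketched as follows: the index stores only integer ids and bounding boxes, while the associated Python data lives in a separate store keyed by the same ids. The JSON side file, the file names, and the sample records are all assumptions made for this illustration; any external store would serve the same purpose.

```python
import json
import os
import tempfile

from rtree import index

workdir = tempfile.mkdtemp()
basename = os.path.join(workdir, "places")  # hypothetical index base name
sidecar = basename + ".json"                # hypothetical side file for the data

# The data lives outside the index, keyed by the same integer ids.
records = {1: {"name": "depot"}, 2: {"name": "harbour"}}
boxes = {1: (0.0, 0.0, 1.0, 1.0), 2: (10.0, 10.0, 11.0, 11.0)}

idx = index.Index(basename)
for record_id, bbox in boxes.items():
    idx.insert(record_id, bbox)  # no obj argument, so nothing is pickled
idx.close()

with open(sidecar, "w") as fh:
    json.dump(records, fh)

# Later: reopen both, query the index for ids, then look up the data.
with open(sidecar) as fh:
    loaded = {int(key): value for key, value in json.load(fh).items()}

idx = index.Index(basename)
names = [loaded[i]["name"] for i in idx.intersection((9.0, 9.0, 12.0, 12.0))]
idx.close()
print(names)
```

Because the index holds only ids, the records can be edited or replaced without touching the index files, as long as the ids stay stable.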

codeananda added 4 commits February 13, 2024 10:02

Added strikethrough to Disk serialization

e0e956e

Added multithread/process explainer + recommendations to get around them

94905b9

Added same multithread/process updates to index.rst

0f98826

Added disk serialization is broken as of Jan 2024 message to index.rst

d8b19a1

codeananda wants to merge 4 commits into Toblerity:main from codeananda:docs_updates

4 participants
