As of the latest commit, we can compress an index ( New: Old: The more interesting algorithms aren't implemented yet, but stay tuned. Also, the compression is concurrent and quite fast: Robust compresses in seconds.
A quick update: I implemented precomputed scores (only as
```
Header := Version, Type, Encoding
Version := Major, Minor, Path
Type := ValueId, Count
```
I am a bit confused by Type. What are ValueId and Count?
ValueId would be the type, and Count would be how many of those there are: one, a pair, a tuple, etc.
Actually, so far I implemented it like this: https://github.com/pisa-engine/pisa/pull/280/files#diff-2a007c99bc1af07f1fb150c293383559R71
Another approach would be to always store scalars in one file and join multiple files for tuples. But then we couldn't store arrays of undetermined length (say, a positional index). All of this is up for discussion.
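To make the `Type := ValueId, Count` idea concrete, here is a minimal sketch, not the code from the linked diff. The `ValueId` members, the two-byte layout, and the `encode_type`/`decode_type` helpers are all hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical value identifiers; the real set of types is not given in this thread.
enum class ValueId : std::uint8_t { U32 = 0, F32 = 1 };

// Type := ValueId, Count
struct Type {
    ValueId value_id;    // which scalar type each component has
    std::uint8_t count;  // 1 = scalar, 2 = pair, n = n-tuple
};

// Assumed layout: one byte for the ValueId followed by one byte for the Count.
inline std::array<std::uint8_t, 2> encode_type(Type type) {
    assert(type.count >= 1);  // a value has at least one component
    return {static_cast<std::uint8_t>(type.value_id), type.count};
}

// Inverse of encode_type under the same assumed layout.
inline Type decode_type(std::array<std::uint8_t, 2> bytes) {
    return Type{static_cast<ValueId>(bytes[0]), bytes[1]};
}
```

Under this sketch, a pair of floats would be described as `Type{ValueId::F32, 2}`. A fixed `count` cannot describe arrays of undetermined length, which is the limitation raised above for positional indexes.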
> The latter should be further discussed.

```
Posting File := Header, [Posting Block]
```
Is there one Header for each Posting Block?
No, one header, followed by a list of blocks.
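As a rough illustration of "one header, followed by a list of blocks", the layout could be sketched as below. This is not the format from the draft: the 8-byte opaque `Header`, the 4-byte little-endian length prefix per block, and the function names are assumptions made only so the example is complete.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Header = std::array<std::uint8_t, 8>;  // hypothetical fixed-size, opaque header

// Posting File := Header, [Posting Block]
struct PostingFile {
    Header header;
    std::vector<Bytes> blocks;
};

// Writes the header once, then every block as (4-byte little-endian length, payload).
inline Bytes write_posting_file(PostingFile const& file) {
    Bytes out(file.header.begin(), file.header.end());
    for (auto const& block : file.blocks) {
        auto const size = static_cast<std::uint32_t>(block.size());
        for (int shift = 0; shift < 32; shift += 8) {
            out.push_back(static_cast<std::uint8_t>((size >> shift) & 0xFF));
        }
        out.insert(out.end(), block.begin(), block.end());
    }
    return out;
}

// Reads the single header, then consumes length-prefixed blocks until the end of input.
inline PostingFile read_posting_file(Bytes const& bytes) {
    PostingFile file{};
    assert(bytes.size() >= file.header.size());
    auto const header_end = bytes.begin() + static_cast<std::ptrdiff_t>(file.header.size());
    std::copy(bytes.begin(), header_end, file.header.begin());
    std::size_t pos = file.header.size();
    while (pos < bytes.size()) {
        assert(pos + 4 <= bytes.size());
        std::uint32_t size = 0;
        for (std::size_t i = 0; i < 4; ++i) {
            size |= static_cast<std::uint32_t>(bytes[pos + i]) << (8 * i);
        }
        pos += 4;
        assert(pos + size <= bytes.size());
        auto const first = bytes.begin() + static_cast<std::ptrdiff_t>(pos);
        file.blocks.emplace_back(first, first + static_cast<std::ptrdiff_t>(size));
        pos += size;
    }
    return file;
}
```

The length prefix is just one way to delimit blocks. A separate offsets file would work equally well and would allow random access to a block without scanning.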
Quick tests on Clueweb09B show essentially the same results for BM25 ranked OR as before, while the average drops from
I've started drafting an index format specification, along with some code examples.
Let's discuss!
include/pisa/v1test/test_v1.cpp