Add LRU to improve the memory usage of the indexer#2050

Open
Lucccyo wants to merge 6 commits into ocaml:main from Lucccyo:lru_cache_index

Conversation

Contributor

Lucccyo commented Mar 24, 2026

The index links are stored in a hash table, which grows without bound as new entries are added, so it uses a lot of memory.
This PR replaces the hash table with an LRU cache to bound memory usage: only the most recently used indices are kept, and the least recently used entry is evicted when the capacity is reached.
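The eviction scheme described above (constant-time lookup plus least-recently-used eviction) is commonly built from a hash table paired with a doubly linked recency list. The sketch below shows that technique in OCaml; all names (`create`, `find`, `add`) are illustrative and not the PR's actual `Cache` interface:

```ocaml
(* Minimal LRU cache sketch: a Hashtbl for O(1) lookup, plus a doubly
   linked list ordering entries from most to least recently used. *)

type ('k, 'v) node = {
  key : 'k;
  value : 'v;
  mutable prev : ('k, 'v) node option;
  mutable next : ('k, 'v) node option;
}

type ('k, 'v) t = {
  capacity : int;
  table : ('k, ('k, 'v) node) Hashtbl.t;
  mutable first : ('k, 'v) node option; (* most recently used *)
  mutable last : ('k, 'v) node option;  (* least recently used *)
}

let create capacity =
  { capacity; table = Hashtbl.create capacity; first = None; last = None }

(* Detach [n] from the recency list, fixing up its neighbours. *)
let unlink t n =
  (match n.prev with Some p -> p.next <- n.next | None -> t.first <- n.next);
  (match n.next with Some s -> s.prev <- n.prev | None -> t.last <- n.prev);
  n.prev <- None;
  n.next <- None

(* Put [n] at the front of the list: it becomes the most recently used. *)
let push_front t n =
  n.prev <- None;
  n.next <- t.first;
  (match t.first with Some f -> f.prev <- Some n | None -> t.last <- Some n);
  t.first <- Some n

(* A hit refreshes the entry's recency. *)
let find t k =
  match Hashtbl.find_opt t.table k with
  | None -> None
  | Some n ->
    unlink t n;
    push_front t n;
    Some n.value

(* Insert [k -> v]; if the cache is full, evict the least recently used. *)
let add t k v =
  (match Hashtbl.find_opt t.table k with
   | Some n -> unlink t n; Hashtbl.remove t.table k
   | None -> ());
  if Hashtbl.length t.table >= t.capacity then
    (match t.last with
     | Some lru -> unlink t lru; Hashtbl.remove t.table lru.key
     | None -> ());
  let n = { key = k; value = v; prev = None; next = None } in
  Hashtbl.add t.table k n;
  push_front t n
```

With this design a lookup also counts as a "use", so entries that keep being read stay resident while cold offsets age out of the `last` end of the list.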

@voodoos
Collaborator

voodoos commented Mar 24, 2026

(I did not review the tricky pointer arithmetic of the dllist yet)

| In_memory v -> write_child lnk schema v size ~placeholders ~restore
| On_disk _ ->
write_child lnk schema (fetch lnk) size ~placeholders ~restore)
| In_memory v | In_memory_c (v, _) ->
Contributor


I'm curious whether your tests ran into the In_memory_c case here? The current code is correct, but if this does happen in practice then In_memory_c would benefit from being translated into a pointed-index On_disk_ptr directly (since an In_memory_c is an in-memory value that was read from an existing file).

Contributor Author


My test doesn't hit the In_memory_c case here.

in
List.iter
(fun (Cached (link, loc, store, schema)) ->
Cache.remove store.cache loc;
Contributor


This remove doesn't look entirely correct, since we are losing sharing: we never reintroduce lnk into the cache later if that offset turns out to still be useful, so we'll allocate new links for the same value. Do your benchmarks show that this remove is necessary? :)

Contributor Author


Yes, I agree.
On the benchmarks, removing the Cache.remove call increases the max memory usage by a factor of about 1.1.

