Support for personalized pagerank #20

Open
blokhinnv wants to merge 2 commits into LadybugDB:main from blokhinnv:personalized-pagerank

@blokhinnv

Summary

Adds a teleportationWeights optional parameter to page_rank in the algo extension.

Personalized PageRank (PPR) is useful in many AI applications, Graph RAG in particular, so supporting it would make it easier to implement such algorithms with LadybugDB as the backend.

What changed

  • New optional parameter teleportationWeights on CALL page_rank(...).

  • Format: a STRUCT keyed by node table names; each entry is {keyProperty, weights}:

    • keyProperty: property used to match nodes to keys in weights;
    • weights: MAP<keyType, DOUBLE>.
  • Weights are normalized; nodes not present in the map get teleportation weight 0.

  • Bind-time validation:

    • non-negative weights, positive sum;
    • table and property exist;
    • map key type matches property type.
  • Tests in page_rank.test (including error cases).
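
The validation and normalization rules listed above can be illustrated with a minimal Python sketch. It is not the extension's code: the function name `normalize_teleportation_weights` is hypothetical, and the sketch covers only the numeric checks (non-negative weights, positive sum), not the table, property, or key-type checks.

```python
def normalize_teleportation_weights(weights):
    """Validate raw weights and scale them so they sum to 1.

    `weights` maps a node key (e.g. an fName value) to a raw weight.
    Hypothetical helper for illustration only.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("teleportation weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("teleportation weights must have a positive sum")
    # Divide by the total so the result is a probability distribution.
    return {key: w / total for key, w in weights.items()}


normalized = normalize_teleportation_weights({"Alice": 0.125, "Bob": 0.125})
print(normalized)  # {'Alice': 0.5, 'Bob': 0.5}
```

Because of the normalization step, only the ratios between weights matter: passing 0.125 and 0.125 is equivalent to passing 1 and 1.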

Example

CALL page_rank('PK',
    teleportationWeights := {
        person: {keyProperty: 'fName', weights: map(['Alice', 'Bob'], [0.125, 0.125])}
    }
)
RETURN node.fName, rank
ORDER BY node.fName

Compatibility

Without teleportationWeights, behavior is unchanged (uniform teleportation).
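
To make the difference between uniform and personalized teleportation concrete, here is a small pure-Python power-iteration sketch. It is an assumption-laden illustration rather than the algo extension's implementation: the function name, the damping factor of 0.85, the fixed iteration count, and the treatment of dangling nodes (their rank is redistributed according to the teleportation vector) are all choices made for this example.

```python
def personalized_pagerank(nodes, edges, teleport=None, damping=0.85, iterations=100):
    """Power iteration for PageRank with an optional teleportation vector.

    `teleport` maps node -> probability and is assumed to be normalized;
    nodes missing from it get weight 0. When omitted, teleportation is uniform.
    """
    if teleport is None:
        teleport = {n: 1.0 / len(nodes) for n in nodes}
    t = {n: teleport.get(n, 0.0) for n in nodes}

    # Build out-neighbor lists from the directed edge list.
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    rank = dict(t)
    for _ in range(iterations):
        # Teleportation term: with probability (1 - damping), jump according to t.
        nxt = {n: (1.0 - damping) * t[n] for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for m in out[n]:
                    nxt[m] += share
            else:
                # Dangling node (assumption): spread its rank according to t.
                for m in nodes:
                    nxt[m] += damping * rank[n] * t[m]
        rank = nxt
    return rank


nodes = ["Alice", "Bob", "Carol"]
edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Alice")]

uniform = personalized_pagerank(nodes, edges)
personalized = personalized_pagerank(nodes, edges, teleport={"Alice": 1.0})
print(uniform)
print(personalized)
```

On this three-node cycle the uniform run gives every node the same rank, while teleporting only to Alice shifts rank toward Alice and the nodes closest downstream of her.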

@adsharma

Contributor

@blokhinnv Thank you for the PR. Not opposed to it.

The direction we're taking is to have a streamlined path from ladybug's CSR storage into CSR-based Arrow memory when the storage layout is carefully chosen (docs on how to hit the fast path are coming). Once you can do that 30-40x faster than kuzu or previous ladybug versions, you don't need to limit yourself to the small number of algorithms in the ALGO extension.

What do you think about making the same change to:

https://github.com/Ladybug-Memory/icebug/blob/main/networkit/cpp/centrality/PageRank.cpp

@blokhinnv

Author

Actually, I first checked whether Networkit has an implementation of PPR and was kind of surprised that it does not. The idea looks reasonable, but I recall that when I first tried to understand what icebug is, I got a little confused. I saw imports of networkit in the code, which (if I understood correctly) was actually your fork under the original name, and I couldn't figure out its purpose at that time (about two to three months ago).

But since you have made a major update to the codebase lately, there is a chance I will get it this time. We'll see.

@adsharma

Contributor

@blokhinnv Renaming networkit -> icebug would have been similar to kuzu -> ladybug, but it was intrusive and would have broken too many things, so we added an alias. Instead of:

import networkit as nk
import icebug as ib # <--- use this

The purpose of the fork is very clear. Have you tried running pagerank on a large graph such as wikidata using networkit vs icebug?

https://huggingface.co/datasets/ladybugdb/wikidata-20260401/tree/main

See performance delta vs graphframes here:

https://github.com/Ladybug-Memory/icebug-graphframes-comparison
