
Sparips.jl integration #183

@bbrehm

You noted

Sparips.jl: this is a preprocessor that comes with a different wrapper of Ripser. The preprocessor allows you to compute persistent homology of very large datasets. Integrating it with Ripserer is on my TODO list.

I think that Ripserer.jl is currently the best-maintained and best-engineered TDA package in the Julia world.

Given that I never got around to registering Sparips.jl and have not touched the code for 8 years (time flies), I think the ideal way forward would be to integrate Sparips.jl into Ripserer and archive Sparips.jl.

For that, I'd go over the code again, clean it up, and check it in here. Ideally with an API that works well in the Ripserer context, but also for stand-alone use, i.e. for people who want to use alternative solvers. For such people, the additional dependencies of Ripserer are not too heavy.


Why would people use alternative solvers?

First, because of multi-parameter persistence, e.g. https://github.com/DavidLapous/multipers/tree/main . The sparips algorithm naturally supports settings where you have both scales (Rips-style) and a height function f (often derived from the density of sample points).

Second, because homology solvers can make use of the even smaller zigzag filtration generated by sparips. I see no way of making that work profitably with a cohomology solver, so Ripserer needs to compute over the (still sparse) full filtration.

Third, because sparips makes the distance matrix sparse enough to handle complexes with many points. Large point counts quickly overwhelm Ripser / Ripserer's simplex indexing (UInt128 or even UInt512 are a band-aid at best).
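To make the index pressure concrete, here is a small illustration in Python of the combinatorial number system that Ripser-style solvers use to pack simplices into integers (`simplex_index` is an illustrative helper, not Ripserer's actual API):

```python
from math import comb

# Ripser-style solvers pack a k-simplex with sorted vertex ids
# v_0 < v_1 < ... < v_k into a single integer via the
# combinatorial number system:
#   index = C(v_k, k+1) + ... + C(v_1, 2) + C(v_0, 1)
def simplex_index(vertices):
    return sum(comb(v, i + 1) for i, v in enumerate(sorted(vertices)))

print(simplex_index([1, 2, 3]))  # → 3 (fourth 2-simplex in the ordering)

# For H2 you need 3-simplices; their largest index on n points is roughly
# C(n, 4), which already overflows 64-bit integers at a few hundred
# thousand points, hence the UInt128 band-aid mentioned above:
n = 300_000
print(comb(n, 4) > 2**64 - 1)  # → True
```

Sparsification sidesteps this by keeping the effective neighborhood sizes, and hence the number of representable cofaces, small.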

Fourth, because the Ripser oblivious rank algorithm has exponential runtime for funny fractals like the Smale solenoid. You don't see that issue without sparsification: the exponential runtime only kicks in if your persistence diagram spans multiple orders of magnitude, and computing over multiple orders of magnitude is infeasible without approximation and sparsification. I think we can put a band-aid on it by caching cochains that were very expensive to produce (e.g. if we spent more than 1e7 steps reducing a column, then don't just store the pivot but also cache the final cochain, for reuse in future reductions).
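A sketch of that caching band-aid, in Python (the names, the Z/2 set representation, and the threshold are mine, not Ripser's internals): Ripser-style reduction stores only pivots and replays a pivot owner's reduction on demand; the idea is to additionally keep the fully reduced column once replaying it has become expensive.

```python
# Sketch of the proposed caching band-aid, not Ripser's actual code.
# Columns are Z/2 chains stored as sets of row indices. As in Ripser,
# only pivots are stored by default; a pivot owner's reduced column is
# recomputed (replayed) on demand. Columns whose reduction took many
# steps additionally cache their final reduced form for reuse.

EXPENSIVE_STEPS = 3  # illustrative; the issue suggests a budget near 1e7

def reduce_with_cache(boundary):
    pivot_owner = {}  # pivot row -> index of the column that claimed it
    cache = {}        # pivot row -> cached fully reduced column

    def reduced_column(j):
        # Replay the reduction of column j from its raw boundary column.
        col, steps = set(boundary[j]), 0
        while col:
            piv = max(col)
            owner = pivot_owner.get(piv)
            if owner is None or owner == j:
                break  # j keeps (or will claim) this pivot
            if piv in cache:
                col ^= cache[piv]  # cheap: reuse the cached cochain
            else:
                other, s = reduced_column(owner)  # expensive: full replay
                col ^= other
                steps += s
            steps += 1
        return col, steps

    for j in range(len(boundary)):
        col, steps = reduced_column(j)
        if col:
            piv = max(col)
            pivot_owner[piv] = j
            if steps >= EXPENSIVE_STEPS:
                cache[piv] = col  # keep the expensive cochain around
    return pivot_owner, cache
```

For example, on the boundary matrix of a filtered triangle (three empty vertex columns, three edge columns, one triangle column) the third edge reduces to zero, reproducing the usual pairing while only the expensive columns pay for full storage.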


What do you think @mtsch ?

Are you on the JuliaLang Slack, especially the TDA channel, for coordination?

Stuff that would also be needed to make this nice, probably in the companion package https://github.com/mtsch/PersistenceDiagrams.jl:

  1. Plotting of persistence diagrams with error bars
  2. An additional mode for the bottleneck distance / matching part, to verify whether two persistence diagrams with error bars are compatible. That is important for testing! Sparips spits out a sparsified / approximate complex together with error bounds, so it is a natural test to check that the original persistence diagram (without approximation) lies within the given error bounds.
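The compatibility check in point 2 could look roughly like this (a sketch under simplifying assumptions: diagonal handling and infinite intervals are omitted, and each approximate interval's error bars are modelled as a birth/death box that must contain a matched exact point):

```python
# Sketch of a compatibility test between an exact diagram and an
# approximate diagram with error bars; not PersistenceDiagrams.jl API.
# Each approximate interval is a box (b_lo, b_hi, d_lo, d_hi); the exact
# diagram is compatible if its points admit a perfect matching to boxes
# that contain them (found here via simple augmenting paths).

def compatible(exact_points, boxes):
    def fits(point, box):
        b, d = point
        b_lo, b_hi, d_lo, d_hi = box
        return b_lo <= b <= b_hi and d_lo <= d <= d_hi

    match = {}  # box index -> exact point index

    def augment(i, seen):
        # Try to assign exact point i to some box, reassigning if needed.
        for j, box in enumerate(boxes):
            if j in seen or not fits(exact_points[i], box):
                continue
            seen.add(j)
            if j not in match or augment(match[j], seen):
                match[j] = i
                return True
        return False

    return len(exact_points) == len(boxes) and \
        all(augment(i, set()) for i in range(len(exact_points)))
```

A real implementation would additionally let boxes that touch the diagonal go unmatched, mirroring how bottleneck matchings treat short intervals.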
