Pixie

pixie is a succinct data structures library.

Features

BitVector
- Data structure with 3.61% overhead supporting rank and select for 1 bits.
- Supports:
  - rank(i): number of set bits (1s) up to position i.
  - select(k): position of the k-th set bit.
  - Similar operations rank0/select0 for 0.
- Implementation mainly follows [1] with SIMD optimizations similar to [2]
- Optimized via AVX-512/AVX-2, for large binary sequences performance is I/O bounded.
RmMTree
- Implementation of a range min-max tree, it supports rank, select and excess-related operations allowing for a fast navigation in DFUDS/BP trees.

Requirements

C++20
CMake ≥ 3.18.

Build Instructions

git clone https://github.com/Malkovsky/pixie.git
cd pixie
cmake --preset release
cmake --build --preset release

Manual alternative:

mkdir -p build/release
cmake -B build/release -DCMAKE_BUILD_TYPE=Release
cmake --build build/release -j

Tests are enabled by default (PIXIE_TESTS=ON). Benchmarks are opt-in; enable with -DPIXIE_BENCHMARKS=ON or configure with the benchmarks-all preset. Use benchmarks-third-party for comparison backends such as sdsl-lite, and benchmarks-diagnostic for performance diagnostics (Release with debug info + performance counters support).

Running Tests

After building with presets, binaries are located in build/release.

BitVector

./build/release/unittests

RmM Tree

./build/release/test_rmm

Coverage

Configure a coverage build with GCC (benchmarks disabled):

cmake --preset coverage
cmake --build --preset coverage

Run tests and generate the gcov text report:

./scripts/coverage_report.sh

Running Benchmarks

Before running benchmarks, configure with presets:

cmake --preset benchmarks-all
cmake --build --preset release

For a RelWithDebInfo diagnostic build, use:

cmake --preset benchmarks-diagnostic
cmake --build --preset release

BitVector

Benchmarks are random 50/50 0-1 bitvectors up to $2^{34}$ bits.

./build/release/benchmarks

Write JSON and plot size-scaled benchmark curves with a log-scaled x-axis:

./build/release/benchmarks --benchmark_out=bitvector_bench.json --benchmark_out_format=json
python3 scripts/plot_size_benchmarks.py bitvector_bench.json -o graphs/bitvector_size.png --size-key n

Excess Positions

./build/release/excess_positions_benchmarks --benchmark_out=excess_positions.json --benchmark_out_format=json
python3 scripts/excess_benchmark_table.py excess_positions.json -o src/docs/excess_positions_benchmark_results.md

Generated benchmark documentation can be written to src/docs/benchmark_results.md; the documentation pipeline does not run benchmarks.

RmM Tree

./build/release/bench_rmm

For focused runs, bench_rmm accepts --ops with a comma-separated operation list. The benchmark harness only builds the query pools needed by the selected operations, so subset runs avoid much of the setup cost:

./build/release/bench_rmm --ops=rank1,select1 --benchmark_out=rmm_rank_select.json --benchmark_out_format=json

By default, RmM benchmarks step through sizes by powers of two. Use --per_octave=<n> for finer sampling between adjacent powers of two, or --explicit_sizes=<csv> for an exact size list.

Google Benchmark filters are also used to limit RmM setup when --ops is not provided:

./build/release/bench_rmm --benchmark_filter='^rank1$' --benchmark_out=rmm_rank1.json --benchmark_out_format=json

For comparison with range min-max tree implementation from sdsl-lite, use the third-party benchmark preset. This defines SDSL_SUPPORT and builds bench_rmm_sdsl:

cmake --preset benchmarks-third-party
cmake --build --preset benchmarks-third-party
sudo cpupower frequency-set --governor performance
./build/release-third-party/bench_rmm_sdsl --benchmark_out=rmm_bench_sdsl.json

For visualization, write the JSON output to a file using --benchmark_out=<file> (e.g. ./build/release/bench_rmm --benchmark_out=rmm_bench.json) and plot it with scripts/plot_rmm.py (add --sdsl-json rmm_bench_sdsl.json for per-operation sdsl-lite comparison plots). For size-scaled tree plots, use:

python3 scripts/plot_size_benchmarks.py rmm_bench.json -o graphs/rmm_size.png --size-key N

Example Usage

#include <pixie/bitvector.h>
#include <vector>
#include <iostream>

using namespace pixie;

int main() {
    std::vector<uint64_t> bits = {0b101101}; // 6 bits
    BitVector bv(bits, 6);

    std::cout << "bv: " << bv.to_string() << "\n";     // "101101"
    std::cout << "rank(4): " << bv.rank(4) << "\n";    // number of ones in first 4 bits
    std::cout << "select(2): " << bv.select(2) << "\n"; // position of 2nd one-bit
}

#include <pixie/rmm_tree.h>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

using namespace pixie;

int main() {
    // root
    // ├─ A
    // │  ├─ a1
    // │  └─ a2
    // ├─ B
    // └─ C
    //    └─ c1
    std::string bits = "11101001011000";
    std::vector<std::uint64_t> words((bits.size() + 63) / 64);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i] == '1') {
            words[i / 64] |= std::uint64_t{1} << (i % 64);
        }
    }

    // RmMTree is non-owning: keep words alive and immutable while using t.
    RmMTree t(words, bits.size());

    std::cout << "close(1): " << t.close(1) << "\n";     // expected 6 (A)
    std::cout << "open(3): " << t.open(3) << "\n";       // expected 2 (a1)
    std::cout << "enclose(1): " << t.enclose(1) << "\n"; // expected 0 (root)
}

References

[1] Laws et al., SPIDER: Improved Succinct Rank and Select Performance SPIDER
[2] Kurpicz, Engineering compact data structures for rank and select queries on bit vectors pasta-toolbox/bit_vector

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
.github/workflows		.github/workflows
academic		academic
agentic/cpp		agentic/cpp
include		include
report		report
scripts		scripts
src		src
third_party/sdsl-lite		third_party/sdsl-lite
.clang-format		.clang-format
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CMakeLists.txt		CMakeLists.txt
CMakePresets.json		CMakePresets.json
README.md		README.md
REPORT.md		REPORT.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pixie

Features

Requirements

Build Instructions

Running Tests

BitVector

RmM Tree

Coverage

Running Benchmarks

BitVector

Excess Positions

RmM Tree

Example Usage

References

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Pixie

Features

Requirements

Build Instructions

Running Tests

BitVector

RmM Tree

Coverage

Running Benchmarks

BitVector

Excess Positions

RmM Tree

Example Usage

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages