Skip to content

Adaptive Pauli-Lindbladian shim + adaptive-evolution demo#98

Draft
AlexSchuckert wants to merge 26 commits into
mainfrom
lindblad-shim
Draft

Adaptive Pauli-Lindbladian shim + adaptive-evolution demo#98
AlexSchuckert wants to merge 26 commits into
mainfrom
lindblad-shim

Conversation

@AlexSchuckert

Copy link
Copy Markdown
Collaborator

Summary

Adds a Rust shim for direct, Trotterless Heisenberg-picture time evolution of an observable under a generic Pauli-Lindbladian (Hermitian Pauli Hamiltonian H = Σ c_i P_i + Hermitian Pauli jump operators L_k with rates γ_k), plus a Python driver demonstrating an adaptive Pauli-basis integrator built on it.

LindbladSpec (Rust, crates/ppvm-python-native/src/lindblad.rs)

A precompiled Lindbladian exposing three primitives over bit-packed Pauli words:

  • action(p)L*(p) for a single Pauli string;
  • leakage(basis, coeffs[, protected]) — the off-basis component of L*(Σ c_j p_j) (drives adaptive basis growth);
  • generator(basis) — the Lindbladian restricted to basis as sparse COO triplets (wrapped into a scipy CSC matrix on the Python side).

Hot path: a fused per-byte commutator computing both the XOR product and its phase in one pass; per-qubit active-site index so only terms whose support meets supp(p) are visited; a DashMap action cache (the analogue of @lru_cache in the prototype); rayon-parallel basis loops. Numpy uint8 (0=I,1=X,2=Z,3=Y) I/O end-to-end, so the driver never builds per-row strings on the hot path.

ppvm.Lindbladian (Python wrapper, ppvm-python/src/ppvm/lindblad.py)

Thin wrapper with both ndarray-native (action_arr/leakage_arr/generator_arr) and string-keyed (action/leakage/generator) APIs.

Demo (ppvm-python/demo/lindblad_adaptive.py)

End-to-end adaptive integrator: at each step it measures the leakage out of the current basis, adds the leaked strings above add_tol, advances exp(dt · M) (exact in dt within the basis — no Trotter error) via scipy.sparse.linalg.expm_multiply, and prunes below drop_tol (a protected set is never dropped). --predictor_corrector 1 enriches the basis with the predictor's second-hop strings and redoes the step, lifting the dt-error from O(dt²) to O(dt³). Reports C_k(t) and a discarded-weight a-posteriori error monitor.

Packaging

import ppvm no longer requires SciPy: SciPy is imported lazily inside generator()/generator_arr() (the only methods that need it). numpy is declared as a runtime dependency (used by the wrapper).

Test plan

  • ppvm-python/test/test_lindblad.py cross-checks action/leakage/generator/protected-set against an independent dense Pauli-multiplication-table reference (XY+Z-dephasing and TFIM+X-dephasing); 4 tests pass.
  • cargo fmt --check and cargo clippy clean on ppvm-python-native.
  • ruff check clean on the wrapper and test.
  • uv lock --check consistent.
  • Demo runs end to end (plain and --predictor_corrector).

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 27, 2026 20:13

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an adjoint Pauli-Lindbladian “shim” (Rust + PyO3) and a Python-facing wrapper to enable direct Heisenberg-picture time evolution of observables in an adaptive Pauli-string basis, plus a demo and cross-checking tests.

Changes:

  • Introduces LindbladSpec (Rust) implementing action, leakage, and sparse generator over bit-packed Pauli words with caching + Rayon parallelism.
  • Adds ppvm.Lindbladian (Python) wrapper with both ndarray-native (*_arr) and string-keyed convenience APIs, with lazy SciPy import for generator construction.
  • Adds an adaptive-evolution demo and a new test suite cross-checking shim results against a dense reference implementation.

Reviewed changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
ppvm-python/uv.lock Adds NumPy as a declared dependency in the lock.
ppvm-python/test/test_lindblad.py New tests validating action/leakage/generator vs a dense reference (uses SciPy).
ppvm-python/src/ppvm/lindblad.py New Python wrapper for the native Lindbladian shim (string + ndarray APIs; lazy SciPy import).
ppvm-python/src/ppvm/init.py Re-exports Lindbladian from the new module.
ppvm-python/pyproject.toml Declares NumPy as a runtime dependency for the wrapper.
ppvm-python/demo/lindblad_adaptive.py New end-to-end demo of adaptive basis growth + expm-based stepping.
crates/ppvm-python-native/src/lindblad.rs New Rust implementation of the Lindbladian primitives with caching and parallel basis loops.
crates/ppvm-python-native/src/lib.rs Exposes LindbladSpec from the PyO3 module.
crates/ppvm-python-native/Cargo.toml Adds native dependencies (dashmap/fxhash/numpy/rayon) needed by the shim.
Cargo.lock Locks new Rust dependencies introduced by the native shim.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ppvm-python/src/ppvm/lindblad.py Outdated
Comment on lines +37 to +41
def string_to_codes(s: str, n_qubits: int) -> np.ndarray:
"""Encode a Pauli string ``"IXYZ..."`` as a length-``n_qubits`` uint8 array."""
if len(s) != n_qubits:
raise ValueError(f"Pauli string {s!r} has length {len(s)} != n_qubits {n_qubits}")
return np.array([_PAULI_CODE[c] for c in s], dtype=np.uint8)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved: _string_to_codes now strips _ and raises a ValueError naming the bad character and expected alphabet (matching the Rust parser), instead of letting a KeyError propagate.

Comment on lines +4 to +6
//! Rust shim for direct Pauli-Lindbladian time evolution on an adaptive
//! Pauli-string basis. See `xy-experiments/main_k_adaptive.py` for the
//! Python driver that owns the linear algebra and basis adaptation.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved: the stale xy-experiments/main_k_adaptive.py reference is gone; the in-tree driver is ppvm-python/demo/lindblad_adaptive.py.

Comment on lines +516 to +536
let local: Vec<FxHashMap<Word, f64>> = basis_words
.par_iter()
.zip(coeffs_view.par_iter())
.map(|(p, &c)| {
let mut m: FxHashMap<Word, f64> = FxHashMap::default();
let mut s1 = Vec::new();
let mut s2 = Vec::new();
accumulate_action(self, p, c, &mut m, &mut s1, &mut s2);
m
})
.collect();

let mut merged: FxHashMap<Word, f64> = FxHashMap::default();
for m in local {
for (k, v) in m {
if in_basis.contains_key(&k) || protected_set.contains_key(&k) {
continue;
}
*merged.entry(k).or_insert(0.0) += v;
}
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b07f75d: leakage/leakage_with_prune now fold into thread-local FxHashMaps and merge via Rayon reduce, so the Vec<HashMap> of per-element intermediates is no longer materialized. Keys and summed values are identical (FP reduction order can differ at ~1e-16, well within the test tolerances).

Comment on lines +573 to +578
let basis_words = decode_basis(&basis_view, self.n_qubits)?;
let index: FxHashMap<Word, u32> = basis_words
.iter()
.enumerate()
.map(|(i, w)| (*w, i as u32))
.collect();

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved: the generator() binding (and the pc_step paths) now call assert_basis_unique, which returns a ValueError naming the duplicate row indices before any matrix is built.

Comment on lines 9 to 15
requires-python = ">=3.10"
dependencies = [
"bloqade-circuit>=0.14.1",
"kirin>=0.0.15",
"numpy>=1.26",
"ppvm-python-native",
]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is valid concern, please address

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved: scipy>=1.13 is now an explicit entry in the dev dependency group in ppvm-python/pyproject.toml (runtime ppvm stays scipy-free).

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://QuEraComputing.github.io/ppvm/pr-preview/pr-98/

Built to branch gh-pages at 2026-06-02 01:51 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@Roger-luo

Copy link
Copy Markdown
Collaborator

@copilot fix the license header for Alex :-)

@Roger-luo

Copy link
Copy Markdown
Collaborator

@AlexSchuckert wondering if we can have some information about how you verify the correctness of this to help reviewing?

Copilot AI review requested due to automatic review settings June 2, 2026 01:21
@AlexSchuckert AlexSchuckert removed the request for review from Copilot June 2, 2026 01:21

Copilot AI commented Jun 2, 2026

Copy link
Copy Markdown

@copilot fix the license header for Alex :-)

Fixed in commit 6209cb8: adjusted ppvm-python/src/ppvm/lindblad.py header formatting to include the required blank line after the SPDX lines so the license-header check passes.

Copilot AI requested a review from Roger-luo June 2, 2026 01:22
Comment thread crates/ppvm-lindblad/src/expm.rs Outdated
/// then iteratively apply a degree-`m` Horner-form Taylor polynomial
/// `s` times. Inside each Horner sweep we terminate early once two
/// successive Taylor terms have norm below `opts.tol · ‖F‖`.
pub fn expm_multiply(a: &CsrMatrix, t: f64, b: &[f64], opts: ExpmOpts) -> Vec<f64> {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this exists in a rust crate already?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Rust crate provides expm_multiply in the matrix-free action form (Al-Mohy & Higham, scaling-and-squaring on A · v). The expm crate exists but only handles dense matrices via the Padé approximation — not useful for our sparse Heisenberg-picture generator. SciPy and Julia (KrylovKit / Expokit ports) both have it; we follow SciPy's expm_multiply algorithm and θ_m table directly. It's ~200 LOC of self-contained numerics, verified by expm_serial_matches_parallel and cross-checked against scipy.sparse.linalg.expm_multiply in test_pc_step_rust_matches_python_pc. Happy to swap if a maintained crate shows up.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved: the matrix-exponential action is now provided by the external quspin-expm crate (QuSpin-rust), used matrix-free. The hand-rolled implementation has been removed; expm.rs is reduced to just the (m, s) selection table.

Comment thread crates/ppvm-lindblad/src/expm.rs Outdated
//! the SpMV path itself scales before debugging higher-level paths.

use ppvm_lindblad::CsrMatrix;
use std::time::Instant;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we actually turn this into benchmark like other crates instead of using std::time to measure the performance? it will be cleaner to use the same setup as other crates like those in ppvm-runtime

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved: the examples/spmv_scaling.rs timing example has been removed -- the real-CSR SpMV it exercised no longer exists (evolution is matrix-free via quspin-expm).

Comment thread ppvm-python/demo/lindblad_adaptive.py Outdated

L_op.clear_cache() # action cache is per-k since each k starts a fresh basis

for nt in range(c.steps + 1):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this might work... but it would be nice to split these nested loops into functions to make things easier to read

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed: the demo is now factored into helper functions -- build_xy_dephasing, init_mode, add_leakage_to_basis, prune_and_cap, and run_mode -- rather than inline nested loops.

Comment thread ppvm-python/demo/lindblad_adaptive.py Outdated
Comment thread ppvm-python/demo/lindblad_pc_scaling.py Outdated
Comment thread ppvm-python/src/ppvm/lindblad.py Outdated
Comment thread ppvm-python/test/test_lindblad.py Outdated
for k in set(got_a) | set(got_b):
assert got_a.get(k, 0.0) == got_b.get(k, 0.0), (
f"lincomb fast path mismatch at p={p!r}: {got_a} vs {got_b}"
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a 700 loc python file is a hard reject

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved: the monolithic test_lindblad.py has been split into ppvm-python/test/lindblad/ (test_action_generator.py, test_adaptive_pc.py, test_non_hermitian_jumps.py, test_pc_step_rust.py, plus _helpers.py).

Comment on lines 9 to 15
requires-python = ">=3.10"
dependencies = [
"bloqade-circuit>=0.14.1",
"kirin>=0.0.15",
"numpy>=1.26",
"ppvm-python-native",
]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is valid concern, please address

AlexSchuckert added a commit that referenced this pull request Jun 2, 2026
Correctness
- generator()/pc_step()/pc_step_timed(): reject duplicate basis Pauli words
  at the PyO3 boundary with a clear ValueError. The underlying row-index
  map silently overwrote earlier rows on collision, producing a wrong
  sparse generator. ppvm-lindblad now factors the index-build with
  uniqueness debug_assert into `build_basis_index(...)`, called from both
  `generator` and `generator_csr`. New regression test
  `test_generator_rejects_duplicate_basis`.
- `_string_to_codes`: raise `ValueError` naming the bad character; strip
  `_` separators to match the Rust `parse_pauli_string` parser.

API surface
- Rename the four codec helpers (string_to_codes, codes_to_string,
  basis_to_codes, codes_to_basis) to `_`-prefixed module-private names.

Dependencies — drop scipy from runtime + tests
- Lindbladian.generator{,_arr} now return COO triples (rows, cols, vals)
  instead of a scipy.sparse.csc_matrix. Users wanting scipy can wrap;
  see docstring.
- Tests replace `scipy.sparse.linalg.expm_multiply` with a numpy
  eigendecomposition reference (`expm_mv_dense`).
- pyproject.toml: scipy removed from `dev`; moved to a new `demo` group
  (demos still use it for the reference matrix exponential).

Test layout
- Split the 723-line `test/test_lindblad.py` into `test/lindblad/`:
  * `_helpers.py` — shared Pauli-table reference, dense Liouvillian
    reference, bilinear NN-XY+Z-dephasing solver, adaptive PC drivers
  * `test_action_generator.py` — Hermitian-Pauli action/leakage/generator
  * `test_non_hermitian_jumps.py` — non-Hermitian (σ-/σ+/thermal) jumps
  * `test_adaptive_pc.py` — adaptive PC convergence vs bilinear ref
  * `test_pc_step_rust.py` — pure-Rust pc_step parity + cubic dt-scaling
  Each test file now ≤ 121 lines.

Demos
- Both `demo/lindblad_adaptive.py` and `demo/lindblad_pc_scaling.py`
  rewritten as jupytext percent-format notebooks with markdown narrative
  cells and end-of-notebook plots, replacing CLI arg parsing with a
  top-of-file parameter cell.
- `lindblad_adaptive.py`: split the monolithic per-step body into
  `init_mode`, `add_leakage_to_basis`, `prune_and_cap`, `run_mode`;
  dropped the unnecessary `target_arr = basis_arr.copy()` /
  `protected_arr = basis_arr.copy()` calls (nothing mutates basis_arr
  in place — `vstack`/slice rebind).
- `lindblad_pc_scaling.py`: dropped the `importlib.util.spec_from_file_location`
  bypass that was working around `demo/stim.py` shadowing the real `stim`
  package on `python demo/foo.py` invocations.
- Rename `demo/stim.py` → `demo/stim_msd.py` (it's an MSD-circuit demo
  that happened to share a name with the `stim` package).

Benches
- Convert `examples/spmv_scaling.rs` (std::time + manual print) to a
  proper criterion `benches/spmv_scaling.rs` with one entry per thread
  count `{1, 2, 3, 4, 6, 8, 10}`, each running SpMV inside a
  `rayon::ThreadPool::install`. Added `criterion` to dev-deps +
  `[[bench]]` to `ppvm-lindblad/Cargo.toml`.

CSR + expm_multiply replacement (left as-is)
- Surveyed `sprs`, `nalgebra-sparse`, `russell_sparse`, `faer-rs`: all
  ship CSR + serial SpMV but none offer rayon-parallel SpMV with the
  nnz-threshold gating used here. No Rust crate provides Al-Mohy & Higham
  `expm_multiply` (matrix-free action). Decision documented in PR review
  replies; happy to revisit if a suitable crate appears.

Verification
- cargo test -p ppvm-lindblad --lib: 10/10
- pytest test/: 115/115

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 2, 2026 11:54

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 22 changed files in this pull request and generated 1 comment.

Comment thread ppvm-python/demo/lindblad_adaptive.py Outdated
AlexSchuckert and others added 5 commits June 2, 2026 12:58
…olution

Adds a LindbladSpec PyClass that exposes three primitives needed for direct
Heisenberg-picture evolution of a generic Pauli Lindbladian (Hermitian
Hamiltonian sum + Hermitian Pauli jumps) on an adaptive Pauli-string basis:

  - action(p): L*(p) for a single Pauli string
  - leakage(basis, coeffs): off-basis component of L*(Sum c_j p_j)
  - generator(basis): sparse generator matrix in COO form

The hot path stores Pauli words as packed [u8; W] X/Z bit arrays and runs
a single fused per-byte multiply that computes both r = h XOR p and the
product phase in one pass. Active-site iteration (per-qubit inverted index
of which H/jump terms touch each qubit) restricts the inner loop to
non-trivially-acting terms; an action cache (DashMap, FxHash) memoises
per-input contribution lists; rayon parallelises the basis loops.

The numpy boundary takes uint8 code arrays (0=I 1=X 2=Z 3=Y) end-to-end,
so the Python driver never builds per-row strings on the hot path. A thin
Python wrapper (ppvm.Lindbladian) exposes both ndarray and string-keyed
APIs for callers who don't need the ndarray fast path.

End-to-end agreement with a pure-Python reference is within 1e-16 in
max |Ck|. At L=51, kmax=1, dt=0.2, 20 steps, the shim runs in 17s vs
481s for the pure-Python adaptive driver (~28x), with the action cache
and ndarray-native wrapper jointly responsible for the speedup over
naive Rust.
- demo/lindblad_adaptive.py: adaptive Pauli-Lindbladian time evolution on a
  leakage-grown Pauli-string basis (all-to-all XY + single-site Z dephasing),
  with an optional predictor-corrector (lifts the dt-error from O(dt^2) to
  O(dt^3)) and a discarded-weight a-posteriori error monitor. Exercises
  ppvm.Lindbladian (action / leakage / generator) end to end.
- test/test_lindblad.py: cross-check action / leakage / generator / the
  protected set against an independent dense Pauli-multiplication-table
  reference, on XY+Z-dephasing and TFIM+X-dephasing Lindbladians.
- Declare numpy as a runtime dependency (the shim wrapper uses it) and import
  SciPy lazily inside generator()/generator_arr(); `import ppvm` and the
  action/leakage primitives no longer require SciPy -- only the sparse-matrix
  convenience does.
…e-Rust PC step

Split the adjoint Pauli-Lindbladian algorithm out of `ppvm-python-native`
into a standalone `ppvm-lindblad` crate backed by `PauliWord<[u64; 2]>`
from `ppvm-runtime`, and extend it with:

- Non-Hermitian Pauli-sum jumps via `JumpInput` / `JumpKind::{HermitianPauli,
  General}`. The general path implements the full GKSL sandwich
  `Σ_{a,b} λ_a* λ_b P_a O P_b - 1/2 {L†L, O}` with the `L†L` Pauli expansion
  precomputed at construction. Same fast diagonal path for the common
  Hermitian-Pauli jump case.

- Pure-Rust matrix-exponential action `expm_multiply` implementing Al-Mohy
  & Higham (2011) Algorithm 3.2: degree-`m` Horner-form Taylor with
  scaling-and-squaring, `(m, s)` from a precomputed `θ_m` table, optional
  early termination. Backed by a minimal CSR matrix type with serial and
  rayon-parallel SpMV (parallel above a configurable nnz threshold).

- `LindbladSpec::pc_step` runs the entire predictor-corrector step in Rust
  (first-hop leakage expansion → predictor expm → second-hop leakage
  expansion → corrector expm), so the Python driver no longer needs scipy
  for the inner loop.

- `pc_step_timed` returns a per-phase microsecond breakdown
  (leakage1/expand1/gencsr1/expm1/...) for profiling parallel scaling.

- `num_threads: Option<usize>` parameter wraps the entire PC step in a
  freshly-built rayon `pool.install`, isolating the requested core count
  for benchmarking.

- Performance: parallel-scatter CSR build (skips `from_triplets`'s
  sequential count/scatter); `rayon::map_init` thread-local scratch
  (hashmap + scratch vectors reused across rayon tasks instead of
  per-task allocations); hash-only `in_basis`/`protected_set` membership
  tables (`FxHashMap<u64, ()>` instead of `<Word, ()>` shrinks the
  working set ~6× and keeps it in L2/L3 past basis 10⁶).

At L=51, long-range XY α=1, γ=1, T=1 (basis grows to 577K), 20 PC steps
runs end-to-end in ~512 s wall on 4 threads — down from ~734 s
pre-optimisation (1.43× faster).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e tests, scaling demo

Wrapper updates and tests for the new `ppvm-lindblad` crate.

API additions on `Lindbladian`:

- `pc_step` / `pc_step_arr`: thin Python wrappers around the pure-Rust
  predictor-corrector step. Expose `parallel_threshold`, `num_threads`,
  and `expm_tol` knobs for benchmarking parallel scaling without
  re-importing scipy.

- Accept jumps as either a Pauli string (Hermitian-Pauli fast path) or a
  complex Pauli linear combination `Iterable[(str, complex)]` for
  non-Hermitian dissipators.

- `sigma_plus(site, n_qubits)` and `sigma_minus(site, n_qubits)` helpers
  produce the `(X ± iY)/2` Pauli sums for amplitude damping / thermal
  excitation. Re-exported from the top-level `ppvm` package.

Tests:

- Amplitude damping (`test_amplitude_damping_action`,
  `test_amplitude_damping_generator_and_leakage`) and thermal σ± mix
  (`test_thermal_excitation_damping_action`) cross-checked against a
  dense Liouvillian super-operator reference.

- Adaptive shim convergence vs the closed Jordan-Wigner bilinear ODE for
  NN-XY + Z-dephasing on OBC
  (`test_adaptive_converges_to_nn_xy_z_dephasing_bilinear`).

- Predictor-corrector raises the dt-scaling from quadratic to cubic
  (`test_predictor_corrector_lifts_dt_scaling_to_cubic`).

- Pure-Rust pc_step matches scipy-based PC at FP precision
  (`test_pc_step_rust_matches_scipy_pc`) and exhibits the same cubic
  dt-scaling against the bilinear reference
  (`test_pc_step_rust_dt_scaling_is_cubic`).

- Length-1 real lincomb routes to the Hermitian fast path
  (`test_lincomb_single_term_matches_hermitian_fast_path`).

Demo:

- `demo/lindblad_pc_scaling.py`: end-to-end wall-time scaling sweep with
  `--max-cores`, `--model {nn,long-range}`, `--alpha`, `--gamma`, `--tau`
  CLI for HPC nodes. Reports first-step + steady-state ms per thread
  count with the entire PC step pinned to a per-call rayon pool.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AlexSchuckert added a commit that referenced this pull request Jun 2, 2026
Correctness
- generator()/pc_step()/pc_step_timed(): reject duplicate basis Pauli words
  at the PyO3 boundary with a clear ValueError. The underlying row-index
  map silently overwrote earlier rows on collision, producing a wrong
  sparse generator. ppvm-lindblad now factors the index-build with
  uniqueness debug_assert into `build_basis_index(...)`, called from both
  `generator` and `generator_csr`. New regression test
  `test_generator_rejects_duplicate_basis`.
- `_string_to_codes`: raise `ValueError` naming the bad character; strip
  `_` separators to match the Rust `parse_pauli_string` parser.

API surface
- Rename the four codec helpers (string_to_codes, codes_to_string,
  basis_to_codes, codes_to_basis) to `_`-prefixed module-private names.

Dependencies — drop scipy from runtime + tests
- Lindbladian.generator{,_arr} now return COO triples (rows, cols, vals)
  instead of a scipy.sparse.csc_matrix. Users wanting scipy can wrap;
  see docstring.
- Tests replace `scipy.sparse.linalg.expm_multiply` with a numpy
  eigendecomposition reference (`expm_mv_dense`).
- pyproject.toml: scipy removed from `dev`; moved to a new `demo` group
  (demos still use it for the reference matrix exponential).

Test layout
- Split the 723-line `test/test_lindblad.py` into `test/lindblad/`:
  * `_helpers.py` — shared Pauli-table reference, dense Liouvillian
    reference, bilinear NN-XY+Z-dephasing solver, adaptive PC drivers
  * `test_action_generator.py` — Hermitian-Pauli action/leakage/generator
  * `test_non_hermitian_jumps.py` — non-Hermitian (σ-/σ+/thermal) jumps
  * `test_adaptive_pc.py` — adaptive PC convergence vs bilinear ref
  * `test_pc_step_rust.py` — pure-Rust pc_step parity + cubic dt-scaling
  Each test file now ≤ 121 lines.

Demos
- Both `demo/lindblad_adaptive.py` and `demo/lindblad_pc_scaling.py`
  rewritten as jupytext percent-format notebooks with markdown narrative
  cells and end-of-notebook plots, replacing CLI arg parsing with a
  top-of-file parameter cell.
- `lindblad_adaptive.py`: split the monolithic per-step body into
  `init_mode`, `add_leakage_to_basis`, `prune_and_cap`, `run_mode`;
  dropped the unnecessary `target_arr = basis_arr.copy()` /
  `protected_arr = basis_arr.copy()` calls (nothing mutates basis_arr
  in place — `vstack`/slice rebind).
- `lindblad_pc_scaling.py`: dropped the `importlib.util.spec_from_file_location`
  bypass that was working around `demo/stim.py` shadowing the real `stim`
  package on `python demo/foo.py` invocations.
- Rename `demo/stim.py` → `demo/stim_msd.py` (it's an MSD-circuit demo
  that happened to share a name with the `stim` package).

Benches
- Convert `examples/spmv_scaling.rs` (std::time + manual print) to a
  proper criterion `benches/spmv_scaling.rs` with one entry per thread
  count `{1, 2, 3, 4, 6, 8, 10}`, each running SpMV inside a
  `rayon::ThreadPool::install`. Added `criterion` to dev-deps +
  `[[bench]]` to `ppvm-lindblad/Cargo.toml`.

CSR + expm_multiply replacement (left as-is)
- Surveyed `sprs`, `nalgebra-sparse`, `russell_sparse`, `faer-rs`: all
  ship CSR + serial SpMV but none offer rayon-parallel SpMV with the
  nnz-threshold gating used here. No Rust crate provides Al-Mohy & Higham
  `expm_multiply` (matrix-free action). Decision documented in PR review
  replies; happy to revisit if a suitable crate appears.

Verification
- cargo test -p ppvm-lindblad --lib: 10/10
- pytest test/: 115/115

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Correctness
- generator()/pc_step()/pc_step_timed(): reject duplicate basis Pauli words
  at the PyO3 boundary with a clear ValueError. The underlying row-index
  map silently overwrote earlier rows on collision, producing a wrong
  sparse generator. ppvm-lindblad now factors the index-build with
  uniqueness debug_assert into `build_basis_index(...)`, called from both
  `generator` and `generator_csr`. New regression test
  `test_generator_rejects_duplicate_basis`.
- `_string_to_codes`: raise `ValueError` naming the bad character; strip
  `_` separators to match the Rust `parse_pauli_string` parser.

API surface
- Rename the four codec helpers (string_to_codes, codes_to_string,
  basis_to_codes, codes_to_basis) to `_`-prefixed module-private names.

Dependencies — drop scipy from runtime + tests
- Lindbladian.generator{,_arr} now return COO triples (rows, cols, vals)
  instead of a scipy.sparse.csc_matrix. Users wanting scipy can wrap;
  see docstring.
- Tests replace `scipy.sparse.linalg.expm_multiply` with a numpy
  eigendecomposition reference (`expm_mv_dense`).
- pyproject.toml: scipy removed from `dev`; moved to a new `demo` group
  (demos still use it for the reference matrix exponential).

Test layout
- Split the 723-line `test/test_lindblad.py` into `test/lindblad/`:
  * `_helpers.py` — shared Pauli-table reference, dense Liouvillian
    reference, bilinear NN-XY+Z-dephasing solver, adaptive PC drivers
  * `test_action_generator.py` — Hermitian-Pauli action/leakage/generator
  * `test_non_hermitian_jumps.py` — non-Hermitian (σ-/σ+/thermal) jumps
  * `test_adaptive_pc.py` — adaptive PC convergence vs bilinear ref
  * `test_pc_step_rust.py` — pure-Rust pc_step parity + cubic dt-scaling
  Each test file now ≤ 121 lines.

Demos
- Both `demo/lindblad_adaptive.py` and `demo/lindblad_pc_scaling.py`
  rewritten as jupytext percent-format notebooks with markdown narrative
  cells and end-of-notebook plots, replacing CLI arg parsing with a
  top-of-file parameter cell.
- `lindblad_adaptive.py`: split the monolithic per-step body into
  `init_mode`, `add_leakage_to_basis`, `prune_and_cap`, `run_mode`;
  dropped the unnecessary `target_arr = basis_arr.copy()` /
  `protected_arr = basis_arr.copy()` calls (nothing mutates basis_arr
  in place — `vstack`/slice rebind).
- `lindblad_pc_scaling.py`: dropped the `importlib.util.spec_from_file_location`
  bypass that was working around `demo/stim.py` shadowing the real `stim`
  package on `python demo/foo.py` invocations.
- Rename `demo/stim.py` → `demo/stim_msd.py` (it's an MSD-circuit demo
  that happened to share a name with the `stim` package).

Benches
- Convert `examples/spmv_scaling.rs` (std::time + manual print) to a
  proper criterion `benches/spmv_scaling.rs` with one entry per thread
  count `{1, 2, 3, 4, 6, 8, 10}`, each running SpMV inside a
  `rayon::ThreadPool::install`. Added `criterion` to dev-deps +
  `[[bench]]` to `ppvm-lindblad/Cargo.toml`.

CSR + expm_multiply replacement (left as-is)
- Surveyed `sprs`, `nalgebra-sparse`, `russell_sparse`, `faer-rs`: all
  ship CSR + serial SpMV but none offer rayon-parallel SpMV with the
  nnz-threshold gating used here. No Rust crate provides Al-Mohy & Higham
  `expm_multiply` (matrix-free action). Decision documented in PR review
  replies; happy to revisit if a suitable crate appears.

Verification
- cargo test -p ppvm-lindblad --lib: 10/10
- pytest test/: 115/115

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 2, 2026 12:07

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 22 changed files in this pull request and generated no new comments.

@AlexSchuckert

Copy link
Copy Markdown
Collaborator Author

@AlexSchuckert wondering if we can have some information about how you verify the correctness of this to help reviewing?

I'm using several physics examples in test/lindblad. There it essentially builds the exact evolution using dense matrices. Of course I can't fully exclude weird edge cases, but I think these examples are non-trivial.

@AlexSchuckert AlexSchuckert marked this pull request as draft June 3, 2026 09:35
AlexSchuckert and others added 6 commits June 3, 2026 17:19
`pc_step` now takes an optional `drop_tol` parameter. At the end of the
predictor-corrector step (after the corrector expm_multiply), any basis
entry whose absolute coefficient is below `drop_tol` is removed from the
basis, unless the corresponding Pauli word appears in `protected`.

Implemented as a small in-place compaction (FxHashSet lookup on
`protected`, single pass swap-and-truncate over basis+coeffs). No-op
when `drop_tol ≤ 0`, preserving the existing add-only growth behaviour.

Motivation: at γ = 0 (pure unitary Hamiltonian dynamics) there is no
physical damping of small-coefficient strings, so the basis grows
monotonically with only `tau_add` as a brake. `drop_tol > 0` lets the
caller actively prune entries whose coefficient has decayed, keeping
the basis bounded and making γ-extrapolation experiments feasible.

Plumbed through `ppvm_lindblad::Lindbladian::pc_step{,_timed}`, the
PyO3 binding (new optional kwarg, defaults to 0.0), and the Python
shim `ppvm.lindblad.Lindbladian.pc_step{,_arr}`.

Verified:
- All 10 Rust lib tests still pass.
- All 122 Python tests still pass.
- Smoke test: at L=6, NN XY, γ=0, seed = Z_2:
    no drop_tol            → basis saturates at 36 strings
    drop_tol = 0.05        → basis stays at 9
    drop_tol = 0.5         → basis stays at 1 (only protected Z_2)
    drop_tol = 1.5         → basis stays at 1 (protected never dropped)
  Protected words are never removed regardless of magnitude.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Empirical benchmarks (xy-experiments/bench_cache_*.py, see PR thread)
showed that the per-Pauli `L*(p)` memoisation cache:

- HURTS wall time for sparse-local Hamiltonians (e.g. XX ladder): hash-
  map lookup latency at million-entry scale matches the cost of just
  recomputing the few-bond commutator. cache_ON 195s vs cache_OFF 149s
  at L=20 ladder, |basis|=3.7M (-24%).

- Helps modestly for dense long-range Hamiltonians (LR-XY, α=1): ~25%
  speedup, BUT at ~5 KB per cached entry. At basis sizes that matter
  (10⁶+), this consumes tens of GB and pushes runs into OOM well before
  the basis itself does.

In both regimes — and especially for the L=41 sweeps that just OOM'd
on Perlmutter at γ=0 / tight add_tol — the cache is a net loss. Pauli
propagation is intrinsically memory-limited (basis + transient action
lists), so any per-Pauli auxiliary storage just shrinks the reachable
problem size.

Strips:
- `action_cache: DashMap` field + `cache_enabled: AtomicBool` flag
- `clear_cache()`, `cache_size()`, `set_cache_enabled()` on LindbladSpec
- corresponding PyO3 bindings and Python wrapper methods
- `dashmap` Cargo dep (no other code in the crate uses it)
- `L_op.clear_cache()` calls in demos and tests (now no-ops anyway)

All 10 Rust lib tests + 122 Python tests still pass. End-to-end
behaviour unchanged; just lower memory + (often) faster.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… RSS cut

Two memory optimisations on top of the adaptive-Pauli evolution paths:

1. **Switch ppvm-python-native to the mimalloc global allocator.**
   The system allocator (macOS / glibc malloc) holds onto freed pages in
   arena freelists, so the OS-reported RSS stays high even after Rust
   has freed allocations. mimalloc returns pages to the kernel much more
   aggressively. Drop-in, ~5 lines.

2. **Chunk `leakage_inner`'s parallel pass into batches of 4096 basis
   rows.** Previously `basis.par_iter().map_init(...).collect()`
   materialised the full `Vec<Vec<(Word, f64)>>` (~N × A × 24 B) in
   memory simultaneously before merging — ~430 MB at L=51 LR-XY, N=14 K,
   A≈1275. Now each chunk's `Vec<Vec>` is merged into the global
   `FxHashMap` and dropped before the next chunk starts; peak in-flight
   is bounded to `CHUNK_SIZE · A · 24 B`. Chunk of 4096 still gives
   ~400 items per rayon worker so parallelism stays healthy.

Benchmark on macOS at two configs (separate process per measurement so
ru_maxrss is clean):

```
                    config          wall(s)  peak RSS (MB)
                                    before  after  before  after
  LR-XY L=51 γ=1   add_tol=2e-4     18.51   16.51    3160    1759   (-44%)
  LR-XY L=12 γ=0.1 add_tol=2e-4     13.10   11.56    4583    1797   (-61%)
```

Memory ~halved; wall ~10-15 % faster (mimalloc is itself faster than
the system allocator). All 10 Rust lib tests + 122 Python tests pass.

This lets the L=41 / γ=0 / tight-tol sweeps that were OOM-ing at 80 GB
on Perlmutter shared nodes likely fit, and roughly doubles the largest
basis we can reach on a fixed memory budget.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tie the leakage-rate add threshold to the post-step drop threshold via
  tau_add = K * drop_tol / dt
with K=5 as the default — Pareto-optimal in the L=51 LR-XY γ=1 study
(K=5, drop=4e-6 matches the K=1, drop=2e-5 accuracy floor at 3× lower
RSS+wall, and dominates K=10 at high accuracy). Callers can still pass
tau_add explicitly; the new policy only kicks in when tau_add is omitted
and drop_tol > 0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds three new code paths plus three memory optimizations on the
existing pc_step:

  - matrix_free flag on pc_step: skip CSR build, do each Krylov-Taylor
    SpMV by recomputing L*(p) on the fly. New spmv_matrix_free,
    one_norm_matrix_free, expm_multiply_mf. Trades wall (4× at L=6) for
    memory (modest at L=6, expected to dominate at L≥10).
  - rk4_step: classical 4th-order Runge-Kutta on the adjoint
    Lindbladian, matrix-free, with leakage-driven natural basis growth
    and post-step drop_tol prune. New compute_action_sum helper. Doc
    flags the dt < 2.78/||L*|| stability boundary with explicit warning
    about silent norm-conserving / observable-diverging failure mode.
  - leakage_with_prune: streaming Cauchy-Schwarz remaining-budget
    prune. Exact (drops only candidates whose true sum cannot exceed
    tau_add). Uses new max_action_coef per-operator bound.
  - pc_step_inner: collapse coeffs_pre + coeffs_pre_padded clones into
    a single in-place coeffs buffer, saving 2 × n × 8B per step.
  - ExpmOpts.max_krylov_m: cap on Krylov-Taylor degree m_star. Trades
    more outer scaling-and-squaring iterations for less Krylov scratch.

All four exposed through Python pc_step_arr and rk4_step_arr.

Correctness: CSR (new opts) and matrix-free pc_step trajectories are
bit-identical (4e-16 max delta over 50 steps at drop=1e-6, L=6).
}

criterion_group!(benches, bench_spmv_scaling);
criterion_main!(benches);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we post in the PR comment about the result of this benchmark for future reference?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obsolete: the spmv_scaling benchmark has been removed -- it benchmarked the now-deleted real-CSR SpMV. The matrix-exponential action is now provided by the external quspin-expm crate.

Comment thread crates/ppvm-lindblad/src/expm.rs Outdated
Comment on lines +115 to +121
for (i, yi) in y.iter_mut().enumerate() {
let mut sum = 0.0;
for k in indptr[i]..indptr[i + 1] {
sum += data[k] * x[indices[k] as usize];
}
*yi = sum;
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we verify if this gets correctly vectorized after compilation?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obsolete: the hand-rolled scaling-and-squaring inner loop has been removed; the matrix-exponential action is now provided by the external quspin-expm crate.

Comment thread crates/ppvm-python-native/src/lib.rs Outdated
Comment on lines +4 to +9
// mimalloc returns freed pages to the kernel more aggressively than the
// default system allocator. This materially reduces peak RSS for the
// allocation-heavy adaptive-Pauli paths (leakage + generator each
// allocate hundreds of MB of transient Vec data per pc_step).
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suspicious. I think we need more explanation of why we are adding this custom allocator and how much improvement this gives compared to not having it.

Ideally, the initial draft PR should not contain this type of optimization; we should add that in a separate PR. I would decline to merge this draft PR as a single unit.

It would be better to start with the basic setup and have that reviewed first, then submit a separate PR focusing on the different optimizations. This avoids forcing the reviewer to go through a giant 10,000-line PR.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed: the global allocator has been removed from this PR (f14edaf) and moved to its own PR -- #129 -- so it can be reviewed and benchmarked independently, per your suggestion.

@@ -0,0 +1,400 @@
// SPDX-FileCopyrightText: 2026 The PPVM Authors

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're adding this as an internal implementation, it would be nice to add this as a separate crate instead of within ppvm-liblblod. I imagine this will be useful for other Rust developers as well, and it will also help clean up the dependency a little bit.

It would be nice to first submit a separate PR that creates a single crate that implements sparse matrix and exponential.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed: rather than a first-party crate, the matrix-exponential is now provided by the external quspin-expm crate (QuSpin-rust). expm.rs is reduced to just the (m, s) selection table, so the dependency surface is much smaller.

/// `[u64; 2]` covers up to 128 qubits; the `FxBuildHasher` matches the
/// hash used by the `FxHashMap` keys we wrap with; `REHASH=true` means
/// `set()` keeps the cached hash in sync.
pub type Word = PauliWord<[u64; W_U64], FxBuildHasher, true>;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why our implementation has a hard-coded number of qubits here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b07f75d: added docs explaining the fixed-width [u64; W_U64] Pauli-word storage and MAX_QUBITS = 64 * W_U64 -- chosen for cache-friendly O(1) Pauli ops; construction beyond it errors with Error::TooManyQubits.

Comment thread crates/ppvm-lindblad/src/lib.rs Outdated
pub type Word = PauliWord<[u64; W_U64], FxBuildHasher, true>;

/// Errors raised when constructing a [`LindbladSpec`].
#[derive(Debug, Clone)]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put all the error handling in a separate file, so we have a clean lib.rs file.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b07f75d: the Error enum and its Display/Error impls moved to crates/ppvm-lindblad/src/error.rs; the public path ppvm_lindblad::Error is unchanged.

Comment thread crates/ppvm-lindblad/src/lib.rs Outdated
/// inner `*mut T` (which is not) — Rust 2021's disjoint-field capture
/// would otherwise peel the wrapper.
#[derive(Clone, Copy)]
struct SendPtr<T>(*mut T);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting approach. Does this have an existing Rust standard library functionality that we can reuse?

We do not accept any unsafe code in this crate in general. If we have to use unsafe, it must be separated and isolated in a separate crate.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved: all unsafe has been removed from ppvm-lindblad. The SendPtr parallel-scatter helper is gone along with the CSR generator it served -- evolution is now matrix-free via quspin-expm.

Comment thread crates/ppvm-lindblad/src/lib.rs Outdated
/// `num_threads`, when set, runs the entire step inside a freshly built
/// rayon thread pool of that size — useful for benchmarking parallel
/// scaling. When `None`, the global rayon pool is used.
#[allow(clippy::too_many_arguments)]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not work around Clippy errors like this. If Clippy warns you about too many arguments, this is a code smell of an incorrect abstraction being created here.

Ideally, you would create an object that contains configurations such as:

  1. Drop tolerance
  2. Tau add
  3. Other relevant options

Then, you can feed that object in with the Hamiltonian and Lindblad stuff.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b07f75d: introduced PcStepConfig { tau_add, drop_tol, num_threads }; pc_step/pc_step_timed (and their inner helpers) now take it, and the #[allow(clippy::too_many_arguments)] has been removed. The Python-facing kwargs are unchanged -- the binding builds the config internally.

Comment thread crates/ppvm-lindblad/src/lib.rs Outdated
/// breakdown (microseconds), for profiling parallel scaling and hot
/// spots. Output: `(leakage1, expand1, gencsr1, expm1, leakage2,
/// expand2, gencsr2, expm2)`.
#[allow(clippy::too_many_arguments)]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b07f75d: covered by the same PcStepConfig refactor.

david-pl and others added 13 commits June 17, 2026 10:24
…shim

Brings the bytemuck-based weight() speedup + early-return for uncapped
MaxPauliWeight (and the rest of #126) into the Lindblad branch.

Conflicts resolved: keep stim_demo.py (byte-identical to lindblad-shim's
stim_msd.py rename); regenerate uv.lock against the merged pyproject.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The expm serial/parallel SpMV parity test built `ExpmOpts { tol,
parallel_threshold }` without the `max_krylov_m` field, so `cargo test`
failed to compile the ppvm-lindblad test target (pre-existing on
lindblad-shim, surfaced when running the full suite after merging #126).
The whole Rust workspace test suite now passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ree)

Replace the hand-rolled Al-Mohy & Higham scaling-and-squaring engine on the
real (f64) `pc_step`/`expm_step` path with the external `quspin-expm` crate,
used fully matrix-free: a borrowed `MfOp` implements `quspin_types::
LinearOperator` over the in-basis Lindbladian action (`spmv_matrix_free`),
and `ExpmOp::from_parts` supplies the shift mu = tr(M)/n, partition count s,
and Taylor order m* directly -- bypassing quspin's adaptive parameter
selection so only `dot` runs in the Taylor loop (no estimator, no
dot_transpose). mu and the exact column 1-norm of M - mu*I are gathered in
one matrix-free pass via `action()`; (m*, s) reuse the existing `select_ms`
table. No CSR is materialised for the exponential action (`generator_csr`
is kept for the other consumers).

The old `expm` engine (expm_multiply / expm_multiply_mf / spmv / Csr) is
retained but no longer on the `pc_step` path. The complex orbit-rep path is
unaffected here and migrates separately on `symmetry-merging`.

Validated: cargo test --workspace --exclude ppvm-python-native (618 passed)
and the Python lindblad suite (13 passed, pc_step vs bilinear reference).

NOTE: QuSpin-rust currently ships no LICENSE (all-rights-reserved). This pin
is a placeholder pending a permissive license being added upstream; the
quspin-expm authors are co-authors of the ppvm paper.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…path

Now that the real pc_step path runs entirely on quspin-expm (matrix-free,
via mf_expm), delete the dead hand-rolled engine and every remnant of the
CSR-for-real path. This branch has no complex/orbit-rep path, so the whole
old engine is dead:

  * expm.rs: reduced to just `select_ms` + its `THETA` table (the only
    thing the new engine reuses). Removed the Csr type, csr_from_triplets,
    csr_one_norm, spmv / spmv_serial / spmv_parallel, ExpmOpts,
    expm_multiply, expm_multiply_mf, and their tests/imports.
  * lib.rs: removed LindbladSpec::generator_csr, one_norm_matrix_free, the
    now-dead SendPtr scatter helper, and the `expm` re-export. Dropped the
    `opts: ExpmOpts` and `matrix_free: bool` parameters from pc_step,
    pc_step_inner, pc_step_timed (+ inner) and expm_step — the path is
    unconditionally matrix-free now.
  * benches/spmv_scaling.rs: removed (benched the deleted real Csr/SpMV);
    drop its [[bench]] entry and the now-unused criterion dev-dep.

Python bindings + wrapper: drop the expm_tol / parallel_threshold /
matrix_free / max_krylov_m kwargs (they only configured the removed engine)
from pc_step / pc_step_timed (bindings) and pc_step_arr / pc_step (wrapper);
num_threads is retained. Updated the lindblad_pc_scaling demo.

Net -750 / +30 lines. Validated: cargo test --workspace --exclude
ppvm-python-native (all green, warning-clean) and the Python suite
(136 passed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Maintainer-requested code-quality changes (PR #98):

  * PcStepConfig (new config.rs): bundle tau_add / drop_tol / num_threads
    into a config struct passed to pc_step / pc_step_timed (+ inner
    helpers), replacing the long parameter lists and the
    `#[allow(clippy::too_many_arguments)]`. The Python-facing kwargs are
    unchanged — the binding builds the config internally.
  * error.rs (new): move the `Error` enum + Display/Error impls out of
    lib.rs; re-exported as `ppvm_lindblad::Error` (public path unchanged).
  * leakage: merge via Rayon fold/reduce into thread-local maps instead of
    collecting a Vec<HashMap> and merging sequentially — lower peak memory.
    Keys and summed values are identical (FP reduction order differs at
    ~1e-16, well within test tolerances).
  * docs: explain the fixed-width `[u64; W_U64]` Word storage and
    MAX_QUBITS = 64 * W_U64.
  * demo: simplify the redundant position expression to `np.arange(L) - L//2`.

Validated: cargo test --workspace --exclude ppvm-python-native (612 passed),
cargo clippy clean (no too_many_arguments on the refactored fns), and the
full Python suite (136 passed). Python public API unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per PR #98 review (Roger-luo): the global allocator is a peak-RSS
optimization that doesn't belong in this feature PR. Removed here and
moved to a dedicated PR off `main` so it can be reviewed/benchmarked on
its own. Reverts to the system allocator; no behavior change (validated:
lindblad Python suite still green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants