
Ensuring Reproducibility of Deconvolution #19

Open
gordonkoehn wants to merge 20 commits into dev from gordon/test_repro

Conversation

@gordonkoehn (Contributor) commented Jul 8, 2025

This PR ensures reproducibility of lollipop.deconvolute. It was recently noticed that the abundance results and confidence intervals vary between runs; this PR investigates the exact source of that variation.

It introduces consistent ordering, limits the precision to which results are quoted, and re-establishes reproducibility of the results.

See the thread below for a full explanation.

@gordonkoehn (Contributor, Author)

Other potential issues, identified by automated ("vibe") code analysis and still to be verified:

1. PRIMARY ISSUE: RNG Seeding Incompatibility (Critical)

Location: lollipop/cli/deconvolute.py:156

Problem: The wrapper uses NumPy's modern np.random.default_rng(), but the resampling function still uses the legacy np.random.randint().

```python
# Current (broken) code:
def _deconvolute_bootstrap_wrapper(args: DeconvBootstrapsArgs):
    child_seed = args.pop("child_seed")
    np.random.default_rng(child_seed)  # ❌ creates a Generator but discards it; legacy functions are unaffected!
    return _deconvolute_bootstrap(**args)

# In confints.py:
def resample_mutations(df_city1, mutations, namefield="mutations"):
    rand_idcs = np.random.randint(0, high=int(len(mutations) / 2), ...)  # ❌ uses the legacy global RNG
```

Result: The bootstrap resampling is effectively unseeded and therefore non-deterministic across runs.
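A minimal sketch of the fix, assuming the resampling function is changed to accept an explicit Generator (the signature and names here are illustrative, not lollipop's actual API):

```python
import numpy as np

def resample_mutations(mutations, rng):
    """Resample half of the mutations using an explicit Generator.

    Passing `rng` (a np.random.Generator) replaces the module-level
    legacy np.random.randint call, so each worker's child seed
    actually controls the resampling.
    """
    n = int(len(mutations) / 2)
    rand_idcs = rng.integers(0, high=n, size=n)
    return [mutations[i] for i in rand_idcs]

# Identical seeds now yield identical resamples:
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
assert resample_mutations(list("ABCDEFGH"), rng_a) == resample_mutations(list("ABCDEFGH"), rng_b)
```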

2. SECONDARY ISSUE: Numerical Precision in Optimization (Medium)

Location: lollipop/regressors.py:48

Problem: scipy.optimize.least_squares is called without explicit tolerance parameters.

```python
# Current code lacks precision control:
ls = least_squares(
    lambda beta: (np.expand_dims(k, 1) * X).dot(beta) - (k * y),
    b0,
    bounds=(0, 1),
    loss=self.loss_type,  # no tolerance specified;
    f_scale=self.f_scale,  # falls back to scipy defaults
)
```

Result: Slight numerical differences in optimization convergence.
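A hedged sketch of how explicit tolerances could pin the stopping point; the tolerance values, the simplified residual, and the `fit` helper are illustrative, not the code in this PR:

```python
import numpy as np
from scipy.optimize import least_squares

def fit(X, y, b0):
    # Pinning xtol/ftol/gtol (and max_nfev) removes dependence on
    # scipy's version-specific defaults for the stopping criteria.
    res = least_squares(
        lambda beta: X.dot(beta) - y,
        b0,
        bounds=(0, 1),
        xtol=1e-10,
        ftol=1e-10,
        gtol=1e-10,
        max_nfev=10_000,
    )
    return res.x

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.2, 0.3, 0.5])
b1 = fit(X, y, np.array([0.5, 0.5]))
b2 = fit(X, y, np.array([0.5, 0.5]))
assert np.array_equal(b1, b2)  # identical runs, bit-identical result
```

Note this only fixes the run-to-run behaviour on one machine; cross-machine bit-identity is a harder problem.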

3. TERTIARY ISSUE: Confidence Interval Calculations (Low)

Location: lollipop/confints.py

Problem: Matrix inversions and floating-point operations lack proper error handling.

```python
try:
    inv_info = np.linalg.inv(linprob_info)
except np.linalg.LinAlgError:
    inv_info = linprob_info * np.nan  # non-deterministic NaN handling
```
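One deterministic alternative, sketched under the assumption that a clean all-NaN matrix is the desired fallback for a singular information matrix (the helper name is made up):

```python
import numpy as np

def safe_inverse(info):
    """Deterministic fallback for a singular information matrix.

    Instead of multiplying the input by np.nan, emit a fully defined
    NaN matrix of the same shape, independent of the values (and sign
    bits) that happened to be in `info`.
    """
    try:
        return np.linalg.inv(info)
    except np.linalg.LinAlgError:
        return np.full_like(info, np.nan)

singular = np.array([[1.0, 2.0], [2.0, 4.0]])  # det = 0, inversion fails
assert np.isnan(safe_inverse(singular)).all()
```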

@gordonkoehn (Contributor, Author) commented Jul 9, 2025

Are the same seeds assigned to the same process when rerun? Are sets and dictionaries ordered differently each time? [YES]

So we need a sorted list.
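A small illustration of the fix (the location names are made up): iteration order of a set of strings depends on hash randomization (PYTHONHASHSEED) and can differ between interpreter runs, while sorting pins the order deterministically.

```python
# Set iteration order for strings can change between interpreter runs;
# sorted() makes it deterministic on every run and machine.
locations = {"Zürich (ZH)", "Genève (GE)", "Lugano (TI)"}
location_list = sorted(locations)
assert location_list == ["Genève (GE)", "Lugano (TI)", "Zürich (ZH)"]
```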

@gordonkoehn (Contributor, Author) commented Jul 9, 2025

Alright, 5c4e5df introduces a comprehensive, yet still relatively fast, reproducibility test.

It runs the deconvolution on 3 cores, for three locations, and compares the outputs at exact precision. This should catch any variation we don't want.
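The shape of such a test can be sketched roughly as follows, with a trivial seeded stand-in for the actual pipeline (`run_deconvolution` and its arguments are illustrative, not lollipop's API):

```python
import numpy as np
import pandas as pd

def run_deconvolution(seed, n_locations=3):
    # Stand-in for the real pipeline: any fully seeded computation works
    # to demonstrate the test pattern.
    rng = np.random.default_rng(seed)
    return pd.DataFrame({"proportion": rng.random(n_locations)})

def test_deconvolution_is_reproducible():
    out1 = run_deconvolution(seed=42)
    out2 = run_deconvolution(seed=42)
    # exact precision: no tolerance, every float must match bit for bit
    pd.testing.assert_frame_equal(out1, out2, check_exact=True)

test_deconvolution_is_reproducible()
```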

@gordonkoehn (Contributor, Author) commented Jul 10, 2025

Actually, these tests still don't include bootstraps.

We should add bootstraps, and then make it the standard behaviour to round to 2-3 decimal places without bootstrapping and to significant digits with bootstrapping.
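Rounding to significant digits (as opposed to decimal places) could look like this illustrative helper; the name and default are assumptions, not the code added in this PR:

```python
import math

def round_sig(x, sig=3):
    # Illustrative helper, not lollipop's actual implementation:
    # round to `sig` significant digits rather than decimal places.
    if x == 0 or math.isnan(x):
        return x
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))

assert round_sig(0.071234) == 0.0712  # 3 significant digits
assert round_sig(0.5039) == 0.504
```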

@gordonkoehn (Contributor, Author)

The full test is implemented now, and it fails, not due to numerical precision but due to something else. This test checks the very setup we run in production; see the very readable error logs.

We sadly see variation up to the second significant digit, i.e. 0.7X %.

I have already added general rounding to three significant digits in proportion, proportionLower, and proportionUpper.

For reference, WiseDashboard and CovSpectrum display results to the second decimal, i.e. XX.XX %, for consistency. So quoting to the 4th significant digit should be enough here in any case. Strictly, rigorous scientific quoting would allow only one significant digit given errors this large (see the experimental physicists' error bible, pg. 34), so this is the most practical choice here.

@gordonkoehn (Contributor, Author)

Added reproducible seed assignment, but this does not seem to be the cause. Nevertheless, it is good to keep around; it is certainly a potential avenue of error.
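The reproducible seed-assignment pattern can be sketched with NumPy's SeedSequence: for a fixed parent seed, child i always receives the same seed, so each worker's random stream is reproducible across reruns.

```python
import numpy as np

# One parent SeedSequence spawns statistically independent child seeds,
# one per worker process.
parent = np.random.SeedSequence(42)
children = parent.spawn(4)

# Re-spawning from the same parent seed reproduces the same streams:
again = np.random.SeedSequence(42).spawn(4)
draws1 = np.random.default_rng(children[2]).integers(0, 100, 5)
draws2 = np.random.default_rng(again[2]).integers(0, 100, 5)
assert np.array_equal(draws1, draws2)
```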

@gordonkoehn (Contributor, Author)

The other potential cause is the ordering of sets, which may result in processes, and the calculations within each process, running in different orders with different seeds. I have now sorted all the lists that were previously sets.

The issue persists.

@gordonkoehn (Contributor, Author)

YES!!! This is reproducible now, perhaps for the first time in Lollipop history ;) After adding in a few more sorts, we got it.

Bottom line, this issue most likely was a combination of two issues:

1. Inconsistent usage of NumPy random number generators

When I introduced the parallel processing we needed for on-time reporting, I switched the random number generation to numpy.random.SeedSequence, i.e. np.random.Generator, to give each process its own seed. I didn't dig deep enough to see that the resampling actually still used NumPy's legacy randomness, i.e. np.random.RandomState(), and I did not carefully propagate the new generator through to the resampling. Clearly my mistake; we should have caught this in code review and added better tests from the start.

2. Use of sets, adding non-determinism through unordered collections

Nevertheless, even if the seeding had been correct, it would still have failed. Above, we ensured a reproducible sequence of seeds, so that process 1 gets seed 1, seed 2 goes to process 2, and so on.

Yet we did not ensure that process 2 always processes the same data. That is because deconvolute.py uses set() for everything: location_list, variants_list, ...

Sets don't guarantee order in Python. Another thing we should have noticed.

Note that this issue has been around since inception, so while a distinct process may only have become affected since the parallelisation, intermediate computations for a single location may also have varied. Yet I am not absolutely certain here.

Further non-determinism not addressed here lies in scipy.optimize.least_squares: see diff_step, which depends on the machine the optimization is run on. I hope the difference is small enough that we round it away anyway.


The test I built, which models exactly the production settings, succeeds, so I am confident we have reproducibility, at least on the same machine.

@gordonkoehn gordonkoehn marked this pull request as ready for review July 10, 2025 15:45
@gordonkoehn gordonkoehn requested a review from dr-david July 10, 2025 15:45
@gordonkoehn gordonkoehn requested review from DrYak, Copilot and dr-david and removed request for DrYak, Copilot and dr-david July 10, 2025 15:45
@gordonkoehn gordonkoehn changed the title from "Adding Tests for Ensuring Reproducibility with Random Numbers" to "Ensuring Reproducibility of Deconvolution" Jul 10, 2025
@kirschen-k

Really nice catch here, great job digging into this!

@gordonkoehn (Contributor, Author) commented Jul 11, 2025

Another thought: since single-digit percentages may still occasionally change with different initial random seeds, yet we report to XX.XX % precision, we should consider increasing the number of bootstraps. That is certainly a separate issue, though. We currently run 100 iterations; the original paper recommends 1000. Did we do an analysis of some kind at some point? This would also warrant practical considerations in our setup: we deconvolve all the data, and the runtime may become too high if we do that.

@dr-david (Collaborator)

Amazing work, love it! I think with these tests, and this fix we should be good.

@dr-david (Collaborator)

> Another thought: since single-digit percentages may still occasionally change with different initial random seeds, yet we report to XX.XX % precision, we should consider increasing the number of bootstraps. [...] We currently run 100 iterations; the original paper recommends 1000.

I would think 1000 resamples would make the runtime a bit too prohibitive ... for not much added precision. Unless we want to further parallelise this too ^^ But I would say it's probably not worth the extra compute.

@DrYak (Member) commented Jul 28, 2025

Good catch on the order of variants (depending on how it's handled, it could affect the order of the columns in the matrix and thus have some tiny impact on the exact binary reproducibility of results).
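A tiny illustration of why column (variant) order can matter at the bit level: floating-point addition is not associative, so summing the same terms in a different order can change the result.

```python
# Same four numbers, two summation orders, different results:
vals = [1e16, 1.0, -1e16, 1.0]
a = sum(vals)                                  # the first 1.0 is absorbed by 1e16
b = sum([vals[0], vals[2], vals[1], vals[3]])  # cancellation happens first
assert a != b  # a == 1.0, b == 2.0
```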

@gordonkoehn (Contributor, Author)

@DrYak let's merge this.


Development

Successfully merging this pull request may close these issues.

Limit the Variant Abundance Estimate Outputs to Meaningful Accuracy

4 participants