Conversation
Other potential issues, flagged by vibe code analysis and still to be verified:
Are the same seeds assigned to the same process when rerun? Are sets or dictionaries ordered differently each time? [YES] So we need a sorted list.
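A minimal illustration of the unordered-container problem (variant names are hypothetical): the iteration order of a `set` of strings can differ between interpreter runs because string hashing is randomized, so anything that iterates a set should sort first.

```python
# Iteration order of a set of strings is not stable across interpreter runs
# (string hashing is randomized via PYTHONHASHSEED), so downstream code that
# depends on it is non-reproducible.
variants = {"BA.2", "BA.5", "XBB.1.5"}  # hypothetical variant names

# Non-reproducible: the order may differ on the next run
unstable_order = list(variants)

# Reproducible: always the same, sorted order
stable_order = sorted(variants)
print(stable_order)  # ['BA.2', 'BA.5', 'XBB.1.5']
```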
Add a
Alright, 5c4e5df introduced a comprehensive yet still relatively fast reproducibility test. It runs the deconvolution on 3 cores, for three locations, and compares the outputs to exact precision. This should catch any variations we don't want.
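The test in 5c4e5df itself is not shown in this thread; a sketch of the general pattern (placeholder `deconvolve`, hypothetical location names) could look like:

```python
from multiprocessing import Pool

import numpy as np


def deconvolve(location_seed):
    """Placeholder for the real deconvolution; seeded per location."""
    location, seed = location_seed
    rng = np.random.default_rng(seed)
    return location, rng.normal(size=5).tolist()


def run_all(locations, n_cores=3):
    # Fixed seed per location, assigned after sorting, so the assignment
    # does not depend on scheduling or input order.
    tasks = [(loc, i) for i, loc in enumerate(sorted(locations))]
    with Pool(n_cores) as pool:
        return dict(pool.map(deconvolve, tasks))


if __name__ == "__main__":
    locations = ["Zurich", "Geneva", "Lugano"]  # hypothetical locations
    first, second = run_all(locations), run_all(locations)
    assert first == second  # compare to exact precision
    print("reproducible")
```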
Actually, these tests still don't include bootstraps. We should add bootstrapping to them, and then adopt the standard behaviour of rounding to 2-3 decimal digits without bootstrapping and to significant digits when bootstrapping.
Implemented the full test now. And it fails, not due to numerical precision but due to something else. This test checks the very setup we run in production; check the very readable error logs. We sadly see variation up to the second significant digit, i.e., 0.7X %. I have already added a general rounding to three significant digits. For reference, WiseDashboard and CovSpectrum display results to the second decimal, i.e., XX.XX %, for consistency.
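The rounding helper itself is not shown in the thread; a sketch of rounding to n significant digits (the function name is hypothetical) alongside the XX.XX % display format:

```python
import math


def round_sig(x, sig=3):
    """Round x to `sig` significant digits (hypothetical helper name)."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))


# A sub-percent abundance keeps three significant digits ...
print(round_sig(0.0071234))  # 0.00712
# ... while display layers quote to the second decimal, i.e. XX.XX %:
print(f"{0.0071234:.2%}")    # 0.71%
```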
Added reproducible seed assignment, but this does not seem to be the cause. Nevertheless, it is good to keep around; certainly a potential avenue of error.
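One standard way to get reproducible per-worker seeds (a sketch, not necessarily what the commit does) is numpy's `SeedSequence.spawn`, which derives independent, deterministic child streams from one root seed:

```python
import numpy as np

# One root seed; spawn gives each worker an independent, reproducible stream.
root = np.random.SeedSequence(42)
child_seeds = root.spawn(3)  # one per worker/location
rngs = [np.random.default_rng(s) for s in child_seeds]

# Re-spawning from the same root yields identical streams on every run.
rngs_again = [np.random.default_rng(s) for s in np.random.SeedSequence(42).spawn(3)]
assert all(
    np.array_equal(a.integers(0, 100, 5), b.integers(0, 100, 5))
    for a, b in zip(rngs, rngs_again)
)
```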
The other potential cause is the ordering of elements in the data structures. The issue persists.
YES!!! This is reproducible now. Perhaps for the first time in Lollipop history ;) So, adding in a few more sorts and we got it. Bottom line, this issue most likely was a combination of two issues:

1. Inconsistent usage of numpy random number generators: when I introduced the parallel processing we needed for on-time reporting, I switched the random number generation to a different generator than the rest of the code uses.
2. Use of unordered sets and dictionaries, whose iteration order can vary between runs.
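A sketch of the first kind of inconsistency (the exact call sites are not shown in the thread): mixing the legacy global `np.random` state with per-worker `Generator` objects makes draws depend on hidden process-global state, whereas passing an explicitly seeded `Generator` everywhere is reproducible.

```python
import numpy as np


def resample_legacy(data):
    # Non-reproducible across processes: depends on hidden global state,
    # unless every process seeds np.random identically and in the same order.
    return np.random.choice(data, size=len(data))


def resample(data, rng):
    # Reproducible: the caller controls the Generator and its seed.
    return rng.choice(data, size=len(data))


data = np.arange(10)
a = resample(data, np.random.default_rng(0))
b = resample(data, np.random.default_rng(0))
assert np.array_equal(a, b)  # identical on every run
```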
Really nice catch here, great job digging into this! |
Another thought: since single-digit percentages may currently still change slightly under different initial random seeds, but we report to XX.XX % precision, we should consider increasing the number of bootstraps. Yet this is certainly a separate issue. Currently we run 100 iterations; the original paper recommends 1000. Did we do an analysis of some kind at some point? This would also warrant further practical consideration in our setup, i.e., we deconvolve all data, which may make the runtime too high if we do that.
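As a rough sanity check on the precision trade-off (a sketch on synthetic data, not our pipeline): the run-to-run spread of a bootstrap estimate shrinks roughly with 1/sqrt(B), so going from 100 to 1000 resamples buys only about a 3x tighter quote for 10x the compute.

```python
import numpy as np


def bootstrap_mean(data, n_boot, rng):
    # Mean of n_boot bootstrap resample means.
    idx = rng.integers(0, len(data), size=(n_boot, len(data)))
    return data[idx].mean(axis=1).mean()


rng = np.random.default_rng(1)
data = rng.normal(loc=0.007, scale=0.002, size=200)  # synthetic abundances

# Spread of the estimate across 50 repeated bootstrap runs, B=100 vs B=1000.
spread = {
    B: np.std([bootstrap_mean(data, B, np.random.default_rng(s)) for s in range(50)])
    for B in (100, 1000)
}
print(spread[100] / spread[1000])  # roughly sqrt(10) ~ 3
```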
Amazing work, love it! I think with these tests, and this fix we should be good. |
I would think 1000 resamples would make runtime a bit too prohibitive ... for not too much added precision. Except if we want to further parallelise this too ^^ But I would say it's probably not worth the extra compute. |
Good catch on the order of variants too (depending on how it's handled, it could affect the order of the columns in the matrix and thus have some tiny impact on exact binary reproducibility of results).
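A tiny illustration of why column (variant) order can matter for exact binary reproducibility: floating-point addition is not associative, so summing the same columns in a different order can flip the last bits of the result.

```python
# Floating-point addition is not associative, so reordering the variant
# columns reorders the sums inside the solver and can change the last bits.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False

# Fixing the column order (e.g. sorting variants) fixes the summation order.
variants = ["XBB.1.5", "BA.5", "BA.2"]  # hypothetical variant names
print(sorted(variants))  # ['BA.2', 'BA.5', 'XBB.1.5']
```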
@DrYak let's merge this. |
This PR ensures `lollipop.deconvolute` reproducibility. It was recently noticed that we have variation in the abundance results and confidence intervals. This PR investigates the exact variation, introduces consistency, limits the precision to which results are quoted, and re-establishes reproducibility of results.
See the feed below for a full explanation.