-
Notifications
You must be signed in to change notification settings - Fork 28
Description
Hi!
I'm getting a crash when running emptyDropsCellRanger.
The call:
DropletUtils::emptyDropsCellRanger(dge)
Where dge is a matrix of cells by genes. I'll note this data set is compatible with your standard emptyDrops(dge) function.
The problem with the data is that the number of cell barcodes in the data set is too low, and the program runs into issues during the call of ambient.FUN.
In particular, the default bounds for the start and end of the ambient cell barcodes:
ind.min=45000
ind.max=90000
When this line is run, the function gets into trouble:
ambient[o[min(ind.min, length(totals)):min(ind.max, length(totals))]] <- TRUE
length(totals)=24101
So ambient only has a single positive entry, as the min and max resolve to the same index.
table (ambient)
ambient
FALSE TRUE
24100 1
This later causes the calculated ambient profile to be pretty strange, and it contains NaN values, which are rarely a good thing:
ambient.prob[13080:13090,]
CDC34 CDC37 CDC37L1 CDC37L1-DT CDC40 CDC42 CDC42-AS1 CDC42-IT1
3.130576e-05 3.130576e-05 3.130576e-05 3.130576e-05 3.130576e-05 3.130576e-05 3.130576e-05 3.130576e-05
CDC42BPA CDC42BPB CDC42BPG
NaN 3.130576e-05 3.130576e-05
Eventually, the code tries
any(still.zero)
And since still.zero contains NAs (inherited from ambient.prob), the result is NA, which produces the reported error at the top of the stack.
I don't expect the program to magically find an answer to this, but it might be nice to check the bounds and throw an exception that the selected indexes are too large for the provided data set.
I'm going to add my own data sanity checks before I run emptyDropsCellRanger as well.
