Skip to content

emptyDrops BioC 3.11 vs 3.12: different results with the OSCA pbmc 4k dataset #67

@lcolladotor

Description

@lcolladotor

Hi Aaron et al,

We (@Yalbibalderas @AnaBVA) noticed that the following code from OSCA returns different t-SNE plots. Eventually we narrowed down the issue to BioC 3.11 vs 3.12 (3.13 is the same as 3.12). In particular, the issue is with changes between those versions for DropletUtils.

The path to DropletUtils is documented at https://twitter.com/lcolladotor/status/1425242252872409092?s=20.

You can reproduce this with:

# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

# Detección de _droplets_ con células
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc))

table(e.out$FDR < 0.001, useNA = "ifany")

where BioC 3.11 returns

> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA> 
  1056   4233 731991 

and later versions return

> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA> 
   989   4300 731991 

The full code is below for getting to the tSNE (adapted from https://comunidadbioinfo.github.io/cdsb2021_scRNAseq/anotaci%C3%B3n-de-clusters-de-c%C3%A9lulas.html and ultimately from OSCA at https://bioconductor.org/books/release/OSCA/unfiltered-human-pbmcs-10x-genomics.html) is below.

Details
# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

# Detección de _droplets_ con células
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc))

table(e.out$FDR < 0.001, useNA = "ifany")





library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86,
    keys = rowData(sce.pbmc)$ID,
    column = "SEQNAME", keytype = "GENEID"
)

sce.pbmc <- sce.pbmc[, which(e.out$FDR <= 0.001)]

# Control de calidad
stats <- perCellQCMetrics(sce.pbmc,
    subsets = list(Mito = which(location == "MT"))
)
high.mito <- isOutlier(stats$subsets_Mito_percent,
    type = "higher"
)
sce.pbmc <- sce.pbmc[, !high.mito]

# Normalización de los datos
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster = clusters)
sce.pbmc <- logNormCounts(sce.pbmc)

## Identificación de genes altamente variables
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop = 0.1)

## Reducción de dimensiones
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc,
    subset.row = top.pbmc,
    technical = dec.pbmc
)
set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, dimred = "PCA")
set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, dimred = "PCA")

plotTSNE(sce.pbmc)



options(width = 120)
sessioninfo::session_info()

We think that it comes down to DropletUtils::emptyDrops() and between April 27 and October 27 2020 (when BioC 3.12 was bioc-devel), there's 2 commits that altered it, in particular we think that 71b161e might be the root of the difference.

Screen Shot 2021-08-10 at 10 06 04 PM

Based on the history, we see that commit was early in the BioC 3.12 devel cycle. With that in mind, I did a chimera installation with BioC 3.12 and the BioC 3.11 version of DropletUtils, specifically 51d00b0 (remotes::install_github("MarioniLab/DropletUtils@51d00b0")). That version reproduced the t-SNE and DropletUtils::emptyDrops() we see with BioC 3.11.

Details
> options(width = 120)
main* > sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                                      
 version  R version 4.0.5 Patched (2021-03-31 r80247)
 os       macOS Big Sur 11.5.1                       
 system   x86_64, darwin17.0                         
 ui       RStudio                                    
 language (EN)                                       
 collate  en_US.UTF-8                                
 ctype    en_US.UTF-8                                
 tz       America/New_York                           
 date     2021-08-10Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version  date       lib source                                  
 AnnotationDbi        * 1.52.0   2020-10-27 [1] Bioconductor                            
 AnnotationFilter     * 1.14.0   2020-10-27 [1] Bioconductor                            
 askpass                1.1      2019-01-13 [1] CRAN (R 4.0.2)                          
 assertthat             0.2.1    2019-03-21 [1] CRAN (R 4.0.2)                          
 backports              1.2.1    2020-12-09 [1] CRAN (R 4.0.2)                          
 beachmat               2.6.4    2020-12-20 [1] Bioconductor                            
 beeswarm               0.3.1    2021-03-07 [1] CRAN (R 4.0.2)                          
 Biobase              * 2.50.0   2020-10-27 [1] Bioconductor                            
 BiocFileCache        * 1.14.0   2020-10-27 [1] Bioconductor                            
 BiocGenerics         * 0.36.1   2021-04-16 [1] Bioconductor                            
 BiocNeighbors          1.8.2    2020-12-07 [1] Bioconductor                            
 BiocParallel           1.24.1   2020-11-06 [1] Bioconductor                            
 BiocSingular           1.6.0    2020-10-27 [1] Bioconductor                            
 biocthis               1.0.10   2021-02-26 [1] Bioconductor                            
 biomaRt                2.46.3   2021-02-09 [1] Bioconductor                            
 Biostrings             2.58.0   2020-10-27 [1] Bioconductor                            
 bit                    4.0.4    2020-08-04 [1] CRAN (R 4.0.2)                          
 bit64                  4.0.5    2020-08-30 [1] CRAN (R 4.0.2)                          
 bitops                 1.0-7    2021-04-24 [1] CRAN (R 4.0.2)                          
 blob                   1.2.1    2020-01-20 [1] CRAN (R 4.0.2)                          
 bluster                1.0.0    2020-10-27 [1] Bioconductor                            
 bookdown               0.22     2021-04-22 [1] CRAN (R 4.0.2)                          
 cachem                 1.0.4    2021-02-13 [1] CRAN (R 4.0.2)                          
 callr                  3.7.0    2021-04-20 [1] CRAN (R 4.0.2)                          
 cli                    2.5.0    2021-04-26 [1] CRAN (R 4.0.5)                          
 colorout               1.2-2    2020-11-03 [1] Github (jalvesaq/colorout@726d681)      
 colorspace             2.0-0    2020-11-11 [1] CRAN (R 4.0.2)                          
 cowplot                1.1.1    2020-12-30 [1] CRAN (R 4.0.2)                          
 crayon                 1.4.1    2021-02-08 [1] CRAN (R 4.0.2)                          
 curl                   4.3.1    2021-04-30 [1] CRAN (R 4.0.2)                          
 data.table             1.14.0   2021-02-21 [1] CRAN (R 4.0.2)                          
 DBI                    1.1.1    2021-01-15 [1] CRAN (R 4.0.2)                          
 dbplyr               * 2.1.1    2021-04-06 [1] CRAN (R 4.0.2)                          
 DelayedArray           0.16.3   2021-03-24 [1] Bioconductor                            
 DelayedMatrixStats     1.12.3   2021-02-03 [1] Bioconductor                            
 desc                   1.3.0    2021-03-05 [1] CRAN (R 4.0.2)                          
 devtools             * 2.4.0    2021-04-07 [1] CRAN (R 4.0.2)                          
 digest                 0.6.27   2020-10-24 [1] CRAN (R 4.0.2)                          
 dplyr                  1.0.5    2021-03-05 [1] CRAN (R 4.0.2)                          
 dqrng                  0.3.0    2021-05-01 [1] CRAN (R 4.0.5)                          
 DropletUtils         * 1.9.0    2021-08-11 [1] Github (MarioniLab/DropletUtils@51d00b0)
 edgeR                  3.32.1   2021-01-14 [1] Bioconductor                            
 ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.0.2)                          
 EnsDb.Hsapiens.v86   * 2.99.0   2021-08-11 [1] Bioconductor                            
 ensembldb            * 2.14.1   2021-04-19 [1] Bioconductor                            
 evaluate               0.14     2019-05-28 [1] CRAN (R 4.0.1)                          
 fansi                  0.4.2    2021-01-15 [1] CRAN (R 4.0.2)                          
 farver                 2.1.0    2021-02-28 [1] CRAN (R 4.0.2)                          
 fastmap                1.1.0    2021-01-25 [1] CRAN (R 4.0.2)                          
 FNN                    1.1.3    2019-02-15 [1] CRAN (R 4.0.2)                          
 fs                     1.5.0    2020-07-31 [1] CRAN (R 4.0.2)                          
 generics               0.1.0    2020-10-31 [1] CRAN (R 4.0.2)                          
 GenomeInfoDb         * 1.26.7   2021-04-08 [1] Bioconductor                            
 GenomeInfoDbData       1.2.4    2020-11-03 [1] Bioconductor                            
 GenomicAlignments      1.26.0   2020-10-27 [1] Bioconductor                            
 GenomicFeatures      * 1.42.3   2021-04-04 [1] Bioconductor                            
 GenomicRanges        * 1.42.0   2020-10-27 [1] Bioconductor                            
 ggbeeswarm             0.6.0    2017-08-07 [1] CRAN (R 4.0.2)                          
 ggplot2              * 3.3.3    2020-12-30 [1] CRAN (R 4.0.2)                          
 glue                   1.4.2    2020-08-27 [1] CRAN (R 4.0.2)                          
 gridExtra              2.3      2017-09-09 [1] CRAN (R 4.0.2)                          
 gtable                 0.3.0    2019-03-25 [1] CRAN (R 4.0.2)                          
 HDF5Array              1.18.1   2021-02-04 [1] Bioconductor                            
 hms                    1.0.0    2021-01-13 [1] CRAN (R 4.0.2)                          
 htmltools              0.5.1.1  2021-01-22 [1] CRAN (R 4.0.2)                          
 httr                   1.4.2    2020-07-20 [1] CRAN (R 4.0.2)                          
 igraph                 1.2.6    2020-10-06 [1] CRAN (R 4.0.2)                          
 IRanges              * 2.24.1   2020-12-12 [1] Bioconductor                            
 irlba                  2.3.3    2019-02-05 [1] CRAN (R 4.0.2)                          
 knitr                  1.33     2021-04-24 [1] CRAN (R 4.0.2)                          
 labeling               0.4.2    2020-10-20 [1] CRAN (R 4.0.2)                          
 lattice                0.20-44  2021-05-02 [1] CRAN (R 4.0.5)                          
 lazyeval               0.2.2    2019-03-15 [1] CRAN (R 4.0.2)                          
 lifecycle              1.0.0    2021-02-15 [1] CRAN (R 4.0.2)                          
 limma                  3.46.0   2020-10-27 [1] Bioconductor                            
 locfit                 1.5-9.4  2020-03-25 [1] CRAN (R 4.0.2)                          
 lubridate              1.7.10   2021-02-26 [1] CRAN (R 4.0.2)                          
 magrittr               2.0.1    2020-11-17 [1] CRAN (R 4.0.2)                          
 Matrix               * 1.3-2    2021-01-06 [1] CRAN (R 4.0.5)                          
 MatrixGenerics       * 1.2.1    2021-01-30 [1] Bioconductor                            
 matrixStats          * 0.58.0   2021-01-29 [1] CRAN (R 4.0.2)                          
 memoise                2.0.0    2021-01-26 [1] CRAN (R 4.0.3)                          
 munsell                0.5.0    2018-06-12 [1] CRAN (R 4.0.2)                          
 openssl                1.4.4    2021-04-30 [1] CRAN (R 4.0.2)                          
 pillar                 1.6.0    2021-04-13 [1] CRAN (R 4.0.2)                          
 pkgbuild               1.2.0    2020-12-15 [1] CRAN (R 4.0.3)                          
 pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.0.2)                          
 pkgload                1.2.1    2021-04-06 [1] CRAN (R 4.0.2)                          
 prettyunits            1.1.1    2020-01-24 [1] CRAN (R 4.0.2)                          
 processx               3.5.2    2021-04-30 [1] CRAN (R 4.0.2)                          
 progress               1.2.2    2019-05-16 [1] CRAN (R 4.0.2)                          
 prompt                 1.0.1    2021-03-12 [1] CRAN (R 4.0.2)                          
 ProtGenerics           1.22.0   2020-10-27 [1] Bioconductor                            
 ps                     1.6.0    2021-02-28 [1] CRAN (R 4.0.2)                          
 purrr                  0.3.4    2020-04-17 [1] CRAN (R 4.0.2)                          
 R.methodsS3            1.8.1    2020-08-26 [1] CRAN (R 4.0.2)                          
 R.oo                   1.24.0   2020-08-26 [1] CRAN (R 4.0.2)                          
 R.utils                2.10.1   2020-08-26 [1] CRAN (R 4.0.2)                          
 R6                     2.5.0    2020-10-28 [1] CRAN (R 4.0.2)                          
 rappdirs               0.3.3    2021-01-31 [1] CRAN (R 4.0.2)                          
 Rcpp                   1.0.6    2021-01-15 [1] CRAN (R 4.0.2)                          
 RCurl                  1.98-1.3 2021-03-16 [1] CRAN (R 4.0.2)                          
 remotes                2.3.0    2021-04-01 [1] CRAN (R 4.0.2)                          
 rhdf5                  2.34.0   2020-10-27 [1] Bioconductor                            
 rhdf5filters           1.2.0    2020-10-27 [1] Bioconductor                            
 Rhdf5lib               1.12.1   2021-01-26 [1] Bioconductor                            
 rlang                  0.4.11   2021-04-30 [1] CRAN (R 4.0.2)                          
 rmarkdown              2.7      2021-02-19 [1] CRAN (R 4.0.4)                          
 rprojroot              2.0.2    2020-11-15 [1] CRAN (R 4.0.2)                          
 Rsamtools              2.6.0    2020-10-27 [1] Bioconductor                            
 RSpectra               0.16-0   2019-12-01 [1] CRAN (R 4.0.2)                          
 RSQLite                2.2.7    2021-04-22 [1] CRAN (R 4.0.2)                          
 rsthemes               0.1.0    2020-11-03 [1] Github (gadenbuie/rsthemes@6391fe5)     
 rstudioapi             0.13     2020-11-12 [1] CRAN (R 4.0.2)                          
 rsvd                   1.0.5    2021-04-16 [1] CRAN (R 4.0.2)                          
 rtracklayer            1.50.0   2020-10-27 [1] Bioconductor                            
 Rtsne                  0.15     2018-11-10 [1] CRAN (R 4.0.2)                          
 S4Vectors            * 0.28.1   2020-12-09 [1] Bioconductor                            
 scales                 1.1.1    2020-05-11 [1] CRAN (R 4.0.2)                          
 scater               * 1.18.6   2021-02-26 [1] Bioconductor                            
 scran                * 1.18.7   2021-04-16 [1] Bioconductor                            
 scuttle                1.0.4    2020-12-17 [1] Bioconductor                            
 sessioninfo            1.1.1    2018-11-05 [1] CRAN (R 4.0.2)                          
 SingleCellExperiment * 1.12.0   2020-10-27 [1] Bioconductor                            
 sparseMatrixStats      1.2.1    2021-02-02 [1] Bioconductor                            
 statmod                1.4.35   2020-10-19 [1] CRAN (R 4.0.2)                          
 stringi                1.5.3    2020-09-09 [1] CRAN (R 4.0.2)                          
 stringr                1.4.0    2019-02-10 [1] CRAN (R 4.0.2)                          
 styler                 1.4.1    2021-03-30 [1] CRAN (R 4.0.2)                          
 SummarizedExperiment * 1.20.0   2020-10-27 [1] Bioconductor                            
 suncalc                0.5.0    2019-04-03 [1] CRAN (R 4.0.2)                          
 testthat             * 3.0.2    2021-02-14 [1] CRAN (R 4.0.2)                          
 tibble                 3.1.1    2021-04-18 [1] CRAN (R 4.0.2)                          
 tidyselect             1.1.1    2021-04-30 [1] CRAN (R 4.0.2)                          
 usethis              * 2.0.1    2021-02-10 [1] CRAN (R 4.0.2)                          
 utf8                   1.2.1    2021-03-12 [1] CRAN (R 4.0.2)                          
 uwot                   0.1.10   2020-12-15 [1] CRAN (R 4.0.2)                          
 vctrs                  0.3.8    2021-04-29 [1] CRAN (R 4.0.2)                          
 vipor                  0.4.5    2017-03-22 [1] CRAN (R 4.0.2)                          
 viridis                0.6.0    2021-04-15 [1] CRAN (R 4.0.2)                          
 viridisLite            0.4.0    2021-04-13 [1] CRAN (R 4.0.2)                          
 withr                  2.4.2    2021-04-18 [1] CRAN (R 4.0.2)                          
 xfun                   0.22     2021-03-11 [1] CRAN (R 4.0.2)                          
 XML                    3.99-0.6 2021-03-16 [1] CRAN (R 4.0.2)                          
 xml2                   1.3.2    2020-04-23 [1] CRAN (R 4.0.2)                          
 XVector                0.30.0   2020-10-28 [1] Bioconductor                            
 yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.0.2)                          
 zlibbioc               1.36.0   2020-10-28 [1] Bioconductor                            

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
main* > 

I tried installing this version of DropletUtils 71b161e but I can't due some compilation errors related to scuttle, which makes sense since in BioC 3.12 you changed many packages. I also tried a chimera of BioC 3.11 and upgrading DropletUtils to BioC 3.12 versions (with remotes::install_github()) around the time of the suspected root commit and related packages, but well, I couldn't get it to work tonight.

In any case, do you think that there's a way to get the same emptyDrops() results with current versions of DropletUtils as those we were getting back in BioC 3.11? My sense is that the answer is no since the internals of the function changed. Aka, it's not like a parameter changed and we can just change it from say FALSE to TRUE, etc.

All of this arose due to how in BioC 3.11, the clusters of CD4 and CD8 T-cells looked nicely separated in the t-SNE and don't with current versions as shown below.

Hmm. We can't figure out why these two plots are different with the same code, though different R versions.

BioC 3.11? @PeteHaitch https://t.co/8DJM6pagdj

BioC 3.13https://t.co/t8KiPTyU9D@Bioconductor #rstats @yalbi_ibm @AnaBetty2304 pic.twitter.com/P65u7dnVCo

— 🇲🇽 Leonardo Collado-Torres (@lcolladotor) August 10, 2021
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

And well, we wanted to be able that question when we teach that part of OSCA on Friday morning.

Best,
Leo

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions