-
Notifications
You must be signed in to change notification settings - Fork 28
Description
Hi Aaron et al,
We (@Yalbibalderas @AnaBVA) noticed that the following code from OSCA returns different t-SNE plots. Eventually we narrowed down the issue to BioC 3.11 vs 3.12 (3.13 is the same as 3.12). In particular, the issue is with changes between those versions for DropletUtils.
The path to DropletUtils is documented at https://twitter.com/lcolladotor/status/1425242252872409092?s=20.
You can reproduce this with:
# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
"http://cf.10xgenomics.com/samples",
"cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)
library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)
# Detección de _droplets_ con células
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc))
table(e.out$FDR < 0.001, useNA = "ifany")where BioC 3.11 returns
> table(e.out$FDR < 0.001, useNA = "ifany")
FALSE TRUE <NA>
1056 4233 731991 and later versions return
> table(e.out$FDR < 0.001, useNA = "ifany")
FALSE TRUE <NA>
989 4300 731991 The full code is below for getting to the tSNE (adapted from https://comunidadbioinfo.github.io/cdsb2021_scRNAseq/anotaci%C3%B3n-de-clusters-de-c%C3%A9lulas.html and ultimately from OSCA at https://bioconductor.org/books/release/OSCA/unfiltered-human-pbmcs-10x-genomics.html) is below.
Details
# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
"http://cf.10xgenomics.com/samples",
"cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)
library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)
# Detección de _droplets_ con células
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc))
table(e.out$FDR < 0.001, useNA = "ifany")
library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86,
keys = rowData(sce.pbmc)$ID,
column = "SEQNAME", keytype = "GENEID"
)
sce.pbmc <- sce.pbmc[, which(e.out$FDR <= 0.001)]
# Control de calidad
stats <- perCellQCMetrics(sce.pbmc,
subsets = list(Mito = which(location == "MT"))
)
high.mito <- isOutlier(stats$subsets_Mito_percent,
type = "higher"
)
sce.pbmc <- sce.pbmc[, !high.mito]
# Normalización de los datos
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster = clusters)
sce.pbmc <- logNormCounts(sce.pbmc)
## Identificación de genes altamente variables
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop = 0.1)
## Reducción de dimensiones
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc,
subset.row = top.pbmc,
technical = dec.pbmc
)
set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, dimred = "PCA")
set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, dimred = "PCA")
plotTSNE(sce.pbmc)
options(width = 120)
sessioninfo::session_info()We think that it comes down to DropletUtils::emptyDrops() and between April 27 and October 27 2020 (when BioC 3.12 was bioc-devel), there's 2 commits that altered it, in particular we think that 71b161e might be the root of the difference.
Based on the history, we see that commit was early in the BioC 3.12 devel cycle. With that in mind, I did a chimera installation with BioC 3.12 and the BioC 3.11 version of DropletUtils, specifically 51d00b0 (remotes::install_github("MarioniLab/DropletUtils@51d00b0")). That version reproduced the t-SNE and DropletUtils::emptyDrops() we see with BioC 3.11.
Details
> options(width = 120)
main* > sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.0.5 Patched (2021-03-31 r80247)
os macOS Big Sur 11.5.1
system x86_64, darwin17.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2021-08-10
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
AnnotationDbi * 1.52.0 2020-10-27 [1] Bioconductor
AnnotationFilter * 1.14.0 2020-10-27 [1] Bioconductor
askpass 1.1 2019-01-13 [1] CRAN (R 4.0.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.2)
beachmat 2.6.4 2020-12-20 [1] Bioconductor
beeswarm 0.3.1 2021-03-07 [1] CRAN (R 4.0.2)
Biobase * 2.50.0 2020-10-27 [1] Bioconductor
BiocFileCache * 1.14.0 2020-10-27 [1] Bioconductor
BiocGenerics * 0.36.1 2021-04-16 [1] Bioconductor
BiocNeighbors 1.8.2 2020-12-07 [1] Bioconductor
BiocParallel 1.24.1 2020-11-06 [1] Bioconductor
BiocSingular 1.6.0 2020-10-27 [1] Bioconductor
biocthis 1.0.10 2021-02-26 [1] Bioconductor
biomaRt 2.46.3 2021-02-09 [1] Bioconductor
Biostrings 2.58.0 2020-10-27 [1] Bioconductor
bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.0.2)
blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.2)
bluster 1.0.0 2020-10-27 [1] Bioconductor
bookdown 0.22 2021-04-22 [1] CRAN (R 4.0.2)
cachem 1.0.4 2021-02-13 [1] CRAN (R 4.0.2)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.0.2)
cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.5)
colorout 1.2-2 2020-11-03 [1] Github (jalvesaq/colorout@726d681)
colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.2)
cowplot 1.1.1 2020-12-30 [1] CRAN (R 4.0.2)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2)
curl 4.3.1 2021-04-30 [1] CRAN (R 4.0.2)
data.table 1.14.0 2021-02-21 [1] CRAN (R 4.0.2)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.2)
dbplyr * 2.1.1 2021-04-06 [1] CRAN (R 4.0.2)
DelayedArray 0.16.3 2021-03-24 [1] Bioconductor
DelayedMatrixStats 1.12.3 2021-02-03 [1] Bioconductor
desc 1.3.0 2021-03-05 [1] CRAN (R 4.0.2)
devtools * 2.4.0 2021-04-07 [1] CRAN (R 4.0.2)
digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
dplyr 1.0.5 2021-03-05 [1] CRAN (R 4.0.2)
dqrng 0.3.0 2021-05-01 [1] CRAN (R 4.0.5)
DropletUtils * 1.9.0 2021-08-11 [1] Github (MarioniLab/DropletUtils@51d00b0)
edgeR 3.32.1 2021-01-14 [1] Bioconductor
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.2)
EnsDb.Hsapiens.v86 * 2.99.0 2021-08-11 [1] Bioconductor
ensembldb * 2.14.1 2021-04-19 [1] Bioconductor
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.2)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.0.2)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.2)
FNN 1.1.3 2019-02-15 [1] CRAN (R 4.0.2)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.2)
GenomeInfoDb * 1.26.7 2021-04-08 [1] Bioconductor
GenomeInfoDbData 1.2.4 2020-11-03 [1] Bioconductor
GenomicAlignments 1.26.0 2020-10-27 [1] Bioconductor
GenomicFeatures * 1.42.3 2021-04-04 [1] Bioconductor
GenomicRanges * 1.42.0 2020-10-27 [1] Bioconductor
ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.0.2)
ggplot2 * 3.3.3 2020-12-30 [1] CRAN (R 4.0.2)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.2)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.2)
HDF5Array 1.18.1 2021-02-04 [1] Bioconductor
hms 1.0.0 2021-01-13 [1] CRAN (R 4.0.2)
htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2)
igraph 1.2.6 2020-10-06 [1] CRAN (R 4.0.2)
IRanges * 2.24.1 2020-12-12 [1] Bioconductor
irlba 2.3.3 2019-02-05 [1] CRAN (R 4.0.2)
knitr 1.33 2021-04-24 [1] CRAN (R 4.0.2)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.0.2)
lattice 0.20-44 2021-05-02 [1] CRAN (R 4.0.5)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.2)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2)
limma 3.46.0 2020-10-27 [1] Bioconductor
locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.0.2)
lubridate 1.7.10 2021-02-26 [1] CRAN (R 4.0.2)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
Matrix * 1.3-2 2021-01-06 [1] CRAN (R 4.0.5)
MatrixGenerics * 1.2.1 2021-01-30 [1] Bioconductor
matrixStats * 0.58.0 2021-01-29 [1] CRAN (R 4.0.2)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.0.3)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.2)
openssl 1.4.4 2021-04-30 [1] CRAN (R 4.0.2)
pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.2)
pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.0.3)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
pkgload 1.2.1 2021-04-06 [1] CRAN (R 4.0.2)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
processx 3.5.2 2021-04-30 [1] CRAN (R 4.0.2)
progress 1.2.2 2019-05-16 [1] CRAN (R 4.0.2)
prompt 1.0.1 2021-03-12 [1] CRAN (R 4.0.2)
ProtGenerics 1.22.0 2020-10-27 [1] Bioconductor
ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.2)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.2)
R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.2)
R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.0.2)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.0.2)
Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.2)
RCurl 1.98-1.3 2021-03-16 [1] CRAN (R 4.0.2)
remotes 2.3.0 2021-04-01 [1] CRAN (R 4.0.2)
rhdf5 2.34.0 2020-10-27 [1] Bioconductor
rhdf5filters 1.2.0 2020-10-27 [1] Bioconductor
Rhdf5lib 1.12.1 2021-01-26 [1] Bioconductor
rlang 0.4.11 2021-04-30 [1] CRAN (R 4.0.2)
rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.4)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.2)
Rsamtools 2.6.0 2020-10-27 [1] Bioconductor
RSpectra 0.16-0 2019-12-01 [1] CRAN (R 4.0.2)
RSQLite 2.2.7 2021-04-22 [1] CRAN (R 4.0.2)
rsthemes 0.1.0 2020-11-03 [1] Github (gadenbuie/rsthemes@6391fe5)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.2)
rsvd 1.0.5 2021-04-16 [1] CRAN (R 4.0.2)
rtracklayer 1.50.0 2020-10-27 [1] Bioconductor
Rtsne 0.15 2018-11-10 [1] CRAN (R 4.0.2)
S4Vectors * 0.28.1 2020-12-09 [1] Bioconductor
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.2)
scater * 1.18.6 2021-02-26 [1] Bioconductor
scran * 1.18.7 2021-04-16 [1] Bioconductor
scuttle 1.0.4 2020-12-17 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
SingleCellExperiment * 1.12.0 2020-10-27 [1] Bioconductor
sparseMatrixStats 1.2.1 2021-02-02 [1] Bioconductor
statmod 1.4.35 2020-10-19 [1] CRAN (R 4.0.2)
stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.2)
SummarizedExperiment * 1.20.0 2020-10-27 [1] Bioconductor
suncalc 0.5.0 2019-04-03 [1] CRAN (R 4.0.2)
testthat * 3.0.2 2021-02-14 [1] CRAN (R 4.0.2)
tibble 3.1.1 2021-04-18 [1] CRAN (R 4.0.2)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.2)
usethis * 2.0.1 2021-02-10 [1] CRAN (R 4.0.2)
utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.2)
uwot 0.1.10 2020-12-15 [1] CRAN (R 4.0.2)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.2)
vipor 0.4.5 2017-03-22 [1] CRAN (R 4.0.2)
viridis 0.6.0 2021-04-15 [1] CRAN (R 4.0.2)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.0.2)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.2)
xfun 0.22 2021-03-11 [1] CRAN (R 4.0.2)
XML 3.99-0.6 2021-03-16 [1] CRAN (R 4.0.2)
xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.2)
XVector 0.30.0 2020-10-28 [1] Bioconductor
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
zlibbioc 1.36.0 2020-10-28 [1] Bioconductor
[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
main* > I tried installing this version of DropletUtils 71b161e but I can't due some compilation errors related to scuttle, which makes sense since in BioC 3.12 you changed many packages. I also tried a chimera of BioC 3.11 and upgrading DropletUtils to BioC 3.12 versions (with remotes::install_github()) around the time of the suspected root commit and related packages, but well, I couldn't get it to work tonight.
In any case, do you think that there's a way to get the same emptyDrops() results with current versions of DropletUtils as those we were getting back in BioC 3.11? My sense is that the answer is no since the internals of the function changed. Aka, it's not like a parameter changed and we can just change it from say FALSE to TRUE, etc.
All of this arose due to how in BioC 3.11, the clusters of CD4 and CD8 T-cells looked nicely separated in the t-SNE and don't with current versions as shown below.
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>Hmm. We can't figure out why these two plots are different with the same code, though different R versions.
— 🇲🇽 Leonardo Collado-Torres (@lcolladotor) August 10, 2021
BioC 3.11? @PeteHaitch https://t.co/8DJM6pagdj
BioC 3.13https://t.co/t8KiPTyU9D@Bioconductor #rstats @yalbi_ibm @AnaBetty2304 pic.twitter.com/P65u7dnVCo
And well, we wanted to be able that question when we teach that part of OSCA on Friday morning.
Best,
Leo
