Inconsistencies in CoGAPS results across environments

Hello!

I have recently run into an issue regarding the output of CoGAPS when run in different environments.

I have tested the tool in:
- local system R environment
- local conda environment
- Galaxy development instance using `bioconductor-cogaps` and Galaxy development instance using `ghcr.io/fertiglab/cogaps:master` (produce same output)

For context, here are some links that may be helpful for Galaxy:
- https://galaxyproject.org/
- https://training.galaxyproject.org/training-material/topics/dev/index.html?utm_source=gtn&utm_medium=home&utm_campaign=home-quickstart#st-dependencies

I have tested 8 different Visium datasets found here: https://github.com/FridleyLab/spatialGE_Data/blob/main/tnbc_bassiouni.zip. For each sample, I get varying degrees of consistent/inconsistent output when looking a the `featureLoadings` and `sampleFactors` slots. For example, when I run sample 092a, the output is the same across all environments. When I run sample 094c, the output is identical between galaxy and conda, but different on my local R. Sample 092b is consistent between local R and galaxy but different on conda, while Sample 120d is different between all environments. There are several samples that do output identical values between all environments, but it is very unpredictable.

Below are some example values for samples and environments:

[CoGAPS_Output.xlsx](https://github.com/user-attachments/files/22787458/CoGAPS_Output.xlsx)

I'm staying consistent at CoGAPS Version 3.26.0, but have also tested with 3.28.0 and 3.29.1, with the same results.

Do you have any insights as to why this might be happening? Is there a minimum data requirement? Do all dependencies need to be the same version in order for the output to be consistent? What level of difference is to be expected?

I will include the code I'm using below, however it is the same script used for CoGAPS in the BTC pipeline. The parameters have stayed consistent at `1000` genes, `5` patterns, `100` iterations, and `false` for sparsity.

Any help would be much appreciated!

```
#--------#
# COGAPS #
#--------#

library(CoGAPS)

#libraries fot plotting added later
library(dplyr)
library(tidyr)
library(ggplot2)
library(tibble)

#specify args
args <- commandArgs(TRUE)

dgcmatrix <- args[1]
n_top_genes <- args[2]
n_patterns <- args[3]
n_iterations <- args[4]
sparse_opt <- args[5]

#use sparse matrix to find top genes
sparse <- readRDS(dgcmatrix)
      
#select top 5K genes
message("finding top ", n_top_genes, " genes")

vars <- apply(sparse, 1, var)
if (length(vars) > length(n_top_genes)) {
  ngenes <- n_top_genes
} else {
  ngenes <- vars
}
top_genes <- order(vars, decreasing=TRUE)[1:ngenes]
sparse <- sparse[top_genes,]

message("selected top ", length(top_genes), " genes of ", length(vars))

data <- as.matrix(sparse)

#avoid errors with distributed params
dist_param <- NULL

#set params for patterns and iterations
params <- CogapsParams(seed=42,
                       nPatterns = as.numeric(n_patterns), #number of patterns CoGAPS will learn
                       nIterations = as.numeric(n_iterations), #number of iterations for each phase of the algorithm
                       sparseOptimization = as.logical(sparse_opt), #speeds up performance with sparse data (roughly >80 default uncertainty)
                       distributed=dist_param) #either "genome-wide" or "single-cell" indicating which distributed algorithm should be used

#run cogaps
cogapsResult <- CoGAPS(data = data, params = params, outputFrequency = 100)

sample_id <- tools::file_path_sans_ext(basename(dgcmatrix))
outdir <- "cogaps_result"
dir.create(outdir, showWarnings = FALSE, recursive = TRUE)

outfile <- file.path(outdir, paste0(sample_id, "_cogapsResult.rds"))
saveRDS(cogapsResult, file = outfile)
message("Saving cogapsResult to ", outfile)
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistencies in CoGAPS results across environments #147

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Inconsistencies in CoGAPS results across environments #147

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions