
[FEA] Dask-array based statistics on single cell data #412

@MPebworthEpana

Description

Is your feature request related to a problem? Please describe.
This may be a tall ask, but it would be great to have GPU acceleration for single-cell modeling. The current standard for highly accurate modeling on large, complex human datasets is MAST (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5), or simply pseudobulking. Wilcoxon rank-sum tests, t-tests, and similar tests have significant statistical flaws that undermine the accuracy of their results when applied to biological questions such as disease-vs-healthy comparisons.
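Of those two standard approaches, pseudobulking maps most directly onto Dask arrays. Raw counts are summed over all cells from the same sample, and bulk RNA-seq methods are then run on the much smaller samples × genes matrix. The sketch below assumes a cells × genes Dask array of raw counts and one sample label per cell. `pseudobulk` is a hypothetical helper, not an existing library function.

```python
import numpy as np
import dask.array as da


def pseudobulk(counts, sample_ids):
    """Sum raw counts per sample (hypothetical helper).

    counts: dask array, cells x genes, chunked along cells.
    sample_ids: one label per cell.
    Returns (sample labels, lazy dask array of shape samples x genes).
    """
    samples, codes = np.unique(np.asarray(sample_ids), return_inverse=True)
    # One-hot cell-to-sample indicator, chunked along cells like `counts`
    indicator = np.zeros((codes.size, samples.size))
    indicator[np.arange(codes.size), codes] = 1.0
    ind = da.from_array(indicator, chunks=(counts.chunks[0], (samples.size,)))
    # indicator.T @ counts sums the rows of `counts` within each sample;
    # Dask computes one partial sum per cell chunk and then adds them up.
    return samples, ind.T @ counts
```

Calling `.compute()` on the returned array materializes only the small samples × genes matrix, never the full cell-level data.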

Even at sub-million cell counts, MAST is slow, and at over 1 million cells it becomes unbearably slow. Being able to run MAST-like analyses on a Dask-array-backed AnnData would unlock complex statistical analysis of large-scale scRNA-seq data.

Describe the solution you'd like
Dask-array-based statistical modeling of scRNA-seq data, built on the principles and variables established by the MAST authors.
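For reference, MAST fits a two-part "hurdle" model per gene. One part is a logistic regression for whether the gene is detected in a cell. The other is a Gaussian linear model for its log-expression in the cells where it is detected. Below is a minimal sketch of how that could map onto Dask. It assumes a cells × genes Dask array of log-expression, where zero means not detected, and a small in-memory design matrix. Each gene chunk is fit independently with `map_blocks`, and the per-gene fits are vectorized with NumPy. Both function names are hypothetical. The sketch omits MAST's regularization and its hypothesis tests. Covariates such as the cellular detection rate would simply be extra design columns.

```python
import numpy as np
import dask.array as da


def _fit_hurdle_block(block, design, n_iter=25, jitter=1e-3):
    """Fit both hurdle components for every gene (column) in one block."""
    X = design
    p = X.shape[1]
    eye = jitter * np.eye(p)  # small diagonal term for numerical stability
    Z = (block > 0).astype(float)  # discrete part: detected or not
    Y = np.where(Z > 0, block, 0.0)  # continuous part: log-expression if detected
    G = block.shape[1]

    # Discrete component: logistic regression via Newton-Raphson (IRLS),
    # vectorized so that each gene gets its own p x p Hessian.
    beta_d = np.zeros((G, p))
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-np.clip(X @ beta_d.T, -30, 30)))
        w = mu * (1.0 - mu)
        hess = np.einsum("ip,ig,iq->gpq", X, w, X) + eye
        grad = np.einsum("ip,ig->gp", X, Z - mu)
        beta_d += np.linalg.solve(hess, grad[..., None])[..., 0]

    # Continuous component: least squares restricted to detected cells,
    # i.e. weighted least squares with 0/1 weights Z.
    xtzx = np.einsum("ip,ig,iq->gpq", X, Z, X) + eye
    xtzy = np.einsum("ip,ig->gp", X, Y)
    beta_c = np.linalg.solve(xtzx, xtzy[..., None])[..., 0]

    # Rows 0..p-1: discrete coefficients; rows p..2p-1: continuous ones
    return np.concatenate([beta_d.T, beta_c.T], axis=0)


def fit_hurdle(expr, design, n_iter=25, jitter=1e-3):
    """Lazily fit a MAST-style hurdle model per gene (hypothetical helper)."""
    expr = expr.rechunk({0: -1})  # each block must contain every cell
    p = design.shape[1]
    return expr.map_blocks(
        _fit_hurdle_block,
        design=design,
        n_iter=n_iter,
        jitter=jitter,
        chunks=((2 * p,), expr.chunks[1]),
        dtype=float,
        meta=np.empty((0, 0)),
    )
```

The result is a lazy (2p) × genes array. Row `j` holds the detection coefficient and row `p + j` the continuous coefficient for design column `j`. Chunking along genes keeps every fit local to one block. The trade-off is that each block must hold all cells for its genes.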

Dask-array-based linear modeling is already implemented in dask-ml:
https://ml.dask.org/modules/generated/dask_ml.linear_model.LinearRegression.html
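A related approach fits ordinary least squares for every gene at once through the normal equations. It works well on Dask because `X.T @ X` and `X.T @ Y` are both sums over cells, so they reduce naturally across cell chunks. The sketch below is an alternative to dask-ml's estimator, not a wrapper around it. `ols_all_genes` is a hypothetical name, and the approach assumes a small, well-conditioned design matrix.

```python
import dask
import dask.array as da
import numpy as np


def ols_all_genes(design, expr):
    """Per-gene OLS coefficients (p x genes) via the normal equations.

    design: dask array, cells x p, chunked along cells like `expr`.
    expr: dask array, cells x genes.
    """
    # Both products are sums over cells, so Dask builds them chunk by chunk;
    # only the small p x p and p x genes results are brought into memory.
    xtx, xty = dask.compute(design.T @ design, design.T @ expr)
    return np.linalg.solve(xtx, xty)
```

Forming `X.T @ X` squares the condition number of the design. For ill-conditioned designs, the QR-based `dask.array.linalg.lstsq` is the safer choice.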

Is there a CPU-based implementation?
A link to an implementation or paper with the suggested functionality

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
