Commit 7069761

docs: added docs for LLM agent integration with scvi-tools (#3741)

1 parent a63e223 commit 7069761

11 files changed: +274 -2 lines changed
docs/user_guide/use_case/index.md

Lines changed: 1 addition & 0 deletions

@@ -6,6 +6,7 @@
 custom_dataloaders
 downstream_analysis_tasks
 hyper_parameters_tuning
+llm_assisted_analysis
 multi_gpu_training
 saving_and_loading_models
 scvi_criticism
Lines changed: 146 additions & 0 deletions

# Using LLM Engines with scvi-tools

Large language models (LLMs) can significantly lower the barrier to using scvi-tools by helping researchers write code, choose models, tune parameters, and troubleshoot analyses through natural language. This page covers how to leverage five popular AI platforms (Claude, ChatGPT, OpenClaw, Gemini, and BioMNI) to get the most out of scvi-tools.

---

## Claude (Anthropic)

Claude is a general-purpose AI assistant from Anthropic. For scvi-tools users, Claude offers a dedicated **scvi-tools Skill Bundle**: a curated set of skills covering the full scvi-tools ecosystem.

### scvi-tools Skill Bundle

The skill bundle gives Claude deep knowledge of scvi-tools workflows and includes guidance for:

- **Batch integration**: scVI and scArches
- **Cell type annotation**: SCANVI and CellAssign
- **Spatial analysis**: DestVI, Tangram, Cell2location, Stereoscope
- **Epigenetic data**: PeakVI and scBasset
- **Multimodal integration**: TotalVI (CITE-seq) and MultiVI (RNA+ATAC)
- **Perturbation studies**: contrastiveVI

Each skill covers recommended workflows, parameter guidance, and troubleshooting tips.

### Installation

**Claude Code users:**

```bash
/plugin install scvi-tools@life-sciences
```

**Claude.ai users:**
Organization admins can upload the skill ZIP via *Admin Settings > Skills*. Individual users can upload it via *Settings > Capabilities > Skills*. Download instructions and the skill ZIP are provided in the [Anthropic tutorial](https://claude.com/resources/tutorials/how-to-use-the-scvi-tools-bioinformatics-skill-bundle-with-claude).

Once installed, you can ask questions like:

> "I have 10x Chromium data from 3 donors with different sequencing depths. Which scvi-tools model should I use for integration, and what batch key should I set?"

See the full tutorial at [Anthropic's scvi-tools Skill Bundle guide](https://claude.com/resources/tutorials/how-to-use-the-scvi-tools-bioinformatics-skill-bundle-with-claude).

---

## ChatGPT (OpenAI)

ChatGPT can assist with scvi-tools through two complementary routes: custom GPTs and MCP (Model Context Protocol) tool integrations.

### Custom GPTs

OpenAI's GPT Store hosts community-built GPTs specialized in single-cell analysis. For example, the [Scanpy – Your Single-Cell RNA-seq Data Analyst](https://chatgpt.com/g/g-GKNExWk2P-scanpy-your-single-cell-rna-seq-data-analyst) GPT is configured to assist with scanpy-based workflows, which pair naturally with scvi-tools preprocessing pipelines.

You can use such GPTs to:

- Walk through an end-to-end scRNA-seq analysis
- Get scvi-tools code snippets for common operations
- Debug errors from scvi-tools model training

### MCP (Model Context Protocol) Tool Use

OpenAI supports [tool use via the API](https://developers.openai.com/api/docs/guides/tools/), which enables agents to call external functions, including Python code execution. This makes it possible to build automated pipelines where ChatGPT generates and runs scvi-tools code on your data.

A simple agent prompt example:

> "Load the AnnData file at `data/pbmc.h5ad`, run scVI with 2 batches defined by `adata.obs['batch']`, train for 400 epochs, and return the UMAP coordinates."

With tool use enabled, ChatGPT can generate the code and invoke a Python execution environment to produce results.
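Tool use of this kind starts from a JSON description of the function the agent may call. A minimal sketch of such a definition, assuming the Chat Completions `tools` format; the `run_scvi_integration` function and its parameters are hypothetical, not part of scvi-tools or the OpenAI API:

```python
# Hypothetical tool definition an agent could expose to an LLM API.
# The function name and parameter set are illustrative only.
run_scvi_tool = {
    "type": "function",
    "function": {
        "name": "run_scvi_integration",
        "description": "Train scVI on an AnnData file and return latent coordinates.",
        "parameters": {
            "type": "object",
            "properties": {
                "adata_path": {
                    "type": "string",
                    "description": "Path to the .h5ad file.",
                },
                "batch_key": {
                    "type": "string",
                    "description": "Column in adata.obs holding batch labels.",
                },
                "max_epochs": {"type": "integer", "default": 400},
            },
            "required": ["adata_path", "batch_key"],
        },
    },
}
```

Your agent loop would pass this dictionary in the request's tool list and execute the actual training function itself whenever the model emits a matching tool call.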
---

## OpenClaw

[OpenClaw](https://lobehub.com/skills/k-dense-ai-claude-scientific-skills-scvi-tools) (available via the LobeHub market) provides an installable skill focused on scvi-tools for use with Claude-based agents. It is optimized for researchers who need rigorous statistical frameworks and multi-batch integration.

### Installation

```bash
# Register your agent (one-time)
npx -y @lobehub/market-cli register --name "YourName" --source open-claw

# Install the scvi-tools skill
npx -y @lobehub/market-cli skills install k-dense-ai-claude-scientific-skills-scvi-tools
```

After installation, read the `SKILL.md` file in the extracted directory for usage instructions. The skill covers:

- Probabilistic batch correction and dataset alignment
- Multi-modal analysis (CITE-seq, spatial, multiome)
- Uncertainty quantification in differential expression
- Cell annotation with transfer learning

This skill is best suited for users who want a lightweight Claude-compatible skill without requiring the full Claude.ai platform.

---

## Gemini (Google)

Gemini is Google's general-purpose LLM, accessible via [Google AI Studio](https://aistudio.google.com) and the Gemini API. While there is no dedicated scvi-tools skill for Gemini, it is effective for assisted code generation, debugging, and conceptual guidance when working with scvi-tools.

### General Use

Gemini can help with scvi-tools through natural language prompting for:

- Generating scvi-tools setup and training code
- Explaining model outputs and hyperparameters
- Suggesting appropriate models for your data type

**Example prompt in AI Studio:**

> "Write Python code to train an scVI model on an AnnData object with a batch column called `sample_id`, then extract the latent embedding and run a UMAP."

You can paste scvi-tools error messages, documentation excerpts, or code snippets directly into the chat to get targeted assistance.

---

## BioMNI (Stanford)

[BioMNI](https://biomni.stanford.edu/) is a general-purpose biomedical AI agent from Stanford, designed to autonomously execute research tasks across diverse biomedical subfields. Its native integration with the scverse ecosystem, including scvi-tools, was announced in 2025.

### Integration with scverse and scvi-tools

BioMNI understands biological context and can orchestrate multi-step pipelines across scverse packages (Scanpy, scvi-tools, Squidpy, Pertpy) from plain-language instructions. Crucially, all agent-generated code is packaged as reproducible Jupyter notebooks.

**Example prompts:**

> "Run QC and normalization, integrate my three batches using scVI, cluster the cells, and annotate cell types using SCANVI."

> "Cluster cells and identify marker genes for each cluster."

BioMNI handles parameter selection and dependency management, and returns documented, reproducible results with no manual coding required.

### Access

- **Web platform**: [biomni.stanford.edu](https://biomni.stanford.edu/), interactive with no setup required
- **Open-source**: [github.com/snap-stanford/Biomni](https://github.com/snap-stanford/Biomni), for self-hosted deployment

BioMNI is particularly well suited for biologists who want to run complete single-cell and spatial workflows without writing code, while still producing reproducible, shareable analyses.

---

## Summary

| Platform | Best For | scvi-tools Integration |
|---|---|---|
| **Claude** | Guided workflows, parameter tuning, troubleshooting | Dedicated skill bundle with full model coverage |
| **ChatGPT** | Code generation, custom GPTs, agentic pipelines | Custom GPTs + MCP tool use |
| **OpenClaw** | Lightweight Claude-based skill, CLI install | Installable scvi-tools skill via LobeHub |
| **Gemini** | General code assistance, AI Studio prompting | General LLM assistance; no dedicated skill |
| **BioMNI** | End-to-end automated scverse pipelines | Native scverse/scvi-tools integration |

Each platform offers a different trade-off between ease of use, customization, and depth of scvi-tools knowledge. For users who primarily want guidance and code examples, Claude's skill bundle or BioMNI provide the deepest integration. For users building custom pipelines or agentic workflows, ChatGPT's MCP tool use or BioMNI's open-source deployment offer the most flexibility.

:::{note}
LLM-generated code should always be reviewed before running on important data. Check that model parameters, batch keys, and data shapes match your specific dataset.
:::

src/scvi/external/decipher/_model.py

Lines changed: 35 additions & 0 deletions

@@ -93,6 +93,41 @@ def train(
         plan_kwargs: dict | None = None,
         **trainer_kwargs,
     ):
+        """Train the model.
+
+        Wraps :meth:`~scvi.model.base.PyroSviTrainMixin.train` with Decipher-specific
+        defaults (``early_stopping_monitor="nll_validation"`` and ``drop_last=True``).
+
+        Parameters
+        ----------
+        max_epochs
+            Number of passes through the dataset.
+        accelerator
+            Supports passing different accelerator types ``("cpu", "gpu", "tpu", "ipu",
+            "hpu", "mps", "auto")`` as well as custom accelerator instances.
+        device
+            The device to use. Can be set to a non-negative index (int or str) or ``"auto"``
+            for automatic selection.
+        train_size
+            Size of training set in the range ``[0.0, 1.0]``.
+        validation_size
+            Size of the validation set. If ``None``, defaults to ``1 - train_size``.
+        shuffle_set_split
+            Whether to shuffle indices before splitting.
+        batch_size
+            Minibatch size to use during training.
+        early_stopping
+            Perform early stopping. Additional arguments can be passed in ``**trainer_kwargs``.
+        training_plan
+            Training plan instance. If ``None``, a default :class:`~scvi.train.PyroTrainingPlan`
+            is used.
+        datasplitter_kwargs
+            Additional keyword arguments passed into :class:`~scvi.dataloaders.DataSplitter`.
+        plan_kwargs
+            Keyword arguments for :class:`~scvi.train.PyroTrainingPlan`.
+        **trainer_kwargs
+            Additional keyword arguments passed to :class:`~scvi.train.Trainer`.
+        """
         if "early_stopping_monitor" not in trainer_kwargs:
             trainer_kwargs["early_stopping_monitor"] = "nll_validation"
         datasplitter_kwargs = datasplitter_kwargs or {}
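The `train_size`/`validation_size`/`shuffle_set_split` semantics documented above can be sketched in plain Python. This is a simplified stand-in for `scvi.dataloaders.DataSplitter`, not the actual implementation:

```python
import math
import random


def split_indices(n_obs, train_size=0.9, validation_size=None,
                  shuffle_set_split=True, seed=0):
    """Simplified index split mirroring the documented semantics."""
    if not 0.0 < train_size <= 1.0:
        raise ValueError("train_size must be in (0.0, 1.0]")
    n_train = math.floor(n_obs * train_size)
    if validation_size is None:
        # Documented default: validation gets the remaining 1 - train_size.
        n_val = n_obs - n_train
    else:
        n_val = math.floor(n_obs * validation_size)
    indices = list(range(n_obs))
    if shuffle_set_split:
        random.Random(seed).shuffle(indices)  # shuffle before splitting
    return indices[:n_train], indices[n_train:n_train + n_val]
```

With `shuffle_set_split=False` the split is a deterministic prefix/suffix of the observation indices, which is useful for reproducibility checks.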

src/scvi/external/mrvi/_model.py

Lines changed: 15 additions & 0 deletions

@@ -226,6 +226,21 @@ def load(
         else:
             raise ValueError("Unknown backend . Use 'torch' or 'jax' MRVI.")

+    def differential_expression(self, *args, **kwargs):
+        """Perform differential expression analysis.
+
+        Not implemented on this dispatcher class; call the method on the
+        :class:`~scvi.external.TorchMRVI` or :class:`~scvi.external.JaxMRVI`
+        instance returned by the constructor.
+
+        See Also
+        --------
+        :meth:`~scvi.external.TorchMRVI.differential_expression`
+        """
+        raise NotImplementedError(
+            "Call differential_expression on the TorchMRVI or JaxMRVI instance "
+            "returned by MRVI(...)."
+        )
+

 def peek_loaded_model_registry(dir_path, prefix):
     """Getting the loaded model registry to give better warnings for loading MRVI"""

src/scvi/external/poissonvi/_model.py

Lines changed: 6 additions & 1 deletion

@@ -388,7 +388,12 @@ def m1_domain_fn(samples):
     def differential_expression(
         self,
     ):
-        # Refer to function differential_accessibility
+        """Not implemented. Use :meth:`~scvi.external.POISSONVI.differential_accessibility` instead.
+
+        Raises
+        ------
+        NotImplementedError
+        """
         msg = (
             f"differential_expression is not implemented for {self.__class__.__name__}, please "
             f"use {self.__class__.__name__}.differential_accessibility"

src/scvi/external/resolvi/_model.py

Lines changed: 16 additions & 0 deletions

@@ -393,6 +393,22 @@ def _prepare_data(
         adata.obsm["distance_neighbor"] = distance_neighbor

     def compute_dataset_dependent_priors(self, n_small_genes=None):
+        """Compute dataset-dependent prior parameters for the ResolVI model.
+
+        Estimates background expression ratio and spatial kernel size from the data,
+        which are used as priors during training.
+
+        Parameters
+        ----------
+        n_small_genes
+            Number of low-expressed genes used to estimate the background ratio.
+            If ``None``, defaults to ``n_genes // 50``.
+
+        Returns
+        -------
+        dict with keys ``"background_ratio"``, ``"median_distance"``,
+        ``"mean_log_counts"``, and ``"std_log_counts"``.
+        """
         x = self.adata_manager.get_from_registry(REGISTRY_KEYS.X_KEY)
         n_small_genes = x.shape[1] // 50 if n_small_genes is None else int(n_small_genes)
         # Computing library size over low-expressed genes (expectation for the background).

src/scvi/external/velovi/_model.py

Lines changed: 30 additions & 0 deletions

@@ -901,6 +901,15 @@ def get_gene_likelihood(

     @torch.inference_mode()
     def get_rates(self):
+        """Return the learned splicing, degradation, and transcription rates.
+
+        Returns
+        -------
+        dict with keys ``"beta"`` (splicing), ``"gamma"`` (degradation),
+        ``"alpha"`` (transcription on-state), ``"alpha_1"`` (transcription off-state),
+        and ``"lambda_alpha"`` (switching rate), each as a numpy array of shape
+        ``(n_genes,)``.
+        """
         gamma, beta, alpha, alpha_1, lambda_alpha = self.module._get_rates()

         return {

@@ -950,6 +959,27 @@ def get_directional_uncertainty(
         gene_list: Iterable[str] = None,
         n_jobs: int = -1,
     ):
+        """Compute directional uncertainty of RNA velocity.
+
+        Estimates the uncertainty of the velocity vector direction for each cell
+        by sampling from the posterior and computing pairwise cosine similarities.
+
+        Parameters
+        ----------
+        adata
+            AnnData object. If ``None``, uses the AnnData passed during model initialization.
+        n_samples
+            Number of posterior samples for estimating uncertainty.
+        gene_list
+            List of genes to use. If ``None``, uses all genes.
+        n_jobs
+            Number of parallel jobs for cosine similarity computation.
+            ``-1`` uses all available cores.
+
+        Returns
+        -------
+        Tuple of (DataFrame of directional statistics per cell, cosine similarity matrix).
+        """
         adata = self._validate_anndata(adata)

         logger.info("Sampling from model...")
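The idea behind this docstring, comparing posterior samples of a cell's velocity vector by cosine similarity, can be sketched in plain Python. This is an illustrative simplification, not the velovi implementation:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def directional_agreement(velocity_samples):
    """Mean pairwise cosine similarity across posterior velocity samples.

    Values near 1 mean the sampled directions agree (low directional
    uncertainty); values near 0 or below mean they disagree (high uncertainty).
    """
    n = len(velocity_samples)
    sims = [
        cosine_similarity(velocity_samples[i], velocity_samples[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sum(sims) / len(sims)
```

In the real method this per-cell statistic is computed over many posterior draws and summarized into the returned DataFrame.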

src/scvi/model/_jaxscvi.py

Lines changed: 2 additions & 0 deletions

@@ -185,8 +185,10 @@ def get_latent_representation(
         return self.module.as_numpy_array(latent)

     def to_device(self, device):
+        """Move model to device. No-op for JAX models (device placement is handled by JAX)."""
         pass

     @property
     def device(self):
+        """The current device that the module's params are on."""
         return self.module.device

src/scvi/model/base/_base_model.py

Lines changed: 18 additions & 0 deletions

@@ -1271,10 +1271,28 @@ def update_setup_method_args(self, setup_method_args: dict):
         self._registry[_SETUP_ARGS_KEY].update(setup_method_args)

     def get_normalized_expression(self, *args, **kwargs):
+        """Not implemented for this model class.
+
+        Available in RNA models that inherit from
+        :class:`~scvi.model.base.RNASeqMixin`.
+
+        Raises
+        ------
+        NotImplementedError
+        """
         msg = f"get_normalized_expression is not implemented for {self.__class__.__name__}."
         raise NotImplementedError(msg)

     def differential_abundance(self, *args, **kwargs):
+        """Not implemented for this model class.
+
+        Available in models that inherit from
+        :class:`~scvi.model.base.VAEMixin`.
+
+        Raises
+        ------
+        NotImplementedError
+        """
         msg = f"differential_abundance is not implemented for {self.__class__.__name__}."
         raise NotImplementedError(msg)
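The pattern these stubs document, a base class that raises `NotImplementedError` with an actionable message while a mixin supplies the real method, can be sketched generically. The class names below are illustrative, not actual scvi-tools classes:

```python
class BaseModelSketch:
    """Base class: advertises the API but defers to mixins."""

    def get_normalized_expression(self, *args, **kwargs):
        # Raising with the concrete class name gives users an actionable error.
        msg = f"get_normalized_expression is not implemented for {self.__class__.__name__}."
        raise NotImplementedError(msg)


class RNASeqMixinSketch:
    """Mixin supplying the real implementation for RNA models."""

    def get_normalized_expression(self, *args, **kwargs):
        return "normalized expression"


class RNAModel(RNASeqMixinSketch, BaseModelSketch):
    # Method resolution order picks the mixin's implementation.
    pass


class ProteinModel(BaseModelSketch):
    # No mixin: calling the method raises with this class's name.
    pass
```

Documenting the stub on the base class, as this commit does, means every model exposes a discoverable docstring even when the capability lives in a mixin.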

src/scvi/train/__init__.py

Lines changed: 2 additions & 0 deletions

@@ -9,6 +9,7 @@
     AdversarialTrainingPlanConfig,
     ClassifierTrainingPlanConfig,
     JaxTrainingPlanConfig,
+    KwargsConfig,
     LowLevelPyroTrainingPlanConfig,
     PyroTrainingPlanConfig,
     SemiSupervisedAdversarialTrainingPlanConfig,

@@ -54,6 +55,7 @@
     "ScibCallback",
     "METRIC_KEYS",
     "JaxTrainingPlanConfig",
+    "KwargsConfig",
 ]
0 commit comments