NCI-RBL/scCHyMErA-Seq

scCHyMErA-Seq

Code repository for the scCHyMErA-Seq project

scCHyMErA-Seq is a platform that enables efficient exon perturbations and gene knockouts, generating single-cell RNA-sequencing phenotypic readouts. To facilitate downstream analysis, this repository includes a ready-to-use pipeline built with scverse tools.

Input Files

To run the scCHyMErA-Seq pipeline, you’ll need:

  1. A matrix file (*matrix.h5) produced by Cell Ranger
  2. A metadata file containing cell barcodes and guide information

Generating Input Files

Use the Cell Ranger count pipeline with CRISPR Guide Capture analysis enabled (see the Cell Ranger documentation for details).

module load cellranger
cellranger count --id=s \
    --transcriptome=refdata-gex-GRCh38-2024-A \
    --libraries=library.csv \
    --feature-ref=feature_reference.csv \
    --create-bam=true

This creates the matrix file and the protospacer call files, along with many other outputs.
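
The protospacer calls can be inspected before building the metadata file. The sketch below assumes the layout Cell Ranger uses for crispr_analysis/protospacer_calls_per_cell.csv (pipe-delimited feature_call and num_umis columns). The barcodes and guide names are made up, and the "exactly two guides per cell" rule is only an assumption for illustration; it is not necessarily how paired_hgRNA_calls_per_cell.csv is produced in this repository.

```python
import csv
import io

# Hypothetical rows in the protospacer_calls_per_cell.csv layout
# (guide names and barcodes are invented for this example).
SAMPLE = """cell_barcode,num_features,feature_call,num_umis
AAACCCAAGAAACACT-1,2,Cas9_g1|Cas12a_g1,12|30
AAACCCAAGAAACCAT-1,1,Cas9_g2,8
"""


def parse_protospacer_calls(handle):
    """Map each cell barcode to a {guide: UMI count} dictionary."""
    calls = {}
    for row in csv.DictReader(handle):
        guides = row["feature_call"].split("|")
        umis = [int(u) for u in row["num_umis"].split("|")]
        calls[row["cell_barcode"]] = dict(zip(guides, umis))
    return calls


calls = parse_protospacer_calls(io.StringIO(SAMPLE))

# Assumed pairing rule: keep cells in which exactly two guides were detected.
paired = {barcode: guides for barcode, guides in calls.items() if len(guides) == 2}
print(paired)
```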

Example: library.csv

sample,fastqs,lanes,library_type
GEX,Sample_GEX,Any,Gene Expression
Cas9,Sample_Cas9,Any,CRISPR Guide Capture
Cas12a,Sample_Cas12a,Any,CRISPR Guide Capture
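
A quick sanity check of library.csv before submitting the job can catch typos in the header or in library_type. This is a minimal sketch, assuming that Cell Ranger requires the fastqs, sample, and library_type columns and that only the two library types shown above are expected; check_libraries is a hypothetical helper, not part of this repository.

```python
import csv
import io

# Columns assumed to be required by `cellranger count --libraries`.
REQUIRED_COLUMNS = {"fastqs", "sample", "library_type"}
# Library types that appear in the example above.
EXPECTED_TYPES = {"Gene Expression", "CRISPR Guide Capture"}

LIBRARY_CSV = """sample,fastqs,lanes,library_type
GEX,Sample_GEX,Any,Gene Expression
Cas9,Sample_Cas9,Any,CRISPR Guide Capture
Cas12a,Sample_Cas12a,Any,CRISPR Guide Capture
"""


def check_libraries(handle):
    """Return a list of human-readable problems (empty if none were found)."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if row["library_type"] not in EXPECTED_TYPES:
            problems.append(
                f"line {line_no}: unexpected library_type {row['library_type']!r}"
            )
    return problems


print(check_libraries(io.StringIO(LIBRARY_CSV)))
```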

Loading Files and Downstream Analysis

Prerequisites

Install the following Python packages:


Usage

Quality Control

python qc_cells.py filtered_feature_bc_matrix.h5

Matrix Preprocessing & Mixscape

python scanpy_analysis_split.py
python scanpy_analysis_combined.py

Outputs:

  • UMAPs of all processed cells
  • Cluster-specific LDA plots (highlighted cluster vs grey others)

UMAP + Leiden Clustering

Arguments for scanpy_analysis_split.py and scanpy_analysis_combined.py:

Argument              Description
-o, --out             Output directory for plots (default: current working directory)
--analysis            Type of analysis: KO or Exon (used in scanpy_analysis_split.py only)
--resolution          Leiden clustering resolution (0–1; higher = more clusters)
-m, --matrix_input    Path to input matrix file (.h5)
-a, --anno_csv        Path to annotation file (CSV) with cell barcode and guide pairing
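
For orientation, the options above could be declared with argparse roughly as follows. This is a sketch under assumptions, not the scripts' actual code: which options are required, the absence of a default resolution, and the handling of --timestamp (which appears only in the SLURM example) are guesses.

```python
import argparse
import os


def build_parser():
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(description="scCHyMErA-Seq analysis (sketch)")
    parser.add_argument("-o", "--out", default=os.getcwd(),
                        help="Output directory for plots")
    parser.add_argument("--analysis", choices=["KO", "Exon"],
                        help="Type of analysis (split script only)")
    parser.add_argument("--resolution", type=float,
                        help="Leiden clustering resolution")
    parser.add_argument("-m", "--matrix_input", required=True,
                        help="Path to input matrix file (.h5)")
    parser.add_argument("-a", "--anno_csv", required=True,
                        help="CSV pairing cell barcodes with guides")
    # Assumption: a free-form string used to label outputs.
    parser.add_argument("--timestamp", default=None)
    return parser


args = build_parser().parse_args([
    "-o", "./", "--analysis", "Exon", "--resolution", "0.15",
    "-m", "filtered_feature_bc_matrix.h5",
    "-a", "paired_hgRNA_calls_per_cell.csv",
])
print(args.analysis, args.resolution)
```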

These scripts also generate the inputs for chymeraseq.md and gprofiler_analysis.md.

Example SLURM Job

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --open-mode=append
#SBATCH --time=1:00:00
#SBATCH --mem=300g
#SBATCH --job-name=schymeraseq

# Shared timestamp passed to every run in this job
timestamp=$(date +%Y%m%d_%H%M)

# Fix Python's hash seed for reproducible runs
export PYTHONHASHSEED=0
# Make Numba target a generic CPU model instead of the detected one
export NUMBA_CPU_NAME=generic

python scanpy_analysis_split.py -o ./ --analysis Exon --resolution 0.15 \
    -m filtered_feature_bc_matrix.h5 -a paired_hgRNA_calls_per_cell.csv \
    --timestamp "$timestamp"

python scanpy_analysis_split.py -o ./ --analysis KO --resolution 0.15 \
    -m filtered_feature_bc_matrix.h5 -a paired_hgRNA_calls_per_cell.csv \
    --timestamp "$timestamp"

python scanpy_analysis_combined.py -o ./ --resolution 0.15 \
    -m filtered_feature_bc_matrix.h5 -a paired_hgRNA_calls_per_cell.csv \
    --timestamp "$timestamp"

Bulk Differential Expression Analysis

To identify differentially expressed genes for each perturbation (with $timestamp set as in the SLURM example above):

python pseudobulk_deg.py \
    -m filtered_feature_bc_matrix.h5 \
    -a paired_hgRNA_calls_per_cell.csv \
    -p exon_mxs_obs.csv \
    --timestamp $timestamp
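
This README does not describe how pseudobulk_deg.py aggregates cells or tests for differential expression. As a rough illustration of the pseudobulk idea only, the sketch below sums raw counts over all cells that share a perturbation label. The function name, gene names, barcodes, and labels are all hypothetical.

```python
from collections import defaultdict


def pseudobulk(counts_by_cell, perturbation_by_cell):
    """Sum per-gene counts over all cells sharing a perturbation label."""
    totals = defaultdict(lambda: defaultdict(int))
    for barcode, gene_counts in counts_by_cell.items():
        label = perturbation_by_cell.get(barcode)
        if label is None:
            continue  # skip cells without a guide assignment
        for gene, count in gene_counts.items():
            totals[label][gene] += count
    return {label: dict(genes) for label, genes in totals.items()}


# Toy data: three cells, two genes, two perturbation labels.
counts = {
    "cell1": {"GENE1": 3, "GENE2": 0},
    "cell2": {"GENE1": 1, "GENE2": 4},
    "cell3": {"GENE1": 2, "GENE2": 2},
}
labels = {"cell1": "exonA", "cell2": "exonA", "cell3": "control"}

bulk = pseudobulk(counts, labels)
print(bulk)
```

The resulting per-perturbation count profiles are what a bulk-style differential expression method would then compare against controls.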

Functional Enrichment Analysis

To run functional enrichment analysis using g:Profiler:

# --excel_file: Excel, CSV, or TXT file generated by scanpy.get.rank_genes_groups_df()

python gprofiler_analysis.py \
    --excel_file DEG_exons_mod.csv \
    --out deg_exons_mod_0.5 \
    --lfc_cutoff 0.5 \
    --run_gprofiler
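
To make the role of --lfc_cutoff concrete, the sketch below selects genes from a table with the columns returned by scanpy.get.rank_genes_groups_df() (group, names, scores, logfoldchanges, pvals, pvals_adj). Applying the cutoff to the absolute log fold change is an assumption; gprofiler_analysis.py may filter differently. The rows and the helper name are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the rank_genes_groups_df() column layout.
DEG = """group,names,scores,logfoldchanges,pvals,pvals_adj
exonA,GENE1,5.2,1.30,0.0001,0.002
exonA,GENE2,-4.1,-0.75,0.0004,0.004
exonA,GENE3,0.9,0.20,0.3,0.5
exonB,GENE4,3.3,0.55,0.001,0.01
"""


def genes_passing_cutoff(handle, lfc_cutoff):
    """Group gene names whose |log fold change| meets the cutoff."""
    selected = defaultdict(list)
    for row in csv.DictReader(handle):
        if abs(float(row["logfoldchanges"])) >= lfc_cutoff:
            selected[row["group"]].append(row["names"])
    return dict(selected)


queries = genes_passing_cutoff(io.StringIO(DEG), lfc_cutoff=0.5)
print(queries)
```

Each per-group gene list would then serve as one enrichment query.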
