Questions about the filtering step of the zUMIs pipeline when analyzing Smart-seq3xpress raw sequencing data #413

@wxginpumc

Hi,
I have raw sequencing data from 293T cells (MGI, paired-end 150 bp) that has already been demultiplexed, so I followed the steps at "https://github.com/sdparekh/zUMIs/wiki/Starting-from-demultiplexed-fastq-files" to prepare the input files. When I ran zUMIs, I encountered the following problems:

(R3.6.0) wxg@wxg-VMware-Virtual-Platform:~$ /home/wxg/ZhuoMian/LearnzUMI/zUMIs-main/zUMIs.sh -y /mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/TEST_zUMI_yaml_293t.yaml

You provided these parameters:
YAML file: /mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/TEST_zUMI_yaml_293t.yaml
zUMIs directory: /home/wxg/ZhuoMian/LearnzUMI/zUMIs-main
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: 40
zUMIs version 2.9.7e

Tue Nov 25 14:58:13 CST 2025
WARNING: The STAR version used for mapping is 2.7.11b and the STAR index was created using the version 2.7.4a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.11b.
Filtering...
Tue Nov 25 15:00:03 CST 2025
[1] "Warning! None of the annotated barcodes were detected."
[1] "Less than 100 barcodes present, will continue with all barcodes..."
[1] " reads were assigned to barcodes that do not correspond to intact cells."
Error in setnames(x, value) :
Can't assign 0 names to a 1-column data.table
Calls: BCbin ... names<-.data.table -> setnames -> stopf -> raise_condition -> signal
Execution halted
Mapping...
[1] "2025-11-25 15:00:04 CST"
STAR --readFilesCommand samtools view -@ 2 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /mnt/disk16t/JiYInJia_sequence/293-20251105/primary_293t_sequenceData/293_STAR/STAR_Results --sjdbGTFfile /mnt/disk16t/JiYInJia_sequence/293-20251105/primary_293t_sequenceData/293_STAR/Homo_sapiens.GRCh38.114.gtf --runThreadN 14 --sjdbOverhang 149 --readFilesType SAM PE --twopassMode Basic --readFilesIn /mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlaa.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlab.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlac.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlad.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlae.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlaf.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlag.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlah.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlai.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlaj.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlak.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlal.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlam.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlan.filtered.tagged.bam,/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/.tmpMerge//zUMI_yaml.zUMI_yamlao.filtered.tagged.bam --outFileNamePrefix /mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMI_yaml.filtered.tagged.
STAR version: 2.7.11b compiled: 2025-07-20T19:06:27+08:00 :/home/wxg/桌面/LearnzUMI/STAR-master/source
Nov 25 15:00:24 ..... started STAR run
Nov 25 15:00:24 ..... loading genome
Nov 25 15:06:26 ..... processing annotations GTF
Nov 25 15:07:02 ..... inserting junctions into the genome indices
Nov 25 15:07:54 ..... started 1st pass mapping
Nov 25 15:15:11 ..... finished 1st pass mapping
Nov 25 15:15:12 ..... inserting junctions into the genome indices
Nov 25 15:16:05 ..... started mapping
Nov 25 15:24:23 ..... finished mapping
Nov 25 15:24:25 ..... finished successfully
Tue Nov 25 15:24:25 CST 2025
Counting...
[1] "2025-11-25 15:24:29 CST"
Error in fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, "kept_barcodes_binned.txt")) :
File '/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/zUMI_yamlkept_barcodes_binned.txt' does not exist or is non-readable. getwd()=='/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output'
Calls: fread -> stopf -> raise_condition -> signal
Execution halted
Tue Nov 25 15:24:29 CST 2025
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/expression/zUMI_yaml.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Tue Nov 25 15:24:30 CST 2025
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2025-11-25 15:24:31 CST"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/mnt/disk16t/JiYInJia_sequence/293-20251105/test_zumi_293/output/zUMIs_output/expression/zUMI_yaml.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Tue Nov 25 15:25:30 CST 2025
(R3.6.0) wxg@wxg-VMware-Virtual-Platform:~$

I noticed that kept_barcodes_binned.txt was never generated, which is why the counting step and everything after it failed. Tracing back, I suspect the filtering step failed to detect any of the expected barcodes ("None of the annotated barcodes were detected."), so the barcode binning step (BCbin) stopped with an error before it could write the file. However, when I tested the sample data from the Smart-seq3xpress paper (Scalable single-cell RNA sequencing from full transcripts with Smart-seq3xpress), I got three files, kept_barcodes_binned.txt, kept_barcodes.txt and .BCbinning.txt, and the workflow ran without any issues. I have uploaded the YAML file and the other input files, and I hope you can help me solve this problem.
The uploaded data come from two 293T cells; I only intend to use them as a test.
I apologize that my system locale is Chinese because of my VMware setup, so the original terminal messages were printed in Chinese. I hope you can give me a hand.
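
To narrow down why none of the annotated barcodes were detected, a check along the following lines might help. It is only a sketch under assumptions, not part of zUMIs: it assumes the expected-barcode file lists one barcode per line, and that the observed barcodes are available as a two-column text file (barcode, read count, no header), which is how I understand the BCstats file written by the filtering step to be laid out. The function names are made up for illustration. It reports how many reads fall on the expected barcodes as given and on their reverse complements, plus the barcode lengths on both sides, since a strand or length mismatch would produce exactly this warning.

```python
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_expected(path):
    """Read expected barcodes; ASSUMES one barcode per line."""
    with open(path) as fh:
        return {line.strip().upper() for line in fh if line.strip()}


def read_observed(path):
    """Read observed barcode counts.

    ASSUMES whitespace-separated 'barcode count' rows without a header;
    rows that do not fit this layout are skipped.
    """
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2 and fields[1].isdigit():
                counts[fields[0].upper()] += int(fields[1])
    return counts


def overlap_report(expected, observed):
    """Summarise how many reads carry an expected barcode."""

    def reads_in(barcodes):
        return sum(observed.get(bc, 0) for bc in barcodes)

    return {
        "total_reads": sum(observed.values()),
        "reads_as_given": reads_in(expected),
        "reads_revcomp": reads_in({reverse_complement(bc) for bc in expected}),
        "expected_lengths": sorted({len(bc) for bc in expected}),
        "observed_lengths": sorted({len(bc) for bc in observed}),
    }


# Toy data: every expected barcode only shows up as its reverse complement.
example_expected = {"AAAC", "CCGA"}
example_observed = Counter({"GTTT": 7, "TCGG": 3, "ACGT": 2})
example_report = overlap_report(example_expected, example_observed)
print(example_report)
```

With real data, `read_expected("reads_for_zUMIs.expected_barcodes.txt")` and `read_observed(...)` on the barcode statistics file in zUMIs_output (the exact file name is an assumption on my side) would supply the two inputs. If `reads_as_given` is zero while `reads_revcomp` is large, or the two length lists differ, the barcode definition in the YAML or the expected-barcode list would be the first thing to revisit.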

TEST_zUMI_yaml_293t.yaml

reads_for_zUMIs.expected_barcodes.txt

reads_for_zUMIs.samples.txt
