Daylily Omics Analysis contains the Snakemake workflows, shell entrypoints, profile configuration, and run documentation used for Daylily whole-genome sequencing analysis. It is specifically tuned to run inside infrastructure created by daylily-ephemeral-cluster, with Daylily omics/reference data mounted on the headnode and compute nodes under /fsx/data.
This repository does not create, update, or destroy AWS infrastructure. Cluster lifecycle, FSx mounts, and production sample staging belong to daylily-ephemeral-cluster and its daylily-ec CLI. Use daylily-ec to stage reads and create or deliver the samples.tsv and units.tsv manifests for production worksets; this repo consumes those manifests from each analysis clone.
Current workflows use paired manifest tables:
| File | Purpose |
|---|---|
| `config/samples.tsv` | One row per biological sample and truth/control metadata. |
| `config/units.tsv` | One row per sequencing unit, lane, read pair, CRAM/BAM, or downsampled analysis unit. |
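
Because the two tables are paired, a quick consistency check before a run can catch units that point at a sample with no row in `samples.tsv`. The following is a minimal sketch, not part of this repository; the join column is passed in by the caller, and `SAMPLE_ID` and `UNIT` are hypothetical column names used only for the demo, not the real schema.

```python
import csv
import io


def read_tsv(handle):
    """Return the rows of a tab-separated table as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def units_without_sample(samples, units, key):
    """Return unit rows whose `key` value has no matching row in samples."""
    known = {row[key] for row in samples}
    return [row for row in units if row[key] not in known]


# Tiny in-memory stand-ins for config/samples.tsv and config/units.tsv.
# SAMPLE_ID and UNIT are hypothetical column names, not the real schema.
samples = read_tsv(io.StringIO("SAMPLE_ID\nHG002\n"))
units = read_tsv(io.StringIO("SAMPLE_ID\tUNIT\nHG002\tL001\nHG003\tL001\n"))

orphans = units_without_sample(samples, units, key="SAMPLE_ID")
print([row["SAMPLE_ID"] for row in orphans])
```

For real manifests, open the two files and pass the handles to `read_tsv`, using whichever column the schema in `docs/ops/config.md` defines as the sample key.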
The legacy config/analysis_manifest.csv path is historical. Keep it only for old-run conversion notes.
SUBSAMPLE_PCT in units.tsv is supported for inline FASTQ downsampling. Values must be floats in (0.0, 1.0]; use na or an empty value when no downsampling is intended.
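
The `SUBSAMPLE_PCT` rule can be expressed as a small validator. This is an illustrative sketch of the rule as stated above, not the workflow's own schema check; the function name is hypothetical.

```python
def parse_subsample_pct(raw):
    """Return None when no downsampling is requested, else a float in (0.0, 1.0].

    Raises ValueError for non-numeric text or values outside the interval.
    """
    text = raw.strip()
    if text in ("", "na"):
        return None
    value = float(text)  # ValueError for non-numeric input
    # NaN fails this comparison too, so it is rejected along with 0 and >1.
    if not (0.0 < value <= 1.0):
        raise ValueError(f"SUBSAMPLE_PCT must be in (0.0, 1.0], got {text!r}")
    return value


print(parse_subsample_pct("0.25"))
print(parse_subsample_pct("na"))
```

Note that the interval is open at zero: `0.0` is rejected, while `1.0` is accepted and means the full unit is used.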
For production analyses, prefer manifests generated or staged by daylily-ec from the operator side. Hand-written copies are acceptable for focused debugging only when the paths, genome build, and /fsx/data reference resources have been verified.
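
When a hand-written manifest is unavoidable, one way to verify its paths is to check every path-valued cell against the filesystem before running anything. The sketch below assumes the caller knows which columns hold paths; `R1` and `R2` are hypothetical column names used only for the demo.

```python
import os
import tempfile


def missing_paths(rows, path_columns):
    """Return (row_index, column, value) for each path cell that does not exist.

    Empty cells and the literal 'na' are skipped.
    """
    missing = []
    for index, row in enumerate(rows):
        for column in path_columns:
            value = (row.get(column) or "").strip()
            if value in ("", "na"):
                continue
            if not os.path.exists(value):
                missing.append((index, column, value))
    return missing


# Demo with one real file and one absent file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "reads_R1.fastq.gz")
    absent = os.path.join(tmp, "reads_R2.fastq.gz")
    open(present, "w").close()
    rows = [{"R1": present, "R2": absent}, {"R1": "na", "R2": ""}]
    problems = missing_paths(rows, ["R1", "R2"])

print([(index, column) for index, column, _ in problems])
```

This only confirms that files exist. Genome build and `/fsx/data` reference resources still need to be checked separately.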
For a local smoke test from a fresh checkout, use the existing DAY-EC environment. This verifies wiring and small fixtures; routine full workflows are expected to run on a prepared headnode. The fixture copy commands below write config/samples.tsv and config/units.tsv, so run them in a scratch checkout or preserve existing manifests first:

```bash
eval "$(conda shell.zsh hook)"
conda activate DAY-EC
source dyoainit
dy-a local hg38
mkdir -p config
# These copies overwrite any existing manifests in config/
cp .test_data/data/0.01xwgs_HG002_hg38.samples.tsv config/samples.tsv
cp .test_data/data/0.01xwgs_HG002_hg38.units.tsv config/units.tsv
# Dry-run first (-n), then the real run
dy-r produce_alignstats -p -j 1 -n
dy-r produce_alignstats -p -j 1
```

For a Slurm-backed headnode run, connect through daylily-ec/SSM, then use a persistent workset clone. Stage production reads and manifests with daylily-ec before running workflow targets:

```bash
cd /fsx/analysis_results/ubuntu
day-clone -t <git-ref-or-tag> -d <workset-name>
cd /fsx/analysis_results/ubuntu/<workset-name>/daylily-omics-analysis
source dyoainit
dy-a slurm hg38
# Dry-run first (-n), then the real run
dy-r produce_snv_concordances -p -k -j 20 -n
dy-r produce_snv_concordances -p -k -j 20
```

Run `dy-r help` for available targets and use tab completion after `source dyoainit`.
| Command | Purpose |
|---|---|
| `source dyoainit` | Initialize Daylily shell functions, environment checks, and completion. |
| `dy-a <local\|slurm> <build>` | Select the run profile and genome build. |
| `dy-r <targets...> [flags]` | Compose and run the Snakemake command. |
| `dy-m [--workdir PATH] [--interval N]` | Monitor command history, master log, Slurm jobs, and recent task logs. |
| `dy-g <hg38\|hg38_broad\|...>` | |
| `dy-d reset` | Reset Daylily shell state. |
Common flags passed through dy-r:
| Flag | Meaning |
|---|---|
| `-n` | Dry-run. |
| `-p` | Print shell commands. |
| `-k` | Keep independent jobs running after a failure. |
| `-j N` | Limit concurrent Snakemake jobs. |
| `-T N` | Snakemake retry/attempt flag used by existing Daylily run commands. |
| `--rerun-incomplete` | Re-run incomplete outputs. |
| `--keep-incomplete` | Keep incomplete outputs for debugging failed jobs. |
| `--keep-temp` | Daylily convenience flag translated by `bin/day_run` to Snakemake `--notemp`. |
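
To illustrate what "translated" means for `--keep-temp`, here is a minimal sketch of rewriting one convenience flag in an argument list while passing everything else through unchanged. It is not the code in `bin/day_run`; the function name and structure are assumptions made for the example.

```python
def translate_flags(args):
    """Replace Daylily convenience flags with their Snakemake equivalents."""
    # Only the mapping named in the flag table above is included here.
    aliases = {"--keep-temp": "--notemp"}
    return [aliases.get(arg, arg) for arg in args]


translated = translate_flags(["produce_alignstats", "-p", "-j", "1", "--keep-temp"])
print(translated)
```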
| Target | Typical use |
|---|---|
| `produce_alignstats` | Alignment statistics and aggregate `alignstats_combo_mqc.tsv`. |
| `produce_snv_concordances` | GIAB/RTG concordance outputs where truth metadata is present. |
| `produce_sentd_snv_vcf` | Illumina Sentieon DNAscope SNV calling. |
| `produce_deep19_snv_vcf` | DeepVariant 1.9 SNV calling. |
| `produce_sentdont_snv_vcf` | ONT Sentieon SNV calling. |
| `produce_sentdpb_snv_vcf` | PacBio Sentieon SNV calling. |
| `produce_sentdug_snv_vcf` | Ultima Genomics SNV calling, usually on `hg38_broad`. |
| `produce_cgt7p_snv_vcf` | Complete Genomics/MGI Sentieon DNAscope path using `sentcg` and `cgt7p`. |
| `produce_sentdhiom_snv_vcf` | Modular Illumina+ONT hybrid Sentieon workflow. |
| `produce_sentdhuom_snv_vcf` | Modular Ultima+ONT hybrid Sentieon workflow. |
| `produce_dmd_dedup_cram`, `produce_smd_dedup_cram`, `produce_na_dedup_cram` | Canonical dedup selector targets; `dppl` is accepted only as a deprecated alias for `dmd`. |
| `produce_all_align`, `produce_all_dedup_cram`, `produce_all_snv_vcf`, `produce_all_sv_vcf` | Run every registered selector in that stage, subject to manifest/platform compatibility. |
| `produce_bclconvert_fastqs`, `produce_bclconvert_metrics`, `produce_bclconvert_multiqc`, `produce_bclconvert_fastqs_and_metrics` | Illumina BCL Convert bootstrap, generated units, demux metrics, and MultiQC-ready BCL metric tables. |
| `produce_manta_sv_vcf`, `produce_tiddit_sv_vcf`, `produce_dysgu_sv_vcf` | Structural variant callers. |
| `produce_htd_calls` | Selected HTD/special callers from `--config htd_callers=[...]`. |
| `produce_verifybamid2_panel_comparison` | Runs selected VerifyBamID2 SNP panels from `--config verifybamid2_panels=[...]` and writes a comparison TSV. |
| `produce_multiqc_input_data` | MultiQC for input sequence-data QC. |
| `produce_multiqc_cram` | MultiQC for CRAM/alignment QC. |
| `produce_multiqc_snv`, `produce_multiqc_sv` | MultiQC for SNV and SV QC scopes. |
| `produce_multiqc_sample_qc` | MultiQC for sample-level QC such as contamination and relatedness. |
| `produce_multiqc_variant_annotation` | MultiQC for enabled annotation QC such as VEP. |
| `produce_multiqc_all` | Canonical final routine MultiQC aggregation. |
Legacy selector targets such as produce_sentD_vcf, produce_manta, and
produce_multiqc_final_wgs remain available for now, but are marked as
deprecated in the workflow and docs. Current examples should use the canonical
selector names above.
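
A run script can flag deprecated names before they reach `dy-r`. The sketch below is a hypothetical helper, not part of the workflow. It knows only the three legacy targets and the one dedup alias named in this README, so the legacy list is not exhaustive, and it does not map legacy targets to replacements because this README does not state those mappings.

```python
# Legacy target names mentioned in this README (not an exhaustive list).
LEGACY_TARGETS = {"produce_sentD_vcf", "produce_manta", "produce_multiqc_final_wgs"}

# Deprecated dedup selector alias and its canonical selector.
DEDUP_ALIASES = {"dppl": "dmd"}


def find_legacy_targets(requested):
    """Return the requested targets that are known legacy names, in order."""
    return [target for target in requested if target in LEGACY_TARGETS]


def canonical_dedup_selector(name):
    """Map a deprecated dedup selector alias to its canonical name."""
    return DEDUP_ALIASES.get(name, name)


flagged = find_legacy_targets(["produce_alignstats", "produce_manta"])
print(flagged)
print(canonical_dedup_selector("dppl"))
```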
Complete Genomics T7+ and MGI-style WGS uses the dedicated sentcg -> smd -> cgt7p path. The canonical selector form avoids selector --config lists:

```bash
dy-r produce_sentcg_align produce_smd_dedup_cram produce_cgt7p_snv_vcf \
  produce_alignstats produce_snv_concordances \
  -p -j 20 -k -T 1 --retries 0 --rerun-incomplete --keep-incomplete
```

This path uses Sentieon BWA MEM with `DNAscopeMGIWGS2.1.bundle/bwa.model`, read group platform `DNBSEQ`, Sentieon duplicate marking, and DNAscope with `DNAscopeMGIWGS2.1.bundle/dnascope.model` plus `--pcr_indel_model none`.
See docs/workflows/complete_genomics_sentieon.md for model paths, output names, downsampling, and monitoring details.
| Item | Location |
|---|---|
| Results | results/day/<build>/ |
| Per-sample outputs | results/day/<build>/<sample>/ |
| Aggregate reports | results/day/<build>/other_reports/ |
| Benchmark summary | results/day/<build>/reports/benchmarks_summary.tsv |
| Snakemake master logs | .snakemake/log/<timestamp>.snakemake.log |
| Slurm logs | logs/slurm/<rule>/*.{out,err} |
| Command history | day_cmd.log |
| Completion markers | daylily.successful_run, daylily.failed_run |
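
The result locations in the table follow one pattern per genome build, so scripts can derive them rather than hard-code them. This is a small sketch under that assumption; the function name and dictionary keys are made up for the example.

```python
from pathlib import PurePosixPath


def result_locations(build, sample):
    """Build the result paths from the layout table for one build and sample."""
    root = PurePosixPath("results/day") / build
    return {
        "results": root,
        "sample": root / sample,
        "aggregate_reports": root / "other_reports",
        "benchmark_summary": root / "reports" / "benchmarks_summary.tsv",
    }


locations = result_locations("hg38", "HG002")
print(locations["benchmark_summary"])
```

The paths are relative to the analysis clone, so join them to the workset directory when working from elsewhere.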
When debugging, inspect logs in this order: latest .snakemake/log by mtime, relevant logs/slurm files by mtime, then the stable rule log under results/day/<build>/<sample>/.../logs/.
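
The first two steps of that order can be automated by sorting on modification time. The helper below is a sketch, not a Daylily tool, and assumes it is pointed at the analysis clone. It leaves out the per-rule logs under `results/day/`, whose exact sub-path depends on the rule. The demo uses temporary files with stand-in names and fixed mtimes.

```python
import os
import tempfile
from pathlib import Path


def newest_first(directory, patterns):
    """Return files under `directory` matching any glob pattern, newest mtime first."""
    root = Path(directory)
    if not root.is_dir():
        return []
    files = {p for pattern in patterns for p in root.glob(pattern) if p.is_file()}
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


def debug_candidates(workdir="."):
    """Collect master and Slurm logs in the inspection order described above."""
    base = Path(workdir)
    return {
        "master": newest_first(base / ".snakemake" / "log", ["*.snakemake.log"]),
        "slurm": newest_first(base / "logs" / "slurm", ["*/*.out", "*/*.err"]),
    }


# Demo: two master logs with known mtimes and no Slurm logs.
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp) / ".snakemake" / "log"
    log_dir.mkdir(parents=True)
    for name, mtime in [("old.snakemake.log", 1_000), ("new.snakemake.log", 2_000)]:
        path = log_dir / name
        path.write_text("")
        os.utime(path, (mtime, mtime))
    found = debug_candidates(tmp)
    latest_master = found["master"][0].name

print(latest_master)
```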
| Document | Purpose |
|---|---|
| daylily-ephemeral-cluster | Cluster lifecycle, headnode access, sample staging, and manifest generation. |
| `docs/README.md` | Documentation index and current/historical doc policy. |
| `docs/quickest_start.md` | Minimal smoke-test checklist. |
| `docs/first_ephemeral_cluster_analysis.md` | First headnode workset run. |
| `docs/ops/dycli.md` | CLI command behavior and monitoring. |
| `docs/ops/config.md` | Profiles, config precedence, sample/unit schema notes. |
| `docs/ops/tests.md` | Local validation commands. |
| `docs/ops/multiqc_qc_targets.md` | Staged MultiQC targets, runtime gating, and routine vs optional QC policy. |
| `docs/catalog_of_tools.md` | Code-sourced catalog of Daylily tool integrations, evidence, outputs, and tests. |
| `docs/ops/dir_and_file_scheme.md` | Current result layout and naming conventions. |
| `docs/ops/workflow_catalog.md` | Packaged workflow catalog API and current contents. |
| `docs/workflows/complete_genomics_sentieon.md` | Complete Genomics/MGI `sentcg`/`smd`/`cgt7p` workflow. |
| `docs/workflows/bclconvert_bootstrap.md` | Illumina BCL Convert bootstrap path, generated units, BCL metrics, and MultiQC custom-data integration. |
| `docs/workflows/ensemble_vcf.md` | Ensemble VCF workflow notes. |
| `docs/remote_test_execution.md` | Remote tmux/Slurm execution pattern. |
Top-level run notes such as run_cg.md, gotimeplan.md, hyb_runbook.md, hybrun.md, and ugdata.md are historical records for specific executions. Prefer the canonical docs above for new runs.

```text
.
├── bin/                     # dy-cli wrappers and utility scripts
├── config/                  # profiles, genome/supporting config, sample/unit inputs
├── daylily_omics_analysis/  # packaged Python helpers, including workflow catalog
├── docs/                    # canonical docs and historical notes
├── resources/               # staged supporting data
├── tests/                   # shell and Python validation tests
└── workflow/                # Snakemake rules, envs, scripts, and schemas
```

For a documentation-only change, run:

```bash
git diff --check
bash tests/test_cli_commands.sh
bash tests/test_bclconvert_bootstrap.sh
python -m pytest tests/test_complete_genomics_sentieon.py tests/test_workflow_catalog.py
```

For broad workflow changes, run the relevant target dry-run through `dy-r` after `source dyoainit` and `dy-a <profile> <build>`.