camlhmp

🐪 camlhmp 🐪 - Classification through yAML Heuristic Mapping Protocol

camlhmp is a tool for generating organism typing tools from YAML schemas. Through discussions with Tim Read, we identified a need for a straightforward method to define and manage typing schemas for organisms of interest. YAML was chosen for its simplicity and readability.

Full documentation for camlhmp can be found at https://rpetit3.github.io/camlhmp/.

Purpose

The primary purpose of camlhmp is to provide a framework that enables researchers to independently define typing schemas for their organisms of interest using YAML. This approach facilitates the management and analysis of biological data for researchers at any level of experience.

camlhmp does not supply pre-defined typing schemas. Instead, it equips researchers with the necessary tools to create and maintain their own schemas, ensuring these schemas can easily remain up to date with the latest scientific developments.

Finally, the development of camlhmp was driven by a practical need to streamline maintenance of multiple organism typing tools. Managing these tools separately is time-consuming and challenging. camlhmp simplifies this by providing a single framework on which each tool is built.

Quick Start

To quickly get started with camlhmp, you can install it through Bioconda and run the command-line interface:

# Install camlhmp through Bioconda
conda create -n camlhmp -c conda-forge -c bioconda camlhmp
conda activate camlhmp
camlhmp --help

# Example usage of camlhmp-blast-alleles
# Acquire test data
wget https://raw.githubusercontent.com/rpetit3/camlhmp/refs/heads/main/tests/data/blast/alleles/spn-pbptype.yaml
wget https://raw.githubusercontent.com/rpetit3/camlhmp/refs/heads/main/tests/data/blast/alleles/spn-pbptype.fasta
wget https://github.com/rpetit3/camlhmp/raw/refs/heads/main/tests/data/blast/alleles/SRR2912551.fna.gz

# Run camlhmp-blast-alleles
camlhmp-blast-alleles \
    --yaml spn-pbptype.yaml \
    --targets spn-pbptype.fasta \
    --input SRR2912551.fna.gz

Running camlhmp-blast-alleless with following parameters:
    --input SRR2912551.fna.gz
    --yaml spn-pbptype.yaml
    --targets spn-pbptype.fasta
    --outdir ./
    --prefix camlhmp
    --min-pident 95
    --min-coverage 95

Starting camlhmp for S. pneumoniae PBP typing...
Running tblastn...
Processing hits...
Final Results...
                               S. pneumoniae PBP typing
┏━━━┳━━━┳━━━┳━━━┳━━━┳━━━┳━━━┳━━━┳━━━┳━━━━┳━━━┳━━━━┳━━━┳━━━━┳━━━┳━━━━┳━━━┳━━━━┳━━━┳━━━━┓
┃ … ┃ … ┃ … ┃ … ┃ … ┃ … ┃ … ┃ … ┃ … ┃ 1… ┃ … ┃ 2… ┃ … ┃ 2… ┃ … ┃ 2… ┃ … ┃ 2… ┃ … ┃ 2… ┃
┑━━━╇━━━╇━━━╇━━━╇━━━╇━━━╇━━━╇━━━╇━━━╇━━━━╇━━━╇━━━━╇━━━╇━━━━╇━━━╇━━━━╇━━━╇━━━━╇━━━╇━━━━┩
β”‚ … β”‚ … β”‚ … β”‚ … β”‚ … β”‚ … β”‚ … β”‚ … β”‚ … β”‚    β”‚ 0 β”‚ 1… β”‚ … β”‚ 5… β”‚   β”‚ 2  β”‚ … β”‚ 1… β”‚ … β”‚    β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”€β”˜
Writing outputs...
Final predicted type written to ./camlhmp.tsv
tblastn results written to ./camlhmp.tblastn.tsv
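
The terminal table above is truncated because it has many columns, so it can help to read the written TSV one column per line. The following is a minimal sketch, assuming camlhmp.tsv holds a header row followed by result rows; `tsv_vertical` is a hypothetical helper name and not part of camlhmp.

```shell
# Hypothetical helper: print "column: value" pairs from a TSV with a header row
tsv_vertical() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) header[i] = $i; next }
        { for (i = 1; i <= NF; i++) printf "%s: %s\n", header[i], $i }
    ' "$1"
}

# Only runs when the output file from the example above is present
if [ -f camlhmp.tsv ]; then
    tsv_vertical camlhmp.tsv
fi
```

Using awk keeps the helper dependency-free; empty cells simply print as a column name with nothing after the colon.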

For more example commands and outputs, see the documentation for each command at https://rpetit3.github.io/camlhmp/.

Installation

camlhmp is available through PyPI and Bioconda. While you can install it through PyPI, installing through Bioconda is recommended so that the non-Python dependencies are installed as well.

System Requirements

camlhmp has been developed and tested on x86-64 Linux and macOS systems.

OS       Architecture  Supported?
Linux    x86-64        ✅
Linux    aarch64       ❌ (missing dependencies)
macOS    x86-64        ✅
macOS    arm64         ❌ (missing dependencies)
Windows  x86-64        ❌ (consider using WSL2)

Tip

Docker containers are available from biocontainers/camlhmp which can be used with the --platform flag to run on Apple Silicon and ARM-based Linux systems.
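
As a rough sketch of the tip above, the wrapper below runs a command inside an amd64 container under emulation. The function name `camlhmp_amd64`, the `DOCKER` override, and the `/data` mount point are illustrative choices and not part of camlhmp. The image reference is passed as the first argument, so look up the exact registry path and tag in the biocontainers/camlhmp listing.

```shell
# Hypothetical helper: run a command in an amd64 container on an ARM host.
# $1 is the image reference; the remaining arguments are the command to run.
camlhmp_amd64() {
    image=$1
    shift
    # DOCKER can be overridden (for example DOCKER=echo) to preview the command
    "${DOCKER:-docker}" run --rm --platform linux/amd64 \
        -v "$PWD":/data -w /data \
        "$image" "$@"
}

# Example (the image reference is a placeholder, not a real tag):
#   camlhmp_amd64 "biocontainers/camlhmp:<tag>" camlhmp --help
```

Mounting the current directory at `/data` lets the container read inputs and write outputs next to them. Expect emulated runs to be slower than native ones.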

Dependencies

camlhmp relies on the following dependencies:

dependencies:
  python:
    - biopython >=1.83
    - pyyaml >=6.0.1
    - executor >=23.2
    - rich >=13.7.1,<14
    - rich-click >=1.6.0
  non_python:
    - blast >=2.15.0
    - pigz
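
To check an existing BLAST+ install against the minimum version listed above, something like the following may help. It is a sketch under assumptions: `version_ge` is a hypothetical helper, it relies on `sort -V` being available, and it assumes the first line of `blastn -version` looks like `blastn: 2.15.0+`.

```shell
# Hypothetical helper: succeed when version $1 >= version $2 (needs sort -V)
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v blastn >/dev/null 2>&1; then
    # Take the second field of the first line and drop the trailing "+"
    installed=$(blastn -version | awk 'NR == 1 { sub(/\+$/, "", $2); print $2 }')
    if version_ge "$installed" 2.15.0; then
        echo "blast $installed meets the >=2.15.0 requirement"
    else
        echo "blast $installed is older than 2.15.0"
    fi
fi
```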

Bioconda Installation

conda create -n camlhmp -c conda-forge -c bioconda camlhmp
conda activate camlhmp
camlhmp
🐪 camlhmp 🐪 - Classification through YAML Heuristic Mapping Protocol

Available camlhmp commands
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ command               ┃ description                                                          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ camlhmp-blast-alleles │ Classify assemblies using BLAST against alleles of a set of genes    │
│ camlhmp-blast-regions │ Classify assemblies using BLAST against larger genomic regions       │
│ camlhmp-blast-targets │ Classify assemblies using BLAST against individual genes or proteins │
│ camlhmp-extract       │ Extract typing targets from a set of reference sequences             │
└───────────────────────┴──────────────────────────────────────────────────────────────────────┘

PyPI Installation

To install camlhmp through PyPI, you can use pip:

pip install camlhmp
camlhmp
🐪 camlhmp 🐪 - Classification through YAML Heuristic Mapping Protocol

Available camlhmp commands
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ command               ┃ description                                                          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ camlhmp-blast-alleles │ Classify assemblies using BLAST against alleles of a set of genes    │
│ camlhmp-blast-regions │ Classify assemblies using BLAST against larger genomic regions       │
│ camlhmp-blast-targets │ Classify assemblies using BLAST against individual genes or proteins │
│ camlhmp-extract       │ Extract typing targets from a set of reference sequences             │
└───────────────────────┴──────────────────────────────────────────────────────────────────────┘

Warning

Installing through PyPI will not install non-Python dependencies. You will need to ensure these are installed manually.
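
One way to confirm the non-Python dependencies are present after a pip install is a quick PATH check. This is a minimal sketch: `missing_tools` is a hypothetical helper, and the executable names are assumptions based on the dependency list (`tblastn` appears in the example output above; `blastn` is assumed to be needed as well).

```shell
# Hypothetical helper: print the name of every listed tool that is not on PATH
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools blastn tblastn pigz)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Missing non-Python dependencies:" $missing
else
    echo "All non-Python dependencies found"
fi
```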

Citing camlhmp

If you make use of camlhmp in your analysis, please cite the following:

Naming

If I'm being honest, I really wanted to name a tool with "camel" in it because they are my wife's favorite animal 🐪 and they also remind me of my friends in Oman!

Once it was decided YAML was going to be the format for defining schemas, I quickly stumbled on "Classification through YAML" and just as quickly found out I wasn't the only one who had thought of "CAML". But, no matter, it was decided it would be something with "CAML". Then Tim Read came in with the save and suggested "Heuristic Mapping Protocol". So, here we are - camlhmp!

License

I'm not a lawyer and MIT has always been my go-to license. So, MIT it is!

Artificial Intelligence Disclaimer

As of v1.1.3, camlhmp has been developed with minimal assistance of Artificial Intelligence (AI). GitHub Copilot was used for auto-completion, but otherwise all code was written and reviewed by the author.

Funding

Support for this project came (in part) from the Wyoming Public Health Division, and the Center for Applied Pathogen Epidemiology and Outbreak Control (CAPE).
