VerisimilitudeX/DNAnalyzer


DNAnalyzer

Next-Generation On-Device DNA Insights

Private. Precise. Powered by AI.


About

DNAnalyzer is an AI-powered, privacy-first platform for genomic analysis. All computation happens locally on your device, so genetic data never leaves your machine. The project is fiscally sponsored by Hack Club's 501(c)(3) (EIN 81-2908499).

The project was founded by Piyush Acharya and has grown to around 50 contributors from institutions including Microsoft Research, the University of Macedonia, and Northeastern University.

DNAnalyzer is supported by the Claude for Student Builders program (Anthropic API credits) and, as a participant in YC AI Startup School, by the YC AI Student Starter Pack (over $25,000 in AI developer-tool credits across Azure, AWS, OpenAI, Anthropic, xAI, and others).

Why It Matters

| Industry standard | DNAnalyzer |
|---|---|
| $100 average cost for DNA sequencing | Free analysis |
| Up to $600 for basic health insights | No usage fees |
| 78% of testing companies share genetic data with third parties | 100% local: no data leaves your device |
| Breaches regularly expose millions of users (e.g., 23andMe: 6.9M users in 2023) | Zero central storage |

Compromised genetic data is permanent. Unlike a password, you cannot change it.

Core Capabilities

| Capability | Description |
|---|---|
| Codon and protein detection | Identifies protein-coding regions, amino-acid chains, and genomic indicators |
| GC-rich region analysis | Flags regions with 45 to 60 percent GC content as candidate promoter regions |
| Promoter element identification | Detects BRE, TATA, INR, and DPE transcription initiation elements |
| Neurological genomic markers | Screens for variants linked to autism, ADHD, and schizophrenia |
| Multi-format sequence input | Parses FASTA, FASTQ, and plain-text input from uploads or external sources |
| CLI automation | Command-line interface for scripting and batch analysis |
| Ancestry estimation | Estimates continental ancestry on device from 23andMe or AncestryDNA exports |
| Polygenic risk scoring | Per-variant contribution reports with missing-variant flags |
| Smith-Waterman alignment | Local sequence alignment with optional PyOpenCL GPU acceleration and a pure-Python CPU fallback |
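The GC-rich region analysis row can be pictured as a sliding-window scan over the sequence. This is a minimal sketch, not DNAnalyzer's implementation: the function names `gc_fraction` and `gc_rich_windows`, the 100-bp window, and the 50-bp step are hypothetical choices; only the 45 to 60 percent band comes from the table.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in seq (case-insensitive); 0.0 for an empty string."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def gc_rich_windows(seq: str, window: int = 100, step: int = 50,
                    low: float = 0.45, high: float = 0.60):
    """Yield (start, end, gc) for windows whose GC fraction lies in [low, high].

    Hypothetical parameters: the window and step sizes are illustrative
    defaults, not values taken from DNAnalyzer.
    """
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if low <= gc <= high:
            yield start, start + len(chunk), gc


if __name__ == "__main__":
    demo = "A" * 100 + "GCAT" * 25
    print(list(gc_rich_windows(demo)))  # [(100, 200, 0.5)]
```

A fixed window keeps the scan linear in sequence length; real promoter prediction would combine this signal with the element detection listed in the next row rather than rely on GC content alone.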

Quickstart

Docker (no Java install required)

git clone https://github.com/VerisimilitudeX/DNAnalyzer.git
cd DNAnalyzer
docker compose up --build

Once the containers are up, the stack listens on your machine at:

| Service | URL |
|---|---|
| Web UI | http://localhost:3000 |
| REST API | http://localhost:8080 |
| Swagger docs | http://localhost:8080/swagger-ui/index.html |

Stop with docker compose down.

Manual build

git clone https://github.com/VerisimilitudeX/DNAnalyzer.git
cd DNAnalyzer
./gradlew build

This produces two jars under build/libs/:

| Jar | Purpose | Entry point |
|---|---|---|
| DNAnalyzer-<version>-boot.jar | Spring Boot REST API | DNAnalyzer.api.ApiApplication |
| DNAnalyzer-<version>-plain.jar | CLI fat jar | DNAnalyzer.Main |

Running the CLI

The simplest path is the launcher script, which auto-selects a jar or falls back to gradle run:

./easy_dna.sh your_file.fa basic       # Standard analysis
./easy_dna.sh your_file.fa detailed    # Comprehensive analysis
./easy_dna.sh your_file.fa mutations   # Generate mutations
./easy_dna.sh your_file.fa all         # Complete suite
./easy_dna.sh your_file.fa custom      # Interactive mode

Override the jar path with DNANALYZER_JAR=/path/to/jar if needed.
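The launcher's "jar first, Gradle as fallback" behavior can be modeled as a small selection function. This is an assumption about how such a launcher behaves, not a transcription of easy_dna.sh: the function name `choose_launcher`, the `DNAnalyzer-*-plain.jar` glob, and the precedence order are hypothetical, and it returns only the command prefix because the mapping of modes such as `basic` or `detailed` to CLI flags is not documented here.

```python
import os
from pathlib import Path


def choose_launcher(repo_root: str = ".", env=None) -> list:
    """Return the command prefix a jar-or-Gradle launcher might run.

    Hypothetical precedence (an assumption, not read from easy_dna.sh):
    1. an explicit DNANALYZER_JAR override,
    2. the most recently built build/libs/DNAnalyzer-*-plain.jar,
    3. a ./gradlew run fallback.
    """
    env = os.environ if env is None else env
    override = env.get("DNANALYZER_JAR")
    if override:
        return ["java", "-jar", override]
    libs = Path(repo_root, "build", "libs")
    jars = list(libs.glob("DNAnalyzer-*-plain.jar")) if libs.is_dir() else []
    if jars:
        newest = max(jars, key=lambda p: p.stat().st_mtime)
        return ["java", "-jar", str(newest)]
    return ["./gradlew", "run"]
```

Preferring a prebuilt jar avoids Gradle's startup cost on every run, while the fallback keeps a fresh checkout usable before `./gradlew build` has been run.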

The equivalent direct invocation:

java -jar build/libs/DNAnalyzer-1.2.1-plain.jar your_file.fa

Analysis profiles

java -jar build/libs/DNAnalyzer-1.2.1-plain.jar --profile research your_file.fa
# Available: basic, detailed, quick, research, mutation, clinical
java -jar build/libs/DNAnalyzer-1.2.1-plain.jar --profile list

Output layout

Each CLI run writes into a timestamped directory under output/:

output/dnanalyzer_output_<filename>_<timestamp>/
  charts/     # QC visualizations (PNG)
  sequences/  # Generated mutations and processed sequences (FASTA)
  reports/    # Analysis reports and summaries (HTML)
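Because each run gets its own timestamped directory, scripts that post-process results usually need the newest one. A minimal sketch, assuming only the `output/dnanalyzer_output_<filename>_<timestamp>/` naming shown above: the helper names `latest_run` and `list_reports` are hypothetical, the sketch sorts by modification time instead of parsing the timestamp (whose format is not documented here), and whether `<filename>` keeps the file extension is not stated, so the filter simply matches the string you pass.

```python
from pathlib import Path


def latest_run(output_root: str = "output", filename=None):
    """Return the most recently modified run directory, or None if there is none."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    pattern = f"dnanalyzer_output_{filename}_*" if filename else "dnanalyzer_output_*"
    runs = [p for p in root.glob(pattern) if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def list_reports(run_dir) -> list:
    """Sorted HTML files in the run's reports/ subdirectory (empty if it is missing)."""
    reports = Path(run_dir, "reports")
    return sorted(reports.glob("*.html")) if reports.is_dir() else []
```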

Optional: AI-generated reports

When an OpenAI key is available, each run produces a researcher report and a layperson report alongside the numeric output.

export OPENAI_API_KEY=sk-...
export OPENAI_MODEL=gpt-4o-mini     # optional

Pass --no-ai to skip the model call.

REST API

Start the API alone with ./gradlew bootRun. All endpoints live under /api/v1.

| Endpoint | Method | Description |
|---|---|---|
| /api/v1/status | GET | Health check and version metadata |
| /api/v1/analyze | POST (multipart) | Full analysis pipeline on an uploaded FASTA/FASTQ/plain-text sequence |
| /api/v1/base-pairs | POST (JSON) | Base-pair counts, percentages, and GC content |
| /api/v1/reading-frames | POST (JSON) | Open reading frames (forward and reverse) |
| /api/v1/find-proteins | POST (JSON) | Top 10 candidate proteins by length |
| /api/v1/manipulate | POST (JSON) | Reverse, complement, or reverse-complement a sequence |
| /api/v1/parse | POST (multipart) | Extract the first sequence record from FASTA/FASTQ/plain uploads |
| /api/v1/analyze-genetic | POST (multipart) | Score 23andMe/AncestryDNA genotype files against bundled PRS panels |

curl -F [email protected] http://localhost:8080/api/v1/analyze

curl -X POST http://localhost:8080/api/v1/base-pairs \
     -H 'Content-Type: application/json' \
     -d '{"sequence": "ATGCGCATTA"}'

curl -F geneticFile=@my_23andme.txt -F snpAnalysis=true \
     http://localhost:8080/api/v1/analyze-genetic
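To check what `/api/v1/base-pairs` reports, the same quantities can be computed locally. This sketch is an illustration, not the server's code: the function name `base_pair_stats` and its result keys (`counts`, `percentages`, `gc_content`) are hypothetical, skipping non-ACGT characters such as N is one choice among several, and the real JSON schema is in docs/API_REFERENCE.md.

```python
from collections import Counter


def base_pair_stats(sequence: str) -> dict:
    """Counts of A/C/G/T, their percentages, and GC content (percent).

    Hypothetical output shape; characters other than A, C, G, T are ignored.
    """
    counts = Counter(b for b in sequence.upper() if b in "ACGT")
    total = sum(counts.values())
    pct = {b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"}
    return {
        "counts": {b: counts[b] for b in "ACGT"},
        "percentages": pct,
        "gc_content": pct["G"] + pct["C"],
    }
```

For the sequence in the curl example, `base_pair_stats("ATGCGCATTA")` counts 3 A, 2 C, 2 G, and 3 T, giving 40.0 percent GC.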

Full reference: docs/API_REFERENCE.md.
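The `/manipulate`, `/reading-frames`, and `/find-proteins` endpoints rest on two standard operations: reverse-complementing a strand and scanning all six reading frames for ATG...stop open reading frames. A minimal sketch follows; the function names and the `min_codons` threshold are hypothetical, and DNAnalyzer's own rules (for example, whether nested ORFs or ORFs without a stop codon are reported) may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse the strand and swap A<->T and C<->G (other characters pass through)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> list:
    """Return (strand, frame, orf) tuples for ATG...stop ORFs in all six frames.

    min_codons counts the stop codon. Minus-strand frames are numbered from the
    start of the reverse complement. After an ORF is found, scanning resumes
    past its stop codon, so in-frame nested ATGs are not reported separately.
    """
    results = []
    strands = (("+", seq.upper()), ("-", reverse_complement(seq.upper())))
    for strand, s in strands:
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] == "ATG":
                    for j in range(i + 3, len(s) - 2, 3):
                        if s[j:j + 3] in STOP_CODONS:
                            if (j + 3 - i) // 3 >= min_codons:
                                results.append((strand, frame, s[i:j + 3]))
                            i = j  # the i += 3 below steps past the stop codon
                            break
                    else:
                        break  # no in-frame stop downstream: no more ORFs in this frame
                i += 3
    return results
```

Sorting the results by length, for example `sorted(find_orfs(seq), key=lambda r: len(r[2]), reverse=True)[:10]`, gives a ranking in the spirit of `/find-proteins`; translating each ORF into amino acids would be the next step.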

Polygenic Risk Scores

./gradlew run --args='--23andme my_data.txt --prs assets/risk/heart_disease_prs.csv sample.fa'

The CLI parses the standard tab-delimited 23andMe export, aligns it with each provided weight table, and reports the raw and normalized contribution of every SNP. Missing or uncallable variants are flagged so you can assess coverage before acting on a score.
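The scoring step can be sketched as a weighted count of effect alleles. This is a minimal illustration under stated assumptions, not DNAnalyzer's implementation: it assumes the weight table has `rsid`, `effect_allele`, and `weight` columns (hypothetical names; check the CSVs under assets/risk/), treats a `--` genotype as uncallable, and defines the normalized contribution as each SNP's signed share of the total absolute contribution. The function names `parse_23andme` and `score_prs` are also hypothetical.

```python
import csv
import io


def parse_23andme(text: str) -> dict:
    """Map rsid -> genotype from a tab-delimited 23andMe export.

    Lines starting with '#' (including the column header) are skipped; the
    columns are rsid, chromosome, position, genotype.
    """
    genotypes = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rsid, _chrom, _pos, genotype = line.split("\t")[:4]
        genotypes[rsid] = genotype.strip()
    return genotypes


def score_prs(genotypes: dict, weights_csv: str):
    """Return (total score, per-SNP rows, missing rsids).

    Assumed (hypothetical) weight-table columns: rsid, effect_allele, weight.
    Dosage is the number of effect-allele copies in the genotype.
    """
    rows, missing = [], []
    for rec in csv.DictReader(io.StringIO(weights_csv)):
        gt = genotypes.get(rec["rsid"], "")
        if not gt or gt == "--":  # absent from the export, or a no-call
            missing.append(rec["rsid"])
            continue
        dosage = gt.count(rec["effect_allele"])
        rows.append({"rsid": rec["rsid"], "dosage": dosage,
                     "raw": dosage * float(rec["weight"])})
    total = sum(r["raw"] for r in rows)
    scale = sum(abs(r["raw"]) for r in rows)
    for r in rows:
        # Hypothetical normalization: signed share of the total absolute contribution.
        r["normalized"] = r["raw"] / scale if scale else 0.0
    return total, rows, missing
```

The sketch ignores strand flips and ambiguous A/T or C/G SNPs, which a production pipeline has to resolve before counting alleles.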

Walkthrough and example outputs: docs/usage/polygenic-risk-scoring.md.

Trait predictions are educational only. Do not use them for medical decisions.

GPU-Accelerated Smith-Waterman

Run the Python module directly:

python -m src.python.gpu_smith_waterman SEQ1 SEQ2

Or invoke it from the CLI by combining --sw-align with --align:

java -jar build/libs/DNAnalyzer-1.2.1-plain.jar sample.fa --align reference.fa --sw-align
java -jar build/libs/DNAnalyzer-1.2.1-plain.jar --align query.fa reference.fa --sw-align

Implementation notes: docs/developer/GPU_Smith_Waterman.md.
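To show the recurrence that any Smith-Waterman implementation evaluates, here is a minimal pure-Python sketch with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) and the function name are assumptions for illustration; the module in src/python may use different parameters, affine gaps, or a different traceback.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for the best local alignment of a and b.

    Linear gap penalty; the scoring values are illustrative defaults.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row/column stay 0 (local alignment)
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match/mismatch
                H[i - 1][j] + gap,  # gap in b
                H[i][j - 1] + gap,  # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell ends the alignment.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

The table costs O(len(a) * len(b)) time and memory. Cells on the same anti-diagonal do not depend on each other, which is what makes the fill step a natural fit for GPU parallelism.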

Packaging Analysis Sessions

Archive a run (inputs, logs, HTML report) into a timestamped ZIP:

./scripts/package-session.sh sample.fa
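The packaging step amounts to zipping a run's files under a timestamped name. The Python sketch below shows that idea with the standard `zipfile` module. It is not a port of package-session.sh: the function name `package_session`, the `session_<timestamp>.zip` naming, and the choice of which paths to include are assumptions.

```python
import time
import zipfile
from pathlib import Path


def package_session(paths, dest_dir=".", stamp=None) -> Path:
    """Zip the given files/directories into dest_dir/session_<stamp>.zip.

    Hypothetical naming scheme; directories are stored under their own name.
    """
    stamp = stamp or time.strftime("%Y%m%d_%H%M%S")
    archive = Path(dest_dir) / f"session_{stamp}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in map(Path, paths):
            if p.is_dir():
                for f in sorted(p.rglob("*")):
                    if f.is_file():
                        zf.write(f, arcname=str(Path(p.name) / f.relative_to(p)))
            elif p.is_file():
                zf.write(p, arcname=p.name)
    return archive
```

Storing entries under short relative names (for example `logs/run.log`) keeps the archive portable, since absolute paths from the machine that created it are not embedded.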

Documentation

Entry points for humans and AI agents:

| Doc | Purpose |
|---|---|
| AGENTS.md | Orientation for agentic AI and automation |
| docs/README.md | Index of all documentation |
| docs/ARCHITECTURE.md | System architecture and diagrams |
| docs/REPOSITORY_MAP.md | Directory-by-directory guide |
| docs/API_REFERENCE.md | REST and CLI reference |
| docs/getting-started.md | First-time setup |
| SECURITY.md | Security policy and private reporting |

Contributing

Contributions are welcome at every experience level.


Impact Metrics

| Metric | Current value |
|---|---|
| GitHub stars | 179 |
| Forks | 75 |
| Contributors | 51 |
| Merged pull requests | 0 |
| Release asset downloads | 247 |

These numbers are refreshed by the metrics-refresh.yml workflow.

Academic Citation

@software{Acharya_DNAnalyzer_ML-Powered_DNA_2022,
  author  = {Acharya, Piyush},
  doi     = {10.5281/zenodo.14556577},
  month   = oct,
  title   = {{DNAnalyzer: ML-Powered DNA Analysis Platform}},
  url     = {https://github.com/VerisimilitudeX/DNAnalyzer},
  version = {3.6.1},
  year    = {2022}
}

Terms of Use

DNAnalyzer is provided "as-is". Use of this software implies acceptance of all associated risks and liabilities. DNAnalyzer disclaims responsibility for any loss or damage arising from its use. Contact: [email protected].

DNAnalyzer © 2026 Piyush Acharya. Fiscally sponsored by Hack Club, a 501(c)(3) nonprofit (EIN 81-2908499). Licensed under the MIT License.


Support DNAnalyzer

23andMe

Get 10% off your order
DNAnalyzer earns $20 per referral


Ancestry Membership

Get up to 24% off membership
DNAnalyzer earns $10 per referral
