GitHub - VerisimilitudeX/DNAnalyzer: Precision genomics for everyone, everywhere. Powered by private AI.

Next-Generation On-Device DNA Insights

Private. Precise. Powered by AI.

About

DNAnalyzer is an AI-powered, privacy-first platform for genomic analysis. All computation happens locally on your device, so genetic data never leaves your machine. The project is fiscally sponsored by Hack Club's 501(c)(3) (EIN 81-2908499).

DNAnalyzer was founded by Piyush Acharya and has roughly 50 contributors, including people from Microsoft Research, the University of Macedonia, and Northeastern University.

The project is supported by the Claude for Student Builders program (Anthropic API credits) and, as a participant in YC AI Startup School, by the YC AI Student Starter Pack (over $25,000 in AI-devtool credits across Azure, AWS, OpenAI, Anthropic, xAI, and more).

Why It Matters

Industry Standard	DNAnalyzer
$100 average cost for DNA sequencing	Free analysis
Up to $600 for basic health insights	No usage fees
78% of testing companies share genetic data with third parties	100% local: no data leaves your device
Breaches regularly expose millions of users (e.g. 23andMe, 6.9M users in 2023)	Zero central storage

Compromised genetic data is permanent. Unlike a password, you cannot change it.

Core Capabilities

Capability	Description
Codon and protein detection	Identifies protein-coding regions, amino-acid chains, and genomic indicators
GC-rich region analysis	Flags candidate promoter regions whose GC content falls between 45 and 60 percent
Promoter element identification	Detects BRE, TATA, INR, and DPE transcription initiation elements
Neurological genomic markers	Screens for variants linked to autism, ADHD, and schizophrenia
Multi-format FASTA integration	Parses FASTA, FASTQ, and plain-text input from uploads or external sources
CLI automation	Command-line interface for scripting and batch analysis
Ancestry estimation	Continental ancestry from 23andMe or AncestryDNA exports, on device
Polygenic risk scoring	Per-variant contribution reports with missing-variant flags
Smith-Waterman alignment	Optional PyOpenCL GPU acceleration with a pure-Python CPU fallback
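
To make the GC-rich region row concrete, here is a minimal Python sketch of a sliding-window GC scan. It is not DNAnalyzer's implementation: the function names, the fixed window size, and the inclusive 45 to 60 percent bounds are assumptions chosen for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int = 10, low: float = 0.45, high: float = 0.60):
    """Yield (start, end, gc) for every window whose GC fraction lies in [low, high].

    The window size and the inclusive bounds are illustrative assumptions.
    """
    for start in range(0, len(seq) - window + 1):
        gc = gc_fraction(seq[start:start + window])
        if low <= gc <= high:
            yield start, start + window, gc


if __name__ == "__main__":
    for start, end, gc in gc_windows("ATGCGCATTAGGCCATATAT", window=10):
        print(f"{start}-{end}: GC={gc:.0%}")
```

A real scan would likely use a much larger window and merge overlapping hits; the sketch only shows the thresholding step.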

Quickstart

Docker (no Java install required)

git clone https://github.com/VerisimilitudeX/DNAnalyzer.git
cd DNAnalyzer
docker compose up --build

Once the containers are up, the stack listens on your machine at:

Service	URL
Web UI	`http://localhost:3000`
REST API	`http://localhost:8080`
Swagger docs	`http://localhost:8080/swagger-ui/index.html`

Stop the stack with `docker compose down`.

Manual build

git clone https://github.com/VerisimilitudeX/DNAnalyzer.git
cd DNAnalyzer
./gradlew build

This produces two jars under build/libs/:

Jar	Purpose	Entry point
`DNAnalyzer-<version>-boot.jar`	Spring Boot REST API	`DNAnalyzer.api.ApiApplication`
`DNAnalyzer-<version>-plain.jar`	CLI fat jar	`DNAnalyzer.Main`

Running the CLI

The simplest path is the launcher script, which auto-selects a jar or falls back to gradle run:

./easy_dna.sh your_file.fa basic       # Standard analysis
./easy_dna.sh your_file.fa detailed    # Comprehensive analysis
./easy_dna.sh your_file.fa mutations   # Generate mutations
./easy_dna.sh your_file.fa all         # Complete suite
./easy_dna.sh your_file.fa custom      # Interactive mode

Override the jar path with DNANALYZER_JAR=/path/to/jar if needed.

The equivalent direct invocation:

java -jar build/libs/DNAnalyzer-1.2.1-plain.jar your_file.fa

Analysis profiles

java -jar build/libs/DNAnalyzer-1.2.1-plain.jar --profile research your_file.fa
# Available: basic, detailed, quick, research, mutation, clinical
java -jar build/libs/DNAnalyzer-1.2.1-plain.jar --profile list

Output layout

Each CLI run writes into a timestamped directory under output/:

output/dnanalyzer_output_<filename>_<timestamp>/
  charts/     # QC visualizations (PNG)
  sequences/  # Generated mutations and processed sequences (FASTA)
  reports/    # Analysis reports and summaries (HTML)
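
Because every run lands in its own timestamped directory, scripts often need to find the newest one. The following is a small sketch of one way to do that, assuming the layout above. The helper names are hypothetical, the `.html` extension is an assumption based on the reports being HTML, and the directory is picked by modification time rather than by parsing the timestamp, whose format is not specified here.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(output_root: str = "output") -> Optional[Path]:
    """Return the most recently modified dnanalyzer_output_* directory, or None."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.glob("dnanalyzer_output_*") if p.is_dir()]
    # Use mtime instead of the name, since the timestamp format is not documented here.
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def list_reports(run_dir: Path) -> list:
    """Return the HTML files in the run's reports/ subdirectory, sorted by name."""
    return sorted((run_dir / "reports").glob("*.html"))
```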

Optional: AI-generated reports

When an OpenAI API key is set in the environment, each run also produces a researcher report and a layperson report alongside the numeric output.

export OPENAI_API_KEY=sk-...
export OPENAI_MODEL=gpt-4o-mini     # optional

Pass --no-ai to skip the model call.

REST API

Start the API alone with ./gradlew bootRun. All endpoints live under /api/v1.

Endpoint	Method	Description
`/api/v1/status`	GET	Health check and version metadata
`/api/v1/analyze`	POST (multipart)	Full analysis pipeline on an uploaded FASTA/FASTQ/plain-text sequence
`/api/v1/base-pairs`	POST (JSON)	Base-pair counts, percentages, and GC content
`/api/v1/reading-frames`	POST (JSON)	Open reading frames (forward and reverse)
`/api/v1/find-proteins`	POST (JSON)	Top 10 candidate proteins by length
`/api/v1/manipulate`	POST (JSON)	Reverse, complement, or reverse-complement a sequence
`/api/v1/parse`	POST (multipart)	Extract the first sequence record from FASTA/FASTQ/plain uploads
`/api/v1/analyze-genetic`	POST (multipart)	Score 23andMe/AncestryDNA genotype files against bundled PRS panels

curl -F [email protected] http://localhost:8080/api/v1/analyze

curl -X POST http://localhost:8080/api/v1/base-pairs \
     -H 'Content-Type: application/json' \
     -d '{"sequence": "ATGCGCATTA"}'

curl -F geneticFile=@my_23andme.txt -F snpAnalysis=true \
     http://localhost:8080/api/v1/analyze-genetic

Full reference: docs/API_REFERENCE.md.
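
For scripted use, the JSON endpoints can also be called from Python's standard library. This sketch mirrors the `base-pairs` curl call above. The function names are hypothetical, and it assumes the API answers with a JSON body; check docs/API_REFERENCE.md for the actual response schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"  # default address from the Quickstart table


def build_base_pairs_request(sequence: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the JSON POST for the base-pairs endpoint."""
    body = json.dumps({"sequence": sequence}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/base-pairs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_base_pairs(sequence: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the response; requires a running API.

    Assumes the response body is JSON, which is not confirmed in this README.
    """
    request = build_base_pairs_request(sequence, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the first function usable in tests without a live server.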

Polygenic Risk Scores

./gradlew run --args='--23andme my_data.txt --prs assets/risk/heart_disease_prs.csv sample.fa'

The CLI parses the standard tab-delimited 23andMe export, aligns it with each provided weight table, and reports the raw and normalized contribution of every SNP. Missing or uncallable variants are flagged so you can assess coverage before acting on a score.
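
The scoring step can be sketched as a weighted count of effect alleles. This is an illustration, not the project's code: the `(rsid, effect_allele, weight)` layout of the weight table is an assumption, the function names are hypothetical, and normalization is left out because its definition is not given here.

```python
def parse_23andme(lines):
    """Map rsid -> genotype from tab-delimited 23andMe rows, skipping '#' comments."""
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not an rsid / chromosome / position / genotype row
        rsid, _chrom, _pos, genotype = fields[:4]
        genotypes[rsid] = genotype
    return genotypes


def score(genotypes, weights):
    """Return (total, per-SNP contributions, missing rsids).

    weights is a list of (rsid, effect_allele, weight) tuples; this layout is
    a hypothetical stand-in for the real weight-table format.
    """
    total, contributions, missing = 0.0, {}, []
    for rsid, effect_allele, weight in weights:
        genotype = genotypes.get(rsid)
        if genotype is None or "-" in genotype:
            missing.append(rsid)  # absent from the export or a no-call ("--")
            continue
        contributions[rsid] = weight * genotype.count(effect_allele)
        total += contributions[rsid]
    return total, contributions, missing
```

Returning the missing list alongside the total makes it easy to judge how much of a panel was actually covered.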

Walkthrough and example outputs: docs/usage/polygenic-risk-scoring.md.

Trait predictions are educational only. Do not use them for medical decisions.

GPU-Accelerated Smith-Waterman

Run the Python module directly:

python -m src.python.gpu_smith_waterman SEQ1 SEQ2

Or invoke it from the CLI by combining --sw-align with --align:

java -jar build/libs/DNAnalyzer-1.2.1-plain.jar sample.fa --align reference.fa --sw-align
java -jar build/libs/DNAnalyzer-1.2.1-plain.jar --align query.fa reference.fa --sw-align

Implementation notes: docs/developer/GPU_Smith_Waterman.md.
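
For readers unfamiliar with the algorithm, the following is a minimal CPU-only sketch of Smith-Waterman local alignment scoring. It is not the repository's module: the function name and the scoring parameters (match +2, mismatch -1, linear gap -2) are assumptions, and it returns only the best score rather than the aligned substrings.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero is what makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Keeping only two rows holds memory at O(len(b)), at the cost of not being able to trace back the alignment itself.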

Packaging Analysis Sessions

Archive a run (inputs, logs, HTML report) into a timestamped ZIP:

./scripts/package-session.sh sample.fa

Documentation

Entry points for humans and AI agents:

Doc	Purpose
AGENTS.md	Orientation for agentic AI and automation
docs/README.md	Index of all documentation
docs/ARCHITECTURE.md	System architecture and diagrams
docs/REPOSITORY_MAP.md	Directory-by-directory guide
docs/API_REFERENCE.md	REST and CLI reference
docs/getting-started.md	First-time setup
SECURITY.md	Security policy and private reporting

Contributing

Contributions are welcome at every experience level.

Impact Metrics

Metric	Current Value
GitHub Stars	179
Forks	75
Contributors	51
Merged pull requests	0
Release asset downloads	247

These numbers are refreshed by the metrics-refresh.yml workflow.

Academic Citation

@software{Acharya_DNAnalyzer_ML-Powered_DNA_2022,
  author  = {Acharya, Piyush},
  doi     = {10.5281/zenodo.14556577},
  month   = oct,
  title   = {{DNAnalyzer: ML-Powered DNA Analysis Platform}},
  url     = {https://github.com/VerisimilitudeX/DNAnalyzer},
  version = {3.6.1},
  year    = {2022}
}

Terms of Use

DNAnalyzer is provided "as-is". Use of this software implies acceptance of all associated risks and liabilities. DNAnalyzer disclaims responsibility for any loss or damage arising from its use. Contact: [email protected].

DNAnalyzer, (C) Piyush Acharya 2026. Fiscally sponsored 501(c)(3) nonprofit (EIN 81-2908499), licensed under the MIT License.

Support DNAnalyzer

23andMe

Get 10% off your order
DNAnalyzer earns $20 per referral

Ancestry Membership

Get up to 24% off membership
DNAnalyzer earns $10 per referral
