DNAnalyzer is an AI-powered, privacy-first platform for genomic analysis. All computation happens locally on your device, so genetic data never leaves your machine. The project is fiscally sponsored by Hack Club's 501(c)(3) (EIN 81-2908499).
DNAnalyzer was founded by Piyush Acharya and has 50 contributors drawn from Microsoft Research, the University of Macedonia, and Northeastern University.
As a participant in YC AI Startup School, the project is supported by the Claude for Student Builders program (Anthropic API credits) and the YC AI Student Starter Pack (over $25,000 in AI-devtool credits across Azure, AWS, OpenAI, Anthropic, xAI, and more).
| Industry Standard | DNAnalyzer |
|---|---|
| $100 average cost for DNA sequencing | Free analysis |
| Up to $600 for basic health insights | No usage fees |
| 78% of testing companies share genetic data with third parties | 100% local: no data leaves your device |
| Breaches regularly expose millions of users (e.g. 23andMe, 6.9M users in 2023) | Zero central storage |
Compromised genetic data is permanent. Unlike a password, you cannot change it.
| Capability | Description |
|---|---|
| Codon and protein detection | Identifies protein-coding regions, amino-acid chains, and genomic indicators |
| GC-rich region analysis | Locates promoter regions by 45 to 60 percent GC content |
| Promoter element identification | Detects BRE, TATA, INR, and DPE transcription initiation elements |
| Neurological genomic markers | Screens for variants linked to autism, ADHD, and schizophrenia |
| Multi-format FASTA integration | Parses FASTA, FASTQ, and plain-text input from uploads or external sources |
| CLI automation | Command-line interface for scripting and batch analysis |
| Ancestry estimation | Continental ancestry from 23andMe or AncestryDNA exports, on device |
| Polygenic risk scoring | Per-variant contribution reports with missing-variant flags |
| Smith-Waterman alignment | Optional PyOpenCL GPU acceleration with a pure-Python CPU fallback |
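To make the GC-rich region row concrete, here is a minimal sketch of a sliding-window GC scan. It is not DNAnalyzer's implementation: the function names, window size, and step are assumptions chosen for illustration, and only the 45 to 60 percent band comes from the table above.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int = 20, step: int = 10,
               low: float = 0.45, high: float = 0.60) -> List[Tuple[int, int, float]]:
    """Return (start, end, gc) for every window whose GC fraction lies in [low, high].

    window and step are illustrative defaults, not values taken from DNAnalyzer.
    """
    hits = []
    for start in range(0, max(len(seq) - window + 1, 0), step):
        gc = gc_fraction(seq[start:start + window])
        if low <= gc <= high:
            hits.append((start, start + window, gc))
    return hits
```

Calling `gc_windows(sequence)` returns half-open `(start, end)` coordinates, so `sequence[start:end]` recovers each flagged window.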
git clone https://github.com/VerisimilitudeX/DNAnalyzer.git
cd DNAnalyzer
docker compose up --build
Once the containers are up, the stack listens on your machine at:
| Service | URL |
|---|---|
| Web UI | http://localhost:3000 |
| REST API | http://localhost:8080 |
| Swagger docs | http://localhost:8080/swagger-ui/index.html |
Stop with docker compose down.
git clone https://github.com/VerisimilitudeX/DNAnalyzer.git
cd DNAnalyzer
./gradlew build
This produces two jars under build/libs/:
| Jar | Purpose | Entry point |
|---|---|---|
| `DNAnalyzer-<version>-boot.jar` | Spring Boot REST API | `DNAnalyzer.api.ApiApplication` |
| `DNAnalyzer-<version>-plain.jar` | CLI fat jar | `DNAnalyzer.Main` |
The simplest path is the launcher script, which auto-selects a jar or falls back to gradle run:
./easy_dna.sh your_file.fa basic # Standard analysis
./easy_dna.sh your_file.fa detailed # Comprehensive analysis
./easy_dna.sh your_file.fa mutations # Generate mutations
./easy_dna.sh your_file.fa all # Complete suite
./easy_dna.sh your_file.fa custom # Interactive mode
Override the jar path with DNANALYZER_JAR=/path/to/jar if needed.
The equivalent direct invocation:
java -jar build/libs/DNAnalyzer-1.2.1-plain.jar your_file.fa
java -jar build/libs/DNAnalyzer-1.2.1-plain.jar --profile research your_file.fa
# Available: basic, detailed, quick, research, mutation, clinical
java -jar build/libs/DNAnalyzer-1.2.1-plain.jar --profile list
Each CLI run writes into a timestamped directory under output/:
output/dnanalyzer_output_<filename>_<timestamp>/
charts/ # QC visualizations (PNG)
sequences/ # Generated mutations and processed sequences (FASTA)
reports/ # Analysis reports and summaries (HTML)
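When scripting around the CLI, it can help to locate the newest run directory and its reports. The sketch below assumes only the layout shown above (a `dnanalyzer_output_*` directory containing `reports/` with HTML files). The helper names are hypothetical, and it picks the newest run by modification time because the exact timestamp format in the directory name is not documented here.

```python
from pathlib import Path
from typing import List, Optional


def latest_run_dir(output_root: str = "output") -> Optional[Path]:
    """Return the most recently modified dnanalyzer_output_* directory, or None."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.glob("dnanalyzer_output_*") if p.is_dir()]
    # Modification time stands in for the timestamp embedded in the name.
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def list_reports(run_dir: Path) -> List[Path]:
    """Return the HTML reports inside a run directory, sorted by file name."""
    return sorted((run_dir / "reports").glob("*.html"))
```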
When an OpenAI key is available, each run produces a researcher report and a layperson report alongside the numeric output.
export OPENAI_API_KEY=sk-...
export OPENAI_MODEL=gpt-4o-mini # optional
Pass --no-ai to skip the model call.
Start the API alone with ./gradlew bootRun. All endpoints live under /api/v1.
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/status` | GET | Health check and version metadata |
| `/api/v1/analyze` | POST (multipart) | Full analysis pipeline on an uploaded FASTA/FASTQ/plain-text sequence |
| `/api/v1/base-pairs` | POST (JSON) | Base-pair counts, percentages, and GC content |
| `/api/v1/reading-frames` | POST (JSON) | Open reading frames (forward and reverse) |
| `/api/v1/find-proteins` | POST (JSON) | Top 10 candidate proteins by length |
| `/api/v1/manipulate` | POST (JSON) | Reverse, complement, or reverse-complement a sequence |
| `/api/v1/parse` | POST (multipart) | Extract the first sequence record from FASTA/FASTQ/plain uploads |
| `/api/v1/analyze-genetic` | POST (multipart) | Score 23andMe/AncestryDNA genotype files against bundled PRS panels |
curl -F [email protected] http://localhost:8080/api/v1/analyze
curl -X POST http://localhost:8080/api/v1/base-pairs \
-H 'Content-Type: application/json' \
-d '{"sequence": "ATGCGCATTA"}'
curl -F geneticFile=@my_23andme.txt -F snpAnalysis=true \
http://localhost:8080/api/v1/analyze-genetic
Full reference: docs/API_REFERENCE.md.
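The JSON call to `/api/v1/base-pairs` can also be made from Python using only the standard library. This is a sketch, not an official client: the function names are hypothetical, the base URL is the local default listed earlier, and the shape of the response is not assumed beyond it being JSON (see docs/API_REFERENCE.md for the actual fields).

```python
import json
import urllib.request

API_BASE = "http://localhost:8080/api/v1"  # local default from the setup steps


def build_base_pairs_request(sequence: str,
                             base_url: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the base-pairs endpoint."""
    body = json.dumps({"sequence": sequence}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/base-pairs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_base_pairs(sequence: str, base_url: str = API_BASE):
    """Send the request and return the decoded JSON response."""
    request = build_base_pairs_request(sequence, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the API running, `post_base_pairs("ATGCGCATTA")` sends the same payload as the curl example. Building the request separately from sending it makes the payload easy to inspect without a live server.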
./gradlew run --args='--23andme my_data.txt --prs assets/risk/heart_disease_prs.csv sample.fa'
The CLI parses the standard tab-delimited 23andMe export, aligns it with each provided weight table, and reports the raw and normalized contribution of every SNP. Missing or uncallable variants are flagged so you can assess coverage before acting on a score.
Walkthrough and example outputs: docs/usage/polygenic-risk-scoring.md.
Trait predictions are educational only. Do not use them for medical decisions.
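The scoring steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: the weight-table columns (`rsid`, `effect_allele`, `weight`) are hypothetical, and "normalized" is taken here to mean each SNP's share of the summed absolute contributions, which may differ from what the CLI reports. The 23andMe parsing relies on the export's usual layout: `#` comment lines, then tab-separated rsid, chromosome, position, and genotype, with `--` for no-calls.

```python
import csv
import io
from typing import Dict, List


def parse_23andme(text: str) -> Dict[str, str]:
    """Map rsid -> genotype from a tab-delimited 23andMe raw-data export."""
    genotypes = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split("\t")
        if len(fields) >= 4:
            genotypes[fields[0]] = fields[3].strip()
    return genotypes


def score(genotypes: Dict[str, str], weights_csv: str) -> dict:
    """Sum weight * effect-allele dosage per SNP and flag unusable variants."""
    rows: List[dict] = []
    missing: List[str] = []
    # Assumed (hypothetical) weight-table columns: rsid, effect_allele, weight.
    for row in csv.DictReader(io.StringIO(weights_csv)):
        call = genotypes.get(row["rsid"], "--")
        if not call or "-" in call:
            missing.append(row["rsid"])  # absent from the export or a no-call
            continue
        dosage = call.count(row["effect_allele"])
        rows.append({"rsid": row["rsid"], "dosage": dosage,
                     "raw": dosage * float(row["weight"])})
    total = sum(r["raw"] for r in rows)
    denom = sum(abs(r["raw"]) for r in rows)
    for r in rows:
        # Assumption: normalized = share of the summed absolute contributions.
        r["normalized"] = r["raw"] / denom if denom else 0.0
    return {"total": total, "contributions": rows, "missing": missing}
```

The `missing` list is what lets a reader judge coverage: a score built from a small fraction of a panel's variants deserves less weight than one built from nearly all of them.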
Run the Python module directly:
python -m src.python.gpu_smith_waterman SEQ1 SEQ2
Or invoke it from the CLI by combining --sw-align with --align:
java -jar build/libs/DNAnalyzer-1.2.1-plain.jar sample.fa --align reference.fa --sw-align
java -jar build/libs/DNAnalyzer-1.2.1-plain.jar --align query.fa reference.fa --sw-align
Implementation notes: docs/developer/GPU_Smith_Waterman.md.
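For readers unfamiliar with the algorithm, the following is a minimal pure-Python Smith-Waterman sketch with a linear gap penalty. It is not the code in `src/python/gpu_smith_waterman`: the function name and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Return (best score, aligned a, aligned b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # scoring matrix, first row/column stay 0
    best, best_pos = 0, (0, 0)

    # Fill: each cell is the best of restarting (0), a diagonal step, or a gap.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Traceback from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if h[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The fill step costs time and memory proportional to `len(a) * len(b)`, which is the cost GPU acceleration targets for long sequences.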
Archive a run (inputs, logs, HTML report) into a timestamped ZIP:
./scripts/package-session.sh sample.fa
Entry points for humans and AI agents:
| Doc | Purpose |
|---|---|
| AGENTS.md | Orientation for agentic AI and automation |
| docs/README.md | Index of all documentation |
| docs/ARCHITECTURE.md | System architecture and diagrams |
| docs/REPOSITORY_MAP.md | Directory-by-directory guide |
| docs/API_REFERENCE.md | REST and CLI reference |
| docs/getting-started.md | First-time setup |
| SECURITY.md | Security policy and private reporting |
Contributions are welcome at every experience level.
| Metric | Current Value |
|---|---|
| GitHub Stars | 179 |
| Forks | 75 |
| Contributors | 51 |
| Merged pull requests | 0 |
| Release asset downloads | 247 |
These numbers are refreshed by the metrics-refresh.yml workflow.
@software{Acharya_DNAnalyzer_ML-Powered_DNA_2022,
author = {Acharya, Piyush},
doi = {10.5281/zenodo.14556577},
month = oct,
title = {{DNAnalyzer: ML-Powered DNA Analysis Platform}},
url = {https://github.com/VerisimilitudeX/DNAnalyzer},
version = {3.6.1},
year = {2022}
}
DNAnalyzer is provided "as-is". Use of this software implies acceptance of all associated risks and liabilities. DNAnalyzer disclaims responsibility for any loss or damage arising from its use. Contact: [email protected].
DNAnalyzer, (C) Piyush Acharya 2026. Fiscally sponsored 501(c)(3) nonprofit (EIN 81-2908499), licensed under the MIT License.
