██╗ ██╗███████╗ ██████╗ █████╗
██║ ██║██╔════╝██╔════╝ ██╔══██╗
██║ ██║█████╗ ██║ ███╗███████║
╚██╗ ██╔╝██╔══╝ ██║ ██║██╔══██║
╚████╔╝ ███████╗╚██████╔╝██║ ██║
╚═══╝ ╚══════╝ ╚═════╝ ╚═╝ ╚═╝
Named after Vega — David Bowie's 1995 double album. The neon-star motif.
VEGA is a four-mode spectral library engine that covers the full library lifecycle — from validation and health assessment through creation, continuous optimisation, and head-to-head benchmarking. It is the only tool in the ZIGGY ecosystem that treats spectral libraries as living artifacts: inspect them, build them from scratch, improve them, and compare them side by side against real data.
"The 1. Outside album was Bowie rebuilding himself — taking everything apart and putting it back together as something better. VEGA does the same for your spectral libraries."
PROBE → Validate an existing library — flags, health score, per-entry diagnostics
BUILD → Create a new library from raw .d files, FASTA, or de novo sequences
REBUILD → Optimise an existing library — filter, fill gaps, re-score, re-predict
BENCH → Search one run with every library via DIA-NN — compare ID counts side by side
PROBE scores every precursor in a spectral library against a suite of quality flags. A single health score (0–100) summarises the result. Individual flags explain exactly what is wrong.
| Tier | Colour | Flags |
|---|---|---|
| CRITICAL | red | MISSING_CCS, ZERO_FRAGMENTS, IMPOSSIBLE_MASS |
| WARNING | amber | LOW_ION_COVERAGE, DUPLICATE_PRECURSOR, PREDICTED_LIBRARY, NEAR_SPECTRAL_DUPLICATE, CHIMERIC_OR_MISASSIGNED |
| INFO | blue | SINGLETON_PROTEIN, LOW_CHARGE_DIVERSITY, DECOY_CONTAMINATION |

| Score | Verdict | Meaning |
|---|---|---|
| 90–100 | ★ Excellent | Deploy immediately |
| 70–89 | ✓ Good | Minor issues; usable |
| 50–69 | △ Marginal | Visible ID loss expected |
| 0–49 | ✗ Poor | Rebuild recommended |

| File | Description |
|---|---|
| `vega_report.json` | Per-entry flags, health score, summary statistics, CCS stats |
| `vega_flagged.tsv` | Entries with at least one flag (for downstream triage) |
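The score bands in the verdict table above amount to a simple threshold lookup. The sketch below is illustrative only: `verdict_for` is a hypothetical helper, not part of the documented VEGA API, and it assumes integer scores. VEGA itself exposes the verdict as `report.verdict` and may derive it differently.

```python
def verdict_for(score: int) -> str:
    """Map a 0-100 health score to its verdict band (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError(f"health score out of range: {score}")
    if score >= 90:
        return "Excellent"  # deploy immediately
    if score >= 70:
        return "Good"       # minor issues; usable
    if score >= 50:
        return "Marginal"   # visible ID loss expected
    return "Poor"           # rebuild recommended
```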
BUILD constructs a spectral library from three complementary sources, then merges and deduplicates them into a single, validated .tsv for DIA search.
| Mode | Input | Best for |
|---|---|---|
| Empirical | `.d` DDA runs + FASTA | Highest fidelity: real observed spectra |
| Predicted | FASTA only | No raw files needed; fast; Koina-powered |
| Hybrid | `.d` runs + FASTA | Fill empirical gaps with predictions for maximum coverage |
| De novo | `.d` runs (Casanovo) | Discover peptides outside the FASTA |

| Preset | Mode | Charges | Conf. | CCS | Use case |
|---|---|---|---|---|---|
| Max IDs | Hybrid | 2–4 | 0.65 | ✅ | Maximum peptide identifications |
| High Quality | Hybrid | 2–3 | 0.82 | ✅ | Fewer entries, higher confidence |
| Quick Start | Predicted | 2–4 | 0.70 | ✅ | No raw files — FASTA only |
| Discovery | De novo | 2–5 | 0.60 | ✅ | Novel peptides, non-tryptic |
| Phospho | Hybrid | 2–4 | 0.75 | ✅ | Phosphoproteomics |
VEGA selects the best Koina model automatically based on the instrument type detected in the .d file:
| Instrument | MS2 model | 1/K₀ model |
|---|---|---|
| timsTOF | `Prosit_2023_intensity_timsTOF` | `IM2Deep` |
| Orbitrap | `Prosit_2020_intensity_HCD` | `AlphaPeptDeep_ccs_generic` |
| QTOF | `Prosit_2020_intensity_HCD` | `AlphaPeptDeep_ccs_generic` |
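The instrument-to-model mapping can be pictured as a plain lookup table. In this sketch the model names are copied from the table, while the dictionary layout, the `select_models` helper, and the error on unknown instruments are assumptions for illustration, not VEGA's actual code.

```python
# Model names come from the table above; everything else is illustrative.
KOINA_MODELS = {
    "timsTOF": ("Prosit_2023_intensity_timsTOF", "IM2Deep"),
    "Orbitrap": ("Prosit_2020_intensity_HCD", "AlphaPeptDeep_ccs_generic"),
    "QTOF": ("Prosit_2020_intensity_HCD", "AlphaPeptDeep_ccs_generic"),
}


def select_models(instrument: str) -> dict:
    """Return the default MS2 and 1/K0 model names for an instrument type."""
    try:
        ms2_model, ccs_model = KOINA_MODELS[instrument]
    except KeyError:
        raise ValueError(f"no default models for instrument {instrument!r}") from None
    return {"ms2": ms2_model, "ccs": ccs_model}
```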
Prediction calls (MS2, RT, CCS) run in parallel via ThreadPoolExecutor. Duplicate peptide sequences are sent to Koina only once and fanned back to all entries — typically 2–4× speedup over sequential per-entry calls.
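The deduplicate-then-fan-out idea can be sketched with the standard library alone. Here `predict_fn` stands in for a single Koina request and `predict_unique` is a hypothetical name; how VEGA actually batches its requests is not shown in this README.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_unique(sequences, predict_fn, max_workers=8):
    """Call predict_fn once per distinct sequence, then fan results back out."""
    # dict.fromkeys keeps one copy of each sequence in first-seen order.
    unique = list(dict.fromkeys(sequences))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each sequence with its result.
        results = dict(zip(unique, pool.map(predict_fn, unique)))
    # Every entry, duplicates included, receives the shared prediction.
    return [results[seq] for seq in sequences]


# Stand-in predictor: returns the sequence length instead of calling Koina.
entries = ["PEPTIDE", "ELVISK", "PEPTIDE", "PEPTIDE"]
predictions = predict_unique(entries, len)
```

With four entries but only two distinct sequences, the stand-in predictor is called twice rather than four times.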
Input sources
│
├─ Empirical: .d DDA runs → TimsData SDK → PSM table (BOWIE)
│ → real MS2 spectra + observed RT + measured 1/K₀
│
├─ Predicted: FASTA digest → Koina API
│ → predicted MS2 + predicted RT + predicted 1/K₀
│
├─ De novo: .d runs → Casanovo → novel sequences
│ → Koina MS2/RT/CCS predictions
│
├─ Merge + deduplicate (empirical wins on conflict)
│
├─ CCS correction (run-specific calibration polynomial)
│
├─ PROBE validation → flag and health-score the finished library
│
└─ Write: library.tsv + vega_provenance.json
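The merge step in the pipeline above can be sketched as a keyed overwrite. Several details here are assumptions: entries are plain dicts, the deduplication key is `(sequence, charge)`, and predicted entries take precedence over de novo ones. The README only states that empirical wins on conflict.

```python
def merge_libraries(empirical, predicted, de_novo):
    """Merge entry lists keyed by (sequence, charge); empirical wins on conflict."""
    merged = {}
    # Later sources overwrite earlier ones, so empirical is applied last.
    ordered_sources = (
        ("de_novo", de_novo),
        ("predicted", predicted),
        ("empirical", empirical),
    )
    for source_name, entries in ordered_sources:
        for entry in entries:
            key = (entry["sequence"], entry["charge"])
            merged[key] = {**entry, "source": source_name}
    return list(merged.values())
```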
REBUILD takes any existing spectral library (regardless of origin) and applies a pipeline of targeted improvements — without ever modifying the original file.
| Step | Action |
|---|---|
| Filter bad entries | Remove CRITICAL-flagged precursors |
| Remove singletons | Drop proteins with only one precursor |
| Fill missing 1/K₀ | Predict CCS for entries without ion mobility via IM2Deep |
| Apply CCS correction | Re-calibrate 1/K₀ values using a run-specific polynomial |
| Re-predict MS2 | Replace fragment intensities with fresh Koina predictions |
| PROBE validation | Score the output — report improvement |

| Preset | Filters bad¹ | Strict quality² | Fill 1/K₀ | Apply CCS | Re-predict | Use case |
|---|---|---|---|---|---|---|
| ⚡ Max IDs | ✅ | ❌ | ✅ | ✅ | ❌ | Maximise recoverable entries |
| ◈ High Quality | ✅ | ✅ | ✅ | ✅ | ❌ | Remove noise, maximise confidence |
| K₀ CCS Native | ❌ | ❌ | ✅ | ✅ | ❌ | Add 1/K₀ to a 2D library |
| ✦ Discovery | ❌ | ❌ | ✅ | ✅ | ❌ | Preserve every entry, add CCS |
| ↺ Full Refresh | ✅ | ❌ | ✅ | ✅ | ✅ | Complete re-prediction from Koina |
¹ Filters bad — removes entries with 0–2 peaks (empty / too-few-peaks) and chimeric near-duplicates detected by spectral fingerprint matching.
² Strict quality — additionally removes marginal 3–4 peak entries. For .blib / .msp libraries with observation copy-count data, also removes singletons (peptides seen only once across MS runs).
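The peak-count thresholds in the footnotes, together with the singleton-protein step, can be sketched as a two-stage filter. The entry layout (`fragments`, `protein`) and the function name are assumptions for illustration, and the chimeric near-duplicate detection is omitted because the README does not describe how the spectral fingerprint is computed.

```python
from collections import Counter


def filter_entries(entries, strict=False, drop_singletons=False):
    """Drop entries with too few fragment peaks, optionally singleton proteins."""
    # Entries with 0-2 peaks are always removed; 3-4 peaks only in strict mode.
    min_peaks = 5 if strict else 3
    kept = [e for e in entries if len(e["fragments"]) >= min_peaks]
    if drop_singletons:
        # Count surviving precursors per protein and keep proteins with more than one.
        per_protein = Counter(e["protein"] for e in kept)
        kept = [e for e in kept if per_protein[e["protein"]] > 1]
    return kept
```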
REBUILD always writes to a new file. If the output name matches the input, a timestamp is appended automatically:
my_library.tsv → (input, unchanged)
my_library_20250516_143022.tsv → (rebuilt output)
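The naming rule can be reproduced with `pathlib` and `datetime`. This is a sketch under assumptions: `safe_output_path` is a hypothetical helper, and it treats a conflict as the output resolving to the same path as the input, which may differ from how VEGA compares names.

```python
from datetime import datetime
from pathlib import Path


def safe_output_path(input_path, output_dir, output_name, now=None):
    """Return an output path that never points at the input file."""
    candidate = Path(output_dir) / output_name
    if candidate.resolve() == Path(input_path).resolve():
        # Same file as the input: append a YYYYMMDD_HHMMSS stamp before the suffix.
        stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        candidate = candidate.with_name(f"{candidate.stem}_{stamp}{candidate.suffix}")
    return candidate
```

The optional `now` argument exists only to make the result reproducible in tests.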
BENCH answers the most practical question in DIA proteomics: which spectral library gives the most identifications on my data?
The engine is fixed — always DIA-NN. The only variable is the spectral library. BENCH searches one diaPASEF run against every library in your collection sequentially and presents the results side by side.
Input: one diaPASEF .d run + all libraries in your library folder
│
├─ Library A → DIA-NN search → n_proteins / n_peptides / n_PSMs
├─ Library B → DIA-NN search → n_proteins / n_peptides / n_PSMs
├─ Library C → DIA-NN search → n_proteins / n_peptides / n_PSMs
│ …
└─ Results table: ranked by PSMs, with vs-best % bar
| Column | Description |
|---|---|
| Library | Name and format of the spectral library |
| Health | PROBE health score (0–100) if previously validated |
| Precursors | Entry count in the library |
| Proteins | Unique protein groups at 1% FDR |
| Peptides | Unique peptide sequences at 1% FDR |
| PSMs | Total precursor matches at 1% FDR |
| vs Best | PSM count as % of the best-performing library |
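The ranking and the vs Best column reduce to a sort plus a ratio against the top PSM count. The sketch below assumes each search result is a dict with a `psms` field; the function name and the rounding to one decimal are illustrative choices, not documented VEGA behaviour.

```python
def rank_by_psms(results):
    """Sort libraries by PSM count (descending) and add a vs-best percentage."""
    ranked = sorted(results, key=lambda r: r["psms"], reverse=True)
    best = ranked[0]["psms"] if ranked else 0
    return [
        # Guard against division by zero when no library produced any PSMs.
        {**r, "vs_best_pct": round(100.0 * r["psms"] / best, 1) if best else 0.0}
        for r in ranked
    ]
```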
Results are cached between sessions — re-run a single library or wipe and start fresh.
```shell
git clone https://github.com/MKrawitzky/VEGA.git
cd VEGA
pip install -r requirements.txt
```

```python
# PROBE: validate an existing library and print its flag counts
from vega import LibraryValidator

report = LibraryValidator("E:/libs/human_dia.tsv").validate()
print(f"Health: {report.health_score}/100 — {report.verdict}")
for flag, count in report.flag_counts.items():
    print(f"  {flag}: {count}")
```

```python
# BUILD: create a new library from raw runs plus a FASTA
from vega import LibraryBuilder

lib = LibraryBuilder(
    raw_paths   = ["E:/data/run1.d", "E:/data/run2.d"],
    fasta_path  = "E:/assets/human.fasta",
    preset      = "max_ids",  # max_ids | high_quality | quick | discovery | phospho
    output_dir  = "E:/libs/",
    output_name = "human_new.tsv",
)
result = lib.build()
print(f"{result.n_precursors} precursors → {result.tsv_path}")
```

```python
# REBUILD: optimise an existing library without touching the original file
from vega import LibraryRebuilder

rebuilt = LibraryRebuilder(
    input_path  = "E:/libs/human_dia.tsv",
    preset      = "high_quality",   # max_ids | high_quality | ccs_native | discovery | full_refresh
    output_dir  = "E:/libs/",
    output_name = "human_dia.tsv",  # safe: timestamp appended if name conflicts
).rebuild()
print(f"Health: {rebuilt.health_before} → {rebuilt.health_after} (+{rebuilt.improvement_pct}%)")
print(f"Entries: {rebuilt.n_precursors_in} → {rebuilt.n_precursors_out}")
print(f"CCS filled: {rebuilt.ccs_filled} | MS2 re-predicted: {rebuilt.ms2_repredicted}")
```

VEGA is fully integrated into the ZIGGY dashboard: use the VEGA tab to PROBE, BUILD, REBUILD, or BENCH any library without writing a line of code.
| File | Description |
|---|---|
| `library.tsv` | Final spectral library in DIA-NN / Spectronaut format |
| `vega_report.json` | PROBE flags, health score, CCS stats, per-entry diagnostics |
| `vega_flagged.tsv` | Flagged entries only (for triage) |
| `vega_provenance.json` | All build/rebuild parameters, Koina models, timing, versions |
vega/
├── __init__.py ← LibraryValidator, LibraryBuilder, LibraryRebuilder public API
├── validator.py ← PROBE — flag engine + health score
├── flags.py ← Flag definitions, severity tiers, descriptions
├── builder.py ← BUILD — empirical / predicted / hybrid / de novo
├── rebuilder.py ← REBUILD — filter → fill → correct → re-predict → validate
├── koina.py ← Koina REST client: parallel calls, deduplication, model defaults
├── digest.py ← Tryptic / non-specific digest + mod enumeration
├── ccs_correction.py ← Run-specific CCS calibration polynomial
├── presets.py ← BUILD_PRESETS + REBUILD_PRESETS
└── writer.py ← TSV + provenance JSON output
- Python 3.9+
- `numpy`, `scipy`, `polars`
- `requests` (Koina API)
- TimsData SDK (for empirical / de novo BUILD modes)
- Optional: `casanovo` (de novo BUILD mode)
- Optional: Koina endpoint (predicted / hybrid / REBUILD modes)
VEGA is one of the in-house engines inside ZIGGY:
| Engine | Mode | Approach |
|---|---|---|
| VEGA | Library | PROBE · BUILD · REBUILD · BENCH · Koina-powered predictions |
| BOWIE | DDA + DIA | 4D database search · presets · open search |
| GauDIA | DIA | Fragment index + cosine + BH-FDR |
| Copperfield | DIA | GauDIA + CCS gate + Kálmán RT + Percolator |
| PHANTOM | DIA | Particle filter + EKF + GRU + EB-FDR |
| Goya | DDA | SGD logistic · 8 features · BH-FDR |
| Zyna | DIA post | Chimeric deconvolution |
| Silent Heroes | Advisor | Valley-bounded FWHM · gradient-aware DIA windows · Help Me Michael |
ZIGGY / STAN Academic License — Copyright © 2024–2026 Michael Krawitzky
Free for: academic research · non-profit · education · government-funded research · core facility internal QC
Commercial use requires a license: for-profit companies · CROs & pharma · fee-for-service
Contact: [email protected]
Michael Krawitzky — The Peptide Wizard
github.com/MKrawitzky
"I'm deranged… I'm deranged… down, down, down…"
🌟 VEGA — where spectral libraries meet Outside 🌟