Formula IDentification from tandem mass spectra by Deep LEarning
FIDDLE is a deep learning method for predicting molecular formulas from MS/MS spectra. This repository contains the full research codebase for model training, evaluation, and paper reproduction.
- Paper: Nature Communications (2025)
- End-user CLI: msfiddle
- Try this demo! FIDDLE on Hugging Face
Breaking change (v2.0.0): The rescore model has been redesigned (Siamese architecture); see CHANGELOG.md for details.
- Install Anaconda, if not already installed.
- Create the environment with the necessary packages:

  conda env create -f environment.yml

- (optional) Install BUDDY and SIRIUS following the installation instructions in each tool's documentation.
To use the pre-trained models, please use the following scripts to download the weights from the release page and place them in the ./check_point/ directory:
- Orbitrap models:
  - fiddle_tcn_orbitrap.pt: formula prediction model on Orbitrap spectra
  - fiddle_rescore_orbitrap.pt: rescore model on Orbitrap spectra
- Q-TOF models:
  - fiddle_tcn_qtof.pt: formula prediction model on Q-TOF spectra
  - fiddle_rescore_qtof.pt: rescore model on Q-TOF spectra

bash ./running_scripts/download_models.sh

The input format is mgf, where the title, precursor_mz, precursor_type, and collision_energy fields are required. Here, we sampled 21 spectra from the EMBL-MCF 2.0 dataset as an example.
BEGIN IONS
TITLE=EMBL_MCF_2_0_HRMS_Library000531
PEPMASS=129.01941
CHARGE=1-
PRECURSOR_TYPE=[M-H]-
PRECURSOR_MZ=129.01941
COLLISION_ENERGY=50.0
SMILES=[H]OC(=O)C([H])=C(C(=O)O[H])C([H])([H])[H]
FORMULA=C5H6O4
THEORETICAL_PRECURSOR_MZ=129.018785
PPM=4.844255818912111
SIMULATED_PRECURSOR_MZ=129.02032113281717
41.2041 0.410228
55.7698 0.503672
56.8647 0.461943
85.0296 100.0
129.0196 8.036902
END IONS
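Before running FIDDLE, it can help to check that every record carries the required fields. The following is a minimal sketch and not part of the repository: `parse_mgf`, `missing_required`, and `ppm_error` are hypothetical helpers, and the ppm formula (absolute difference relative to the theoretical m/z) is an assumption that reproduces the PPM value of the record above.

```python
REQUIRED = ("TITLE", "PRECURSOR_MZ", "PRECURSOR_TYPE", "COLLISION_ENERGY")

def parse_mgf(text):
    """Parse MGF text into a list of {'params': dict, 'peaks': [(mz, intensity)]}."""
    records, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                records.append(current)
            current = None
        elif current is None:
            continue  # ignore anything outside a BEGIN/END IONS block
        elif "=" in line and not line[0].isdigit():
            # KEY=VALUE header line; split once so values may contain '='
            key, value = line.split("=", 1)
            current["params"][key.strip().upper()] = value.strip()
        else:
            # peak line: "<m/z> <intensity>"
            mz, intensity = line.split()[:2]
            current["peaks"].append((float(mz), float(intensity)))
    return records

def missing_required(record):
    """Return the required field names absent from one parsed record."""
    return [key for key in REQUIRED if key not in record["params"]]

def ppm_error(observed_mz, theoretical_mz):
    """Assumed definition: |observed - theoretical| / theoretical * 1e6."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6

# Shortened copy of the example record shown above.
SAMPLE = """BEGIN IONS
TITLE=EMBL_MCF_2_0_HRMS_Library000531
PRECURSOR_TYPE=[M-H]-
PRECURSOR_MZ=129.01941
COLLISION_ENERGY=50.0
SMILES=[H]OC(=O)C([H])=C(C(=O)O[H])C([H])([H])[H]
THEORETICAL_PRECURSOR_MZ=129.018785
41.2041 0.410228
55.7698 0.503672
56.8647 0.461943
85.0296 100.0
129.0196 8.036902
END IONS
"""

records = parse_mgf(SAMPLE)
record = records[0]
ppm = ppm_error(float(record["params"]["PRECURSOR_MZ"]),
                float(record["params"]["THEORETICAL_PRECURSOR_MZ"]))
print(missing_required(record), len(record["peaks"]), round(ppm, 4))
```

To check a whole file, read it with `open(path).read()` and report any record for which `missing_required` returns a non-empty list.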
Run FIDDLE!
python run_fiddle.py --test_data ./demo/input_msms.mgf \
--config_path ./config/fiddle_tcn_orbitrap.yml \
--resume_path ./check_point/fiddle_tcn_orbitrap.pt \
--rescore_resume_path ./check_point/fiddle_rescore_orbitrap.pt \
--result_path ./demo/output_fiddle.csv --device 0

If you'd like to integrate the results from SIRIUS and BUDDY, please organize the results in the format shown in ./demo/buddy_output.csv and ./demo/sirius_output.csv, and provide them when running FIDDLE:
python run_fiddle.py --test_data ./demo/input_msms.mgf \
--config_path ./config/fiddle_tcn_orbitrap.yml \
--resume_path ./check_point/fiddle_tcn_orbitrap.pt \
--rescore_resume_path ./check_point/fiddle_rescore_orbitrap.pt \
--buddy_path ./demo/output_buddy.csv \
--sirius_path ./demo/output_sirius.csv \
--result_path ./demo/output_fiddle_all.csv --device 0

See test_caffeine.py for a worked example running FIDDLE on a caffeine Orbitrap spectrum fetched live from GNPS.
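When scripting many runs, the two invocations above can be assembled programmatically. The sketch below is illustrative only: `build_fiddle_command` is a hypothetical helper, it uses only the flags that appear in the commands above, and it assumes (the source does not say) that `--buddy_path` and `--sirius_path` may each be omitted independently.

```python
import shlex

def build_fiddle_command(test_data, config_path, resume_path,
                         rescore_resume_path, result_path,
                         device=0, buddy_path=None, sirius_path=None):
    """Return the argv list for run_fiddle.py (hypothetical helper)."""
    cmd = [
        "python", "run_fiddle.py",
        "--test_data", test_data,
        "--config_path", config_path,
        "--resume_path", resume_path,
        "--rescore_resume_path", rescore_resume_path,
    ]
    # Optional BUDDY / SIRIUS results; assumed to be independently optional.
    if buddy_path is not None:
        cmd += ["--buddy_path", buddy_path]
    if sirius_path is not None:
        cmd += ["--sirius_path", sirius_path]
    cmd += ["--result_path", result_path, "--device", str(device)]
    return cmd

cmd = build_fiddle_command(
    test_data="./demo/input_msms.mgf",
    config_path="./config/fiddle_tcn_orbitrap.yml",
    resume_path="./check_point/fiddle_tcn_orbitrap.pt",
    rescore_resume_path="./check_point/fiddle_rescore_orbitrap.pt",
    result_path="./demo/output_fiddle.csv",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
# To execute it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` rather than a single string avoids shell quoting problems when paths contain spaces.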
All scripts should be run from the repository root (FIDDLE/).
| Script | Description |
|---|---|
| running_scripts/experiments_test_benchmark.sh | Evaluate on external benchmarks (CASMI 2016, CASMI 2017, EMBL-MCF 2.0) |
| running_scripts/experiments_test_nist23.sh | Evaluate on NIST23 |
| running_scripts/experiments_test_chimeric.sh | Evaluate on chimeric spectra |
| running_scripts/experiments_test_noised.sh | Evaluate under noise conditions |
| running_scripts/experiments_ablation_study.sh | Run ablation study |
| running_scripts/experiments_demo.sh | Run demo experiment |
| running_scripts/train_released_models.sh | Train TCN and rescore models for both Orbitrap and Q-TOF |

For training from scratch, see the train scripts (train_tcn_gpus.py, train_tcn_gpus_cl.py, train_rescore.py) and the corresponding config files in ./config/.
@article{hong2025fiddle,
title={FIDDLE: a deep learning method for chemical formulas prediction from tandem mass spectra},
author={Hong, Yuhui and Li, Sujun and Ye, Yuzhen and Tang, Haixu},
journal={Nature Communications},
volume={16},
number={1},
pages={11102},
year={2025},
publisher={Nature Publishing Group UK London}
}