josiehong/FIDDLE

FIDDLE

Formula IDentification from tandem mass spectra by Deep LEarning

FIDDLE is a deep learning method for predicting molecular formulas from MS/MS spectra. This repository contains the full research codebase for model training, evaluation, and paper reproduction.

Breaking change (v2.0.0): The rescore model has been redesigned with a Siamese architecture; see CHANGELOG.md for details.

Set up

Requirements

  1. Install Anaconda, if not already installed.

  2. Create the environment with the necessary packages:

conda env create -f environment.yml
  3. (optional) Install BUDDY and SIRIUS, following the installation instructions in each tool's documentation.

Pre-trained Model Weights

To use the pre-trained models, download the weights listed below from the release page with the script that follows the list, and make sure they end up in the ./check_point/ directory:

  • Orbitrap models:
    • fiddle_tcn_orbitrap.pt: formula prediction model on Orbitrap spectra
    • fiddle_rescore_orbitrap.pt: rescore model on Orbitrap spectra
  • Q-TOF models:
    • fiddle_tcn_qtof.pt: formula prediction model on Q-TOF spectra
    • fiddle_rescore_qtof.pt: rescore model on Q-TOF spectra
bash ./running_scripts/download_models.sh
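
After the download finishes, a quick check that all four files are in place can save a confusing failure later. The following is a minimal sketch, not part of the repository: the helper name missing_weights is hypothetical, and only the file names and the ./check_point/ directory come from the list above.

```python
from pathlib import Path

# File names taken from the list of released weights above.
EXPECTED_WEIGHTS = (
    "fiddle_tcn_orbitrap.pt",
    "fiddle_rescore_orbitrap.pt",
    "fiddle_tcn_qtof.pt",
    "fiddle_rescore_qtof.pt",
)


def missing_weights(check_point_dir="./check_point"):
    """Return the expected weight files that are not present in the directory.

    Hypothetical helper for illustration; it only checks that the files
    exist, not that they are complete or loadable.
    """
    root = Path(check_point_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weights:", ", ".join(missing))
    else:
        print("All pre-trained weights found.")
```

Run it from the repository root so that the relative ./check_point path resolves correctly.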

Usage

The input format is MGF; the TITLE, PRECURSOR_MZ, PRECURSOR_TYPE, and COLLISION_ENERGY fields are required. As an example, we sampled 21 spectra from the EMBL-MCF 2.0 dataset; one entry looks like this:

BEGIN IONS
TITLE=EMBL_MCF_2_0_HRMS_Library000531
PEPMASS=129.01941
CHARGE=1-
PRECURSOR_TYPE=[M-H]-
PRECURSOR_MZ=129.01941
COLLISION_ENERGY=50.0
SMILES=[H]OC(=O)C([H])=C(C(=O)O[H])C([H])([H])[H]
FORMULA=C5H6O4
THEORETICAL_PRECURSOR_MZ=129.018785
PPM=4.844255818912111
SIMULATED_PRECURSOR_MZ=129.02032113281717
41.2041 0.410228
55.7698 0.503672
56.8647 0.461943
85.0296 100.0
129.0196 8.036902
END IONS
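
To check an input file before running the model, the entry above can be parsed and validated with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical and are not FIDDLE's own loader, and the PPM check assumes that field is the relative difference between PRECURSOR_MZ and THEORETICAL_PRECURSOR_MZ in parts per million, which is consistent with the values in this entry.

```python
# The example entry from this README, reproduced verbatim.
ENTRY = """BEGIN IONS
TITLE=EMBL_MCF_2_0_HRMS_Library000531
PEPMASS=129.01941
CHARGE=1-
PRECURSOR_TYPE=[M-H]-
PRECURSOR_MZ=129.01941
COLLISION_ENERGY=50.0
SMILES=[H]OC(=O)C([H])=C(C(=O)O[H])C([H])([H])[H]
FORMULA=C5H6O4
THEORETICAL_PRECURSOR_MZ=129.018785
PPM=4.844255818912111
SIMULATED_PRECURSOR_MZ=129.02032113281717
41.2041 0.410228
55.7698 0.503672
56.8647 0.461943
85.0296 100.0
129.0196 8.036902
END IONS"""

# Fields the README lists as required.
REQUIRED_FIELDS = ("TITLE", "PRECURSOR_MZ", "PRECURSOR_TYPE", "COLLISION_ENERGY")


def parse_mgf_entry(text):
    """Split one BEGIN IONS ... END IONS block into (fields, peaks)."""
    fields, peaks = {}, []
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line or line in ("BEGIN IONS", "END IONS"):
            continue
        if "=" in line:
            # Split on the first "=" only; SMILES values contain "=" too.
            key, value = line.split("=", 1)
            fields[key.upper()] = value
        else:
            mz, intensity = line.split()[:2]
            peaks.append((float(mz), float(intensity)))
    return fields, peaks


def missing_required(fields):
    """Return the required field names absent from a parsed entry."""
    return [name for name in REQUIRED_FIELDS if name not in fields]


def ppm_error(observed_mz, theoretical_mz):
    """Relative m/z difference in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


fields, peaks = parse_mgf_entry(ENTRY)
ppm = ppm_error(float(fields["PRECURSOR_MZ"]),
                float(fields["THEORETICAL_PRECURSOR_MZ"]))
print("missing required fields:", missing_required(fields))
print("peaks:", len(peaks), "ppm error: %.4f" % ppm)
```

For a full file, the same parser could be applied to each block between BEGIN IONS and END IONS.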

Run FIDDLE!

python run_fiddle.py --test_data ./demo/input_msms.mgf \
                    --config_path ./config/fiddle_tcn_orbitrap.yml \
                    --resume_path ./check_point/fiddle_tcn_orbitrap.pt \
                    --rescore_resume_path ./check_point/fiddle_rescore_orbitrap.pt \
                    --result_path ./demo/output_fiddle.csv --device 0

To integrate results from SIRIUS and BUDDY, format them as shown in ./demo/buddy_output.csv and ./demo/sirius_output.csv, then pass them to run_fiddle.py with --buddy_path and --sirius_path:

python run_fiddle.py --test_data ./demo/input_msms.mgf \
                    --config_path ./config/fiddle_tcn_orbitrap.yml \
                    --resume_path ./check_point/fiddle_tcn_orbitrap.pt \
                    --rescore_resume_path ./check_point/fiddle_rescore_orbitrap.pt \
                    --buddy_path ./demo/output_buddy.csv \
                    --sirius_path ./demo/output_sirius.csv \
                    --result_path ./demo/output_fiddle_all.csv --device 0
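
When running many files or switching between instruments, the two invocations above can be assembled programmatically. The sketch below is illustrative only: build_fiddle_command is a hypothetical helper, the flags are the ones shown in the commands above, and the Q-TOF config path (./config/fiddle_tcn_qtof.yml) is an assumption inferred from the Orbitrap naming pattern, so verify it against the files in ./config/.

```python
import shlex


def build_fiddle_command(instrument="orbitrap",
                         test_data="./demo/input_msms.mgf",
                         result_path="./demo/output_fiddle.csv",
                         device=0,
                         buddy_path=None,
                         sirius_path=None):
    """Assemble the run_fiddle.py argument list (hypothetical helper).

    The config file name for "qtof" is assumed to follow the Orbitrap
    pattern; the checkpoint names come from the released weights list.
    """
    if instrument not in ("orbitrap", "qtof"):
        raise ValueError("instrument must be 'orbitrap' or 'qtof'")
    cmd = [
        "python", "run_fiddle.py",
        "--test_data", test_data,
        "--config_path", f"./config/fiddle_tcn_{instrument}.yml",
        "--resume_path", f"./check_point/fiddle_tcn_{instrument}.pt",
        "--rescore_resume_path", f"./check_point/fiddle_rescore_{instrument}.pt",
    ]
    # BUDDY and SIRIUS results are optional and only added when given.
    if buddy_path:
        cmd += ["--buddy_path", buddy_path]
    if sirius_path:
        cmd += ["--sirius_path", sirius_path]
    cmd += ["--result_path", result_path, "--device", str(device)]
    return cmd


# Print the command instead of executing it; to run it, pass the list
# to subprocess.run(cmd, check=True) from the repository root.
print(shlex.join(build_fiddle_command()))
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.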

See test_caffeine.py for a worked example running FIDDLE on a caffeine Orbitrap spectrum fetched live from GNPS.

Reproduce paper results

All scripts should be run from the repository root (FIDDLE/).

| Script | Description |
| --- | --- |
| running_scripts/experiments_test_benchmark.sh | Evaluate on external benchmarks (CASMI 2016, CASMI 2017, EMBL-MCF 2.0) |
| running_scripts/experiments_test_nist23.sh | Evaluate on NIST23 |
| running_scripts/experiments_test_chimeric.sh | Evaluate on chimeric spectra |
| running_scripts/experiments_test_noised.sh | Evaluate under noise conditions |
| running_scripts/experiments_ablation_study.sh | Run ablation study |
| running_scripts/experiments_demo.sh | Run demo experiment |
| running_scripts/train_released_models.sh | Train TCN and rescore models for both Orbitrap and Q-TOF |

To train from scratch, see the training scripts (train_tcn_gpus.py, train_tcn_gpus_cl.py, train_rescore.py) and the corresponding config files in ./config/.

Citation

@article{hong2025fiddle,
  title={FIDDLE: a deep learning method for chemical formulas prediction from tandem mass spectra},
  author={Hong, Yuhui and Li, Sujun and Ye, Yuzhen and Tang, Haixu},
  journal={Nature Communications},
  volume={16},
  number={1},
  pages={11102},
  year={2025},
  publisher={Nature Publishing Group UK London}
}
