Formula IDentification from tandem mass spectra by Deep LEarning
The source code for training and evaluating FIDDLE, and for running FIDDLE inference with results from SIRIUS and BUDDY, is provided (see the detailed commands in ./running_scripts/). A PyPI package and a web-based service for FIDDLE will be available soon.
Preprint: https://www.biorxiv.org/content/10.1101/2024.11.25.625316v1
- Install Anaconda, if not already installed.
- Create the environment with the necessary packages:
conda env create -f environment.yml
- (optional) Install BUDDY and SIRIUS following the respective installation instructions provided in each tool's documentation.
To use the pre-trained models, run the script below to download the weights from the release page and place them in the ./check_point/ directory:
- Orbitrap models:
  - fiddle_tcn_orbitrap.pt: formula prediction model on Orbitrap spectra
  - fiddle_fdr_orbitrap.pt: confidence score prediction model on Orbitrap spectra
- Q-TOF models:
  - fiddle_tcn_qtof.pt: formula prediction model on Q-TOF spectra
  - fiddle_fdr_qtof.pt: confidence score prediction model on Q-TOF spectra
bash ./running_scripts/download_models.sh
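After the download finishes, it can be useful to confirm that all four weight files listed above are actually in place before running inference. The following is a minimal sketch, not part of the FIDDLE code base: the helper name `missing_weights` is hypothetical, and it only assumes the file names and the ./check_point/ directory given in this README.

```python
from pathlib import Path

# File names copied from the model list in this README.
EXPECTED_WEIGHTS = {
    "orbitrap": ["fiddle_tcn_orbitrap.pt", "fiddle_fdr_orbitrap.pt"],
    "qtof": ["fiddle_tcn_qtof.pt", "fiddle_fdr_qtof.pt"],
}


def missing_weights(check_point_dir="./check_point"):
    """Return {instrument: [missing file names]} for weights not found on disk.

    Hypothetical helper for illustration; it does not load or validate the
    weights, it only checks that the files exist.
    """
    root = Path(check_point_dir)
    missing = {}
    for instrument, names in EXPECTED_WEIGHTS.items():
        absent = [name for name in names if not (root / name).is_file()]
        if absent:
            missing[instrument] = absent
    return missing


if __name__ == "__main__":
    report = missing_weights()
    if report:
        for instrument, names in report.items():
            print(f"{instrument}: missing {', '.join(names)}")
    else:
        print("All pre-trained weights found.")
```

If only one instrument type is needed, the other entry in the report can be ignored.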
The input format is mgf, where the title, precursor_mz, precursor_type, and collision_energy fields are required. Here, we sampled 21 spectra from the EMBL-MCF 2.0 dataset as an example.
BEGIN IONS
TITLE=EMBL_MCF_2_0_HRMS_Library000531
PEPMASS=129.01941
CHARGE=1-
PRECURSOR_TYPE=[M-H]-
PRECURSOR_MZ=129.01941
COLLISION_ENERGY=50.0
SMILES=[H]OC(=O)C([H])=C(C(=O)O[H])C([H])([H])[H]
FORMULA=C5H6O4
THEORETICAL_PRECURSOR_MZ=129.018785
PPM=4.844255818912111
SIMULATED_PRECURSOR_MZ=129.02032113281717
41.2041 0.410228
55.7698 0.503672
56.8647 0.461943
85.0296 100.0
129.0196 8.036902
END IONS
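Before running FIDDLE on your own data, you may want to check that every spectrum carries the four required fields. The sketch below is a small standard-library parser written for illustration only; it is not FIDDLE's own loader, and the names `parse_mgf`, `missing_fields`, and `ppm_error` are hypothetical. It assumes field names are matched case-insensitively (the example above uses upper case). The `ppm_error` formula is also an assumption: it reproduces the PPM value in the example from PRECURSOR_MZ and THEORETICAL_PRECURSOR_MZ, but the README does not state how that field was computed.

```python
REQUIRED_FIELDS = ("TITLE", "PRECURSOR_MZ", "PRECURSOR_TYPE", "COLLISION_ENERGY")


def parse_mgf(text):
    """Parse MGF text into a list of {"meta": dict, "peaks": [(mz, intensity)]}."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"meta": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as KEY=value; keys are upper-cased.
                key, value = line.split("=", 1)
                current["meta"][key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z followed by intensity.
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


def missing_fields(spectrum):
    """Return the required fields that are absent or empty in one spectrum."""
    return [f for f in REQUIRED_FIELDS if not spectrum["meta"].get(f)]


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million (assumed definition)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


EXAMPLE = """BEGIN IONS
TITLE=EMBL_MCF_2_0_HRMS_Library000531
PRECURSOR_TYPE=[M-H]-
PRECURSOR_MZ=129.01941
COLLISION_ENERGY=50.0
THEORETICAL_PRECURSOR_MZ=129.018785
85.0296 100.0
129.0196 8.036902
END IONS
"""

if __name__ == "__main__":
    for spectrum in parse_mgf(EXAMPLE):
        meta = spectrum["meta"]
        print(meta["TITLE"], "missing:", missing_fields(spectrum))
        print("ppm:", ppm_error(float(meta["PRECURSOR_MZ"]),
                                float(meta["THEORETICAL_PRECURSOR_MZ"])))
```

To check a real file, read it with `open(path).read()` and pass the text to `parse_mgf`.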
Run FIDDLE!
python run_fiddle.py --test_data ./demo/input_msms.mgf \
--config_path ./config/fiddle_tcn_orbitrap.yml \
--resume_path ./check_point/fiddle_tcn_orbitrap.pt \
--fdr_resume_path ./check_point/fiddle_fdr_orbitrap.pt \
--result_path ./demo/output_fiddle.csv --device 0
If you'd like to integrate the results from SIRIUS and BUDDY, please organize them in the format shown in ./demo/buddy_output.csv and ./demo/sirius_output.csv, and provide them when running FIDDLE:
python run_fiddle.py --test_data ./demo/input_msms.mgf \
--config_path ./config/fiddle_tcn_orbitrap.yml \
--resume_path ./check_point/fiddle_tcn_orbitrap.pt \
--fdr_resume_path ./check_point/fiddle_fdr_orbitrap.pt \
--buddy_path ./demo/output_buddy.csv \
--sirius_path ./demo/output_sirius.csv \
--result_path ./demo/output_fiddle_all.csv --device 0
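When running FIDDLE over many input files, it may be convenient to assemble these commands programmatically. The sketch below builds the argument list using only the flags that appear in the commands above; the function name `build_fiddle_command` is hypothetical, and all paths are supplied by the caller rather than guessed.

```python
import shlex


def build_fiddle_command(test_data, config_path, resume_path, fdr_resume_path,
                         result_path, device=0, buddy_path=None, sirius_path=None):
    """Build the run_fiddle.py argument list from the flags shown in this README.

    Illustrative helper only; --buddy_path and --sirius_path are added
    only when the corresponding path is given.
    """
    cmd = [
        "python", "run_fiddle.py",
        "--test_data", test_data,
        "--config_path", config_path,
        "--resume_path", resume_path,
        "--fdr_resume_path", fdr_resume_path,
    ]
    if buddy_path:
        cmd += ["--buddy_path", buddy_path]
    if sirius_path:
        cmd += ["--sirius_path", sirius_path]
    cmd += ["--result_path", result_path, "--device", str(device)]
    return cmd


if __name__ == "__main__":
    cmd = build_fiddle_command(
        test_data="./demo/input_msms.mgf",
        config_path="./config/fiddle_tcn_orbitrap.yml",
        resume_path="./check_point/fiddle_tcn_orbitrap.pt",
        fdr_resume_path="./check_point/fiddle_fdr_orbitrap.pt",
        result_path="./demo/output_fiddle.csv",
    )
    # Print the command for inspection; to execute it, pass the list to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Returning a list instead of a single string avoids shell-quoting problems when paths contain spaces.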
@article{hong2024fiddle,
title={FIDDLE: a deep learning method for chemical formulas prediction from tandem mass spectra},
author={Hong, Yuhui and Li, Sujun and Ye, Yuzhen and Tang, Haixu},
journal={bioRxiv},
pages={2024--11},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.