MS²PIP is a tool to predict MS² signal peak intensities from peptide sequences. It employs the XGBoost machine learning algorithm and is written in Python.
You can install MS²PIP on your machine by following the instructions below or the extended install instructions. For a more user-friendly experience, we created a web server. There, you can easily upload a list of peptide sequences, after which the corresponding predicted MS² spectra can be downloaded in multiple file formats. The web server can also be contacted through the RESTful API.
To generate a predicted spectral library starting from a FASTA file, we developed a pipeline called fasta2speclib. Usage of this pipeline is described in fasta2speclib_config.md. Fasta2speclib was developed in collaboration with the ProGenTomics group for the MS²PIP for DIA project.
If you use MS²PIP for your research, please cite the following articles:
- Gabriels, R., Martens, L., & Degroeve, S. (2019). Updated MS²PIP web server delivers fast and accurate MS² peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Research. https://doi.org/10.1093/nar/gkz299
- Degroeve, S., Maddelein, D., & Martens, L. (2015). MS²PIP prediction server: compute and visualize MS² peak intensity predictions for CID and HCD fragmentation. Nucleic Acids Research, 43(W1), W326–W330. https://doi.org/10.1093/nar/gkv542
- Degroeve, S., & Martens, L. (2013). MS²PIP: a tool for MS/MS peak intensity prediction. Bioinformatics (Oxford, England), 29(24), 3199–203. https://doi.org/10.1093/bioinformatics/btt544
Please also take note of and mention the MS²PIP-version and model-version you used.
Download the latest release and unzip. MS2PIPc runs on Python 3.5 or greater and the required Python packages are listed in `requirements.txt`. MS2PIPc requires machine-specific compilation of the C-code:

    sh compile.sh

Check out the extended install instructions for a more detailed explanation.
MS2PIPc comes with pre-trained models for a variety of fragmentation methods and modifications. These models can easily be applied by configuring MS2PIPc in the `config.txt` file and providing a list of peptides in the form of a PEPREC file.

    usage: ms2pipC.py [-h] [-c FILE] [-s FILE] [-w FILE] [-m INT] <peptide file>

    positional arguments:
      <peptide file>  list of peptides

    optional arguments:
      -h, --help      show this help message and exit
      -c FILE         config file (by default config.txt)
      -s FILE         .mgf MS2 spectrum file (optional)
      -w FILE         write feature vectors to FILE.{pkl,h5} (optional)
      -m INT          number of cpu's to use

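The options above can also be assembled from a Python script. The following is a minimal sketch, not part of MS2PIPc: the helper `build_ms2pip_command` and the file name `peptides.peprec` are hypothetical, and it assumes the script is started from the directory that contains `ms2pipC.py`. Only the flags listed in the usage message are used.

```python
import sys


def build_ms2pip_command(peptide_file, config_file="config.txt",
                         spectrum_file=None, vector_file=None, num_cpu=None):
    """Assemble an argument list for ms2pipC.py (hypothetical helper).

    Only the documented flags (-c, -s, -w, -m) are emitted; optional
    flags are left out when no value is given.
    """
    cmd = [sys.executable, "ms2pipC.py", "-c", config_file]
    if spectrum_file is not None:
        cmd += ["-s", spectrum_file]
    if vector_file is not None:
        cmd += ["-w", vector_file]
    if num_cpu is not None:
        cmd += ["-m", str(num_cpu)]
    cmd.append(peptide_file)  # the positional <peptide file> goes last
    return cmd


cmd = build_ms2pip_command("peptides.peprec", num_cpu=4)
print(" ".join(cmd[1:]))  # ms2pipC.py -c config.txt -m 4 peptides.peprec

# To actually run the prediction from the MS2PIPc directory:
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.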
Several MS2PIPc options need to be set in the config file.

- The models that should be used are set as `model=X`, where X is one of the currently supported MS2PIP models (see MS2PIP Models).
- The fragment ion error tolerance is set as `frag_error=X`, where X is the tolerance in Da.
- PTMs (see further) are set as `ptm=X,Y,opt,Z` for each internal PTM, where X is a string that represents the PTM, Y is the mass difference in Da associated with the PTM, `opt` is a required field kept for compatibility with other CompOmics projects, and Z is the amino acid that is modified by the PTM. For N- and C-terminal modifications, Z should be `N-term` or `C-term`, respectively.

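To make the `key=value` layout concrete, here is a small sketch that reads such lines into a dictionary. It is not MS2PIPc's own parser: the function `parse_ms2pip_config` is hypothetical, the `frag_error` value of 0.02 is an arbitrary illustration, and the sketch assumes that every relevant line has exactly the form described above.

```python
def parse_ms2pip_config(text):
    """Parse model, frag_error and ptm lines (illustrative helper only)."""
    config = {"model": None, "frag_error": None, "ptm": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key=value
        key, value = line.split("=", 1)
        if key == "model":
            config["model"] = value
        elif key == "frag_error":
            config["frag_error"] = float(value)  # tolerance in Da
        elif key == "ptm":
            # ptm=X,Y,opt,Z -> name, mass difference, placeholder, target
            name, mass_shift, _opt, target = value.split(",")
            config["ptm"].append(
                {"name": name, "mass_shift": float(mass_shift), "target": target}
            )
    return config


# Example content; the frag_error value is an assumption for illustration.
example = """model=HCD
frag_error=0.02
ptm=Cam,57.02146,opt,C
ptm=Ace,42.010565,opt,N-term
"""

config = parse_ms2pip_config(example)
print(config["model"], config["frag_error"], len(config["ptm"]))
```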
To apply the pre-trained models you need to pass only a `<peptide file>` to `ms2pipC.py`. This file contains the peptide sequences for which you want to predict the b- and y-ion peak intensities. The file is space-separated and contains four columns with the following header names:

- `spec_id`: an id for the peptide/spectrum
- `modifications`: a string indicating the modified amino acids
- `peptide`: the unmodified amino acid sequence
- `charge`: charge state to predict

The `spec_id` column is a unique identifier for each peptide that will be used in the TITLE field of the predicted MS2 `.mgf` file. The `modifications` column is a string that lists the PTMs in the peptide. Each PTM is written as `A|B`, where A is the location of the PTM in the peptide (the first amino acid has location 1, location 0 is used for N-terminal modifications, and -1 is used for C-terminal modifications) and B is a string that represents the PTM as defined in the config file (`-c` command line argument). Multiple PTMs in the `modifications` column are concatenated with `|`.
As an example, suppose the config file contains the lines
ptm=Cam,57.02146,opt,C
ptm=Ace,42.010565,opt,N-term
ptm=Glyloss,-58.005479,opt,C-term
then a modifications string could look like `0|Ace|2|Cam|5|Cam|-1|Glyloss`, which means that the second and fifth amino acids are modified with `Cam`, that there is an N-terminal modification `Ace`, and that there is a C-terminal modification `Glyloss`.
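The `location|name` encoding can be decoded with a few lines of Python. This is only a sketch of the format as described here, not code taken from MS2PIPc; the helpers `parse_modifications` and `describe` are hypothetical names.

```python
def parse_modifications(mod_string):
    """Split 'loc|name|loc|name|...' into a list of (location, name) tuples."""
    fields = mod_string.split("|")
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating location and PTM name fields")
    return [(int(fields[i]), fields[i + 1]) for i in range(0, len(fields), 2)]


def describe(location, name):
    """Turn one (location, name) pair into a readable description."""
    if location == 0:
        return f"{name} on the N-terminus"
    if location == -1:
        return f"{name} on the C-terminus"
    return f"{name} on residue {location}"


mods = parse_modifications("0|Ace|2|Cam|5|Cam|-1|Glyloss")
for location, name in mods:
    print(describe(location, name))
# Ace on the N-terminus
# Cam on residue 2
# Cam on residue 5
# Glyloss on the C-terminus
```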
In the `conversion_tools` folder, we provide a host of Python scripts to convert common search engine output files to a PEPREC file.
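When the peptides come from a source that none of these scripts cover, a PEPREC file can also be written directly. The sketch below uses Python's standard `csv` module with a space delimiter and the four header names listed earlier. The peptide sequences, ids and the helper `write_peprec` are made up for illustration, and the `Cam` and `Ace` names assume matching `ptm=` lines in the config file.

```python
import csv
import io

PEPREC_COLUMNS = ["spec_id", "modifications", "peptide", "charge"]


def write_peprec(rows, handle):
    """Write rows (dicts keyed by PEPREC_COLUMNS) as a space-separated table."""
    writer = csv.DictWriter(
        handle, fieldnames=PEPREC_COLUMNS, delimiter=" ", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical peptides; locations are 1-based, 0 marks the N-terminus.
rows = [
    {"spec_id": "pep1", "modifications": "2|Cam",
     "peptide": "ACDEFGHIK", "charge": 2},
    {"spec_id": "pep2", "modifications": "0|Ace|5|Cam",
     "peptide": "MNPQCSTVWK", "charge": 3},
]

buffer = io.StringIO()
write_peprec(rows, buffer)
print(buffer.getvalue(), end="")
# spec_id modifications peptide charge
# pep1 2|Cam ACDEFGHIK 2
# pep2 0|Ace|5|Cam MNPQCSTVWK 3

# To write to disk instead of an in-memory buffer:
# with open("peptides.peprec", "w", newline="") as handle:
#     write_peprec(rows, handle)
```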
The predictions are saved in a `.csv` file with the name `<peptide_file>_predictions.csv`. If you want the output to be in the form of an `.mgf` file, replace the variable `mgf` in line 716 of `ms2pipC.py`.
Currently the following models are supported in MS²PIP: `HCD`, `CID`, `TTOF5600`, `TMT`, `iTRAQ`, `iTRAQphospho`, `HCDch2` and `CIDch2`. The last two "ch2" models also include predictions for doubly charged fragment ions (b++ and y++), next to the predictions for singly charged b- and y-ions.
If you use MS²PIP for your research, always mention the MS²PIP-version (see releases page) and model-version (see table below) you used.
Model | Current version | Train-test dataset (unique peptides) | Evaluation dataset (unique peptides) | Median Pearson correlation on evaluation dataset |
---|---|---|---|---|
HCD | v20190107 | MassIVE-KB (1 623 712) | PXD008034 (35 269) | 0.903786 |
CID | v20190107 | NIST CID Human (340 356) | NIST CID Yeast (92 609) | 0.904947 |
iTRAQ | v20190107 | NIST iTRAQ (704 041) | PXD001189 (41 502) | 0.905870 |
iTRAQphospho | v20190107 | NIST iTRAQ phospho (183 383) | PXD001189 (9 088) | 0.843898 |
TMT | v20190107 | Peng Lab TMT Spectral Library (1 185 547) | PXD009495 (36 137) | 0.950460 |
TTOF5600 | v20190107 | PXD000954 (215 713) | PXD001587 (15 111) | 0.746823 |
HCDch2 | v20190107 | MassIVE-KB (1 623 712) | PXD008034 (35 269) | 0.903786 (+) and 0.644162 (++) |
CIDch2 | v20190107 | NIST CID Human (340 356) | NIST CID Yeast (92 609) | 0.904947 (+) and 0.813342 (++) |
For optimal results, your experimental data should match the properties of the MS²PIP model.
Model | Fragmentation method | MS² mass analyzer | Peptide properties |
---|---|---|---|
HCD | HCD | Orbitrap | Tryptic digest |
CID | CID | Linear ion trap | Tryptic digest |
iTRAQ | HCD | Orbitrap | Tryptic digest, iTRAQ-labeled |
iTRAQphospho | HCD | Orbitrap | Tryptic digest, iTRAQ-labeled, enriched for phosphorylation |
TMT | HCD | Orbitrap | Tryptic digest, TMT-labeled |
TTOF5600 | CID | Quadrupole Time-of-Flight | Tryptic digest |
HCDch2 | HCD | Orbitrap | Tryptic digest |
CIDch2 | CID | Linear ion trap | Tryptic digest |
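
As a rough aid for choosing a model, the table above can be transcribed into a lookup structure. This is only a sketch: the dictionary `MODEL_PROPERTIES` and the helper `matching_models` are hypothetical and not part of MS²PIP, and the match is a plain string comparison on fragmentation method and mass analyzer. Peptide properties such as labeling still have to be checked by hand.

```python
# Transcribed from the table above:
# model -> (fragmentation method, MS2 mass analyzer, peptide properties)
MODEL_PROPERTIES = {
    "HCD": ("HCD", "Orbitrap", "Tryptic digest"),
    "CID": ("CID", "Linear ion trap", "Tryptic digest"),
    "iTRAQ": ("HCD", "Orbitrap", "Tryptic digest, iTRAQ-labeled"),
    "iTRAQphospho": ("HCD", "Orbitrap",
                     "Tryptic digest, iTRAQ-labeled, enriched for phosphorylation"),
    "TMT": ("HCD", "Orbitrap", "Tryptic digest, TMT-labeled"),
    "TTOF5600": ("CID", "Quadrupole Time-of-Flight", "Tryptic digest"),
    "HCDch2": ("HCD", "Orbitrap", "Tryptic digest"),
    "CIDch2": ("CID", "Linear ion trap", "Tryptic digest"),
}


def matching_models(fragmentation, analyzer):
    """Return the models whose fragmentation method and analyzer both match."""
    return [
        name
        for name, (frag, ana, _properties) in MODEL_PROPERTIES.items()
        if frag == fragmentation and ana == analyzer
    ]


print(matching_models("CID", "Linear ion trap"))  # ['CID', 'CIDch2']
for name in matching_models("HCD", "Orbitrap"):
    print(name, "->", MODEL_PROPERTIES[name][2])
```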
To train custom MS2PIPc models, please refer to Training new MS2PIP models on our Wiki pages.