MS²PIP

MS²PIP is a tool to predict MS² signal peak intensities from peptide sequences. It employs the XGBoost machine learning algorithm and is written in Python.

You can install MS²PIP on your machine by following the instructions below or the extended install instructions. For a more user-friendly experience, we created a web server. There, you can upload a list of peptide sequences, after which the corresponding predicted MS² spectra can be downloaded in multiple file formats. The web server can also be accessed through its RESTful API.

To generate a predicted spectral library starting from a FASTA file, we developed a pipeline called fasta2speclib. Usage of this pipeline is described in fasta2speclib_config.md. Fasta2speclib was developed in collaboration with the ProGenTomics group for the MS²PIP for DIA project.

If you use MS²PIP for your research, please cite the following articles:

Gabriels, R., Martens, L., & Degroeve, S. (2019). Updated MS²PIP web server delivers fast and accurate MS² peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Research. https://doi.org/10.1093/nar/gkz299
Degroeve, S., Maddelein, D., & Martens, L. (2015). MS²PIP prediction server: compute and visualize MS² peak intensity predictions for CID and HCD fragmentation. Nucleic Acids Research, 43(W1), W326–W330. https://doi.org/10.1093/nar/gkv542
Degroeve, S., & Martens, L. (2013). MS²PIP: a tool for MS/MS peak intensity prediction. Bioinformatics (Oxford, England), 29(24), 3199–203. https://doi.org/10.1093/bioinformatics/btt544

Please also take note of and mention the MS²PIP-version and model-version you used.

Installation

Download the latest release and unzip it. MS2PIPc runs on Python 3.5 or greater; the required Python packages are listed in requirements.txt. MS2PIPc requires machine-specific compilation of the C code:

sh compile.sh

Check out the extended install instructions for a more detailed explanation.

Predicting MS2 peak intensities

MS2PIPc comes with pre-trained models for a variety of fragmentation methods and modifications. These models can easily be applied by configuring MS2PIPc in the config.txt file and providing a list of peptides in the form of a PEPREC file.

MS2PIPc command line interface

usage: ms2pipC.py [-h] [-c FILE] [-s FILE] [-w FILE] [-m INT] <peptide file>

positional arguments:
  <peptide file>  list of peptides

optional arguments:
  -h, --help      show this help message and exit
  -c FILE         config file (by default config.txt)
  -s FILE         .mgf MS2 spectrum file (optional)
  -w FILE         write feature vectors to FILE.{pkl,h5} (optional)
  -m INT          number of cpu's to use
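
As a rough illustration of how these flags combine, the following Python sketch assembles an argument list for ms2pipC.py. The helper name build_ms2pip_command and the file names in the example are hypothetical; only the flags themselves come from the usage text above.

```python
import sys


def build_ms2pip_command(peprec_file, config_file=None, spectrum_file=None,
                         vector_file=None, num_cpu=None):
    """Assemble an argument list for ms2pipC.py (hypothetical helper).

    Only flags listed in the usage text are emitted; an optional flag is
    skipped when its value is None.
    """
    command = [sys.executable, "ms2pipC.py"]
    if config_file is not None:
        command += ["-c", config_file]
    if spectrum_file is not None:
        command += ["-s", spectrum_file]
    if vector_file is not None:
        command += ["-w", vector_file]
    if num_cpu is not None:
        command += ["-m", str(num_cpu)]
    # The peptide file is the single positional argument and goes last.
    command.append(peprec_file)
    return command


# The file names below are placeholders, not files shipped with MS2PIPc.
command = build_ms2pip_command("peptides.peprec", config_file="config.txt", num_cpu=4)
print(" ".join(command[1:]))
```

The resulting list could be passed to subprocess.run(command, check=True) from the directory that contains ms2pipC.py.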

Config file

Several MS2PIPc options need to be set in this config file.

The models that should be used are set as model=X where X is one of the currently supported MS2PIP models (see MS2PIP Models).

The fragment ion error tolerance is set as frag_error=X where X is the tolerance in Da.

PTMs (see further) are set as ptm=X,Y,opt,Z for each internal PTM where X is a string that represents the PTM, Y is the mass difference in Da associated with the PTM, opt is a required field kept for compatibility with other CompOmics projects, and Z is the amino acid that is modified by the PTM. For N- and C-terminal modifications, Z should be N-term or C-term, respectively.
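
The three kinds of config lines described above can be read with a few lines of Python. This is a minimal sketch and not the parser that ms2pipC.py itself uses; the function name parse_config, the dictionary layout, and the frag_error value in the example are assumptions made for illustration.

```python
def parse_config(lines):
    """Parse model=, frag_error= and ptm= lines into a dictionary.

    Illustrative only: lines without '=' and keys other than the three
    described in this section are ignored.
    """
    config = {"model": None, "frag_error": None, "ptm": []}
    for raw_line in lines:
        line = raw_line.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if key == "model":
            config["model"] = value
        elif key == "frag_error":
            config["frag_error"] = float(value)  # tolerance in Da
        elif key == "ptm":
            # ptm=X,Y,opt,Z: name, mass difference in Da, 'opt', target
            name, mass_shift, _opt, target = value.split(",")
            config["ptm"].append(
                {"name": name, "mass_shift": float(mass_shift), "target": target}
            )
    return config


# The frag_error value is an arbitrary example, not a recommended setting.
example_lines = [
    "model=HCD",
    "frag_error=0.02",
    "ptm=Cam,57.02146,opt,C",
    "ptm=Ace,42.010565,opt,N-term",
]
config = parse_config(example_lines)
print(config["model"], config["frag_error"], len(config["ptm"]))
```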

PEPREC file

To apply the pre-trained models you need to pass only a <peptide file> to ms2pipC.py. This file contains the peptide sequences for which you want to predict the b- and y-ion peak intensities. The file is space separated and contains four columns with the following header names:

spec_id: an id for the peptide/spectrum
modifications: a string indicating the modified amino acids
peptide: the unmodified amino acid sequence
charge: charge state to predict

The spec_id column is a unique identifier for each peptide that will be used in the TITLE field of the predicted MS2 .mgf file. The modifications column is a string that lists the PTMs in the peptide. Each PTM is written as A|B where A is the location of the PTM in the peptide (the first amino acid has location 1, location 0 is used for N-terminal modifications, and -1 is used for C-terminal modifications) and B is a string that represents the PTM as defined in the config file (-c command line argument). Multiple PTMs in the modifications column are concatenated with '|'.
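
The four-column, space-separated layout can be produced with Python's csv module. The sketch below is an illustration under assumptions: the helper write_peprec, the peptide sequences, and the spec_id values are made up, and the PTM names Cam and Ace are assumed to be defined by matching ptm= lines in the config file.

```python
import csv
import io

PEPREC_COLUMNS = ["spec_id", "modifications", "peptide", "charge"]


def write_peprec(rows, handle):
    """Write rows (dicts keyed by the PEPREC column names) to a text handle."""
    writer = csv.writer(handle, delimiter=" ", lineterminator="\n")
    writer.writerow(PEPREC_COLUMNS)
    for row in rows:
        writer.writerow([row[column] for column in PEPREC_COLUMNS])


# Made-up peptides; locations are 1-based, and 0 marks the N-terminus.
rows = [
    {"spec_id": "peptide1", "modifications": "2|Cam", "peptide": "ACDEK", "charge": 2},
    {"spec_id": "peptide2", "modifications": "0|Ace|3|Cam", "peptide": "GHCLR", "charge": 3},
]

buffer = io.StringIO()
write_peprec(rows, buffer)
print(buffer.getvalue())
```

Writing to an open file instead of the in-memory buffer gives a file that can be passed as the <peptide file> argument.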

As an example, suppose the config file contains the lines

ptm=Cam,57.02146,opt,C
ptm=Ace,42.010565,opt,N-term
ptm=Glyloss,-58.005479,opt,C-term

then a modifications string could look like 0|Ace|2|Cam|5|Cam|-1|Glyloss, which means that the second and fifth amino acids are modified with Cam, that there is an N-terminal modification Ace, and that there is a C-terminal modification Glyloss.
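
To make the encoding concrete, the example string can be split into (location, PTM name) pairs as follows. This is a small sketch rather than code taken from MS2PIPc; the function names parse_modifications and describe_location are hypothetical.

```python
def parse_modifications(modifications):
    """Split a PEPREC modifications string into (location, name) pairs."""
    fields = modifications.split("|")
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating location|name fields")
    return [(int(fields[i]), fields[i + 1]) for i in range(0, len(fields), 2)]


def describe_location(location):
    """Translate a location into words: 0 is N-term, -1 is C-term."""
    if location == 0:
        return "N-term"
    if location == -1:
        return "C-term"
    return "amino acid {}".format(location)


pairs = parse_modifications("0|Ace|2|Cam|5|Cam|-1|Glyloss")
for location, name in pairs:
    print(describe_location(location), name)
```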

In the conversion_tools folder, we provide a host of Python scripts to convert common search engine output files to a PEPREC file.

The predictions are saved in a .csv file with the name <peptide_file>_predictions.csv. If you want the output to be in the form of an .mgf file, replace the variable mgf in line 716 of ms2pipC.py.

MS²PIP models

Currently the following models are supported in MS²PIP: HCD, CID, TTOF5600, TMT, iTRAQ, iTRAQphospho, HCDch2 and CIDch2. The last two "ch2" models also include predictions for doubly charged fragment ions (b++ and y++), next to the predictions for singly charged b- and y-ions.

If you use MS²PIP for your research, always mention the MS²PIP-version (see releases page) and model-version (see table below) you used.

Models, version numbers, and the train and test datasets used to create each model

Model	Current version	Train-test dataset (unique peptides)	Evaluation dataset (unique peptides)	Median Pearson correlation on evaluation dataset
HCD	v20190107	MassIVE-KB (1 623 712)	PXD008034 (35 269)	0.903786
CID	v20190107	NIST CID Human (340 356)	NIST CID Yeast (92 609)	0.904947
iTRAQ	v20190107	NIST iTRAQ (704 041)	PXD001189 (41 502)	0.905870
iTRAQphospho	v20190107	NIST iTRAQ phospho (183 383)	PXD001189 (9 088)	0.843898
TMT	v20190107	Peng Lab TMT Spectral Library (1 185 547)	PXD009495 (36 137)	0.950460
TTOF5600	v20190107	PXD000954 (215 713)	PXD001587 (15 111)	0.746823
HCDch2	v20190107	MassIVE-KB (1 623 712)	PXD008034 (35 269)	0.903786 (+) and 0.644162 (++)
CIDch2	v20190107	NIST CID Human (340 356)	NIST CID Yeast (92 609)	0.904947 (+) and 0.813342 (++)
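
The last column reports a median Pearson correlation. As a hedged illustration of that metric, the sketch below computes one Pearson correlation per spectrum between observed and predicted intensities and then takes the median. The exact evaluation procedure (for example any intensity transformation, or how ion types are grouped) is not described in this README, so the details here are assumptions, and the intensity values are toy numbers.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def median_pearson(spectra):
    """Median of the per-spectrum correlations over (observed, predicted) pairs."""
    return statistics.median([pearson(obs, pred) for obs, pred in spectra])


# Toy (observed, predicted) intensity pairs, one tuple per spectrum.
spectra = [
    ([0.1, 0.5, 0.9], [0.2, 0.4, 1.0]),
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
]
print(round(median_pearson(spectra), 4))
```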

MS² acquisition information and peptide properties of the models' training datasets

For optimal results, your experimental data should match the properties of the MS²PIP model.

Model	Fragmentation method	MS² mass analyzer	Peptide properties
HCD	HCD	Orbitrap	Tryptic digest
CID	CID	Linear ion trap	Tryptic digest
iTRAQ	HCD	Orbitrap	Tryptic digest, iTRAQ-labeled
iTRAQphospho	HCD	Orbitrap	Tryptic digest, iTRAQ-labeled, enriched for phosphorylation
TMT	HCD	Orbitrap	Tryptic digest, TMT-labeled
TTOF5600	CID	Quadrupole Time-of-Flight	Tryptic digest
HCDch2	HCD	Orbitrap	Tryptic digest
CIDch2	CID	Linear ion trap	Tryptic digest

To train custom MS2PIPc models, please refer to Training new MS2PIP models on our Wiki pages.

