Subset-neighbor-search is a method for false discovery rate (FDR) control in tandem mass spectrometry data, applicable when only a subset of peptides or proteins is of interest.
You can run subset-neighbor-search with the following workflow.
First, you need to generate a list of relevant peptides and a list of irrelevant peptides. You can generate a peptide list with the tide-index command in Crux. Specifically, run the following command once for the relevant FASTA database and once for the irrelevant FASTA database.
path-to-crux/crux tide-index --peptide-list T name-of-fasta-file.fa
The above command will generate a text file named tide-index.peptides.txt that contains four columns: target sequence, decoy sequence, mass, and protein ID. Because both runs produce a file with the same name, keep the two outputs separate (for example, by renaming the first file before the second run). Once you have the two tide-index.peptides.txt files, you will need to run the pepsim.py script. You can see how to run this script and what the output looks like via the following help message.
python pepsim.py -h
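Before running pepsim.py, it can help to check what a tide-index.peptides.txt file actually contains. The following is a minimal sketch, not part of subset-neighbor-search itself, that collects the unique target sequences from such a file. It assumes the file is tab-delimited with the target sequence in the first column and the mass in the third, and it skips any line whose third field is not a number (such as a header line, if present). The function name read_target_peptides is hypothetical.

```python
import csv


def read_target_peptides(path):
    """Return the unique target sequences from a tide-index peptide list.

    Assumes tab-delimited columns: target, decoy, mass, protein ID.
    """
    targets = set()
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 3:
                continue  # blank or truncated line
            try:
                float(row[2])  # mass column must be numeric
            except ValueError:
                continue  # header line or malformed row
            targets.add(row[0])
    return targets
```

Calling read_target_peptides on the relevant peptide file gives the unique set of relevant sequences that is needed later when building the search database.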
Note that sequences from the first input appear in the first column of the output, and sequences from the second input appear in the second column. Typically, the first input contains the relevant peptides and the second input contains the irrelevant peptides. The output is printed to the console, so be sure to save it by redirecting it to a file. For example,
python pepsim.py --mz-thresh 50 file1 file2 >log.txt
Once pepsim.py is complete (note that it can take a while to run), the unique set of peptides found in the second column of the output is considered the set of "neighbor peptides". Concatenate this set of peptide sequences with the unique set of "relevant sequences" generated by tide-index to form the database that will be used in subset-neighbor-search. Using this new database, perform a database search with your preferred search engine. Then filter out any PSMs (both target and decoy) that match a "neighbor peptide". Finally, estimate the FDR on the remaining set of PSMs. Please note that it is important to filter out these neighbor peptide PSMs before FDR estimation.
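The post-processing steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference implementation. It assumes each data line of the saved pepsim.py output holds two whitespace-separated sequences, and that each PSM is a dict with the hypothetical keys source_peptide (the database peptide the match derives from; for a decoy, the target it was generated from), score (higher is better), and is_decoy. The FDR estimate shown is a simple target-decoy ratio, (decoys + 1) / targets, chosen here as one common option; substitute whatever estimator you normally use.

```python
def neighbor_peptides(pepsim_lines):
    """Unique sequences from the second column of the pepsim.py output.

    Assumes two whitespace-separated sequences per data line; remove any
    header or log lines first if your output contains them.
    """
    neighbors = set()
    for line in pepsim_lines:
        fields = line.split()
        if len(fields) >= 2:
            neighbors.add(fields[1])
    return neighbors


def build_database(relevant, neighbors):
    """Relevant sequences plus neighbor sequences, without duplicates."""
    return sorted(set(relevant) | set(neighbors))


def drop_neighbor_psms(psms, neighbors):
    """Remove target and decoy PSMs that trace back to a neighbor peptide."""
    return [psm for psm in psms if psm["source_peptide"] not in neighbors]


def estimate_fdr(psms, threshold):
    """Simple target-decoy FDR estimate among PSMs scoring >= threshold."""
    accepted = [psm for psm in psms if psm["score"] >= threshold]
    decoys = sum(1 for psm in accepted if psm["is_decoy"])
    targets = len(accepted) - decoys
    return min(1.0, (decoys + 1) / max(targets, 1))
```

For example, after reading the redirected output file line by line into neighbor_peptides, pass the result to drop_neighbor_psms before calling estimate_fdr, so that neighbor matches never enter the FDR calculation.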
Congrats, you have successfully run subset-neighbor-search!
If you use subset-neighbor-search in your work please cite:
Subset-neighbor-search requires the following:
- Python 3
- pyteomics