Search Task
Last edited by trishorts on Feb 14, 2022 (1 revision).
This page lists the available settings in the MetaMorpheus Search Task GUI and explains what each one does. Its purpose is to help new users navigate the GUI. Each explanation is also available within the GUI as a tooltip when you hover over the corresponding parameter.
It is organized into four sections:
- Basic Options
- Advanced Options: File Loading Parameters
- Advanced Options: Search Parameters
- Advanced Options: Post-Search Analysis
- Enter Task Name -- This allows the user to name their task for future reference. It has no impact on the actual search.
- Protease -- The enzyme used to digest proteins into peptides, usually by cleaving at specific amino acid residues. Top-down and non-specific searches are also available as options in this drop-down menu.
- Max Missed Cleavages -- The number of missed cleavages allowed during digestion by the specified protease.
- Max Mods Per Peptide -- The maximum number of PTMs allowed on an individual peptidoform. Increasing this number dramatically increases the number of modification combinations and results in longer search times.
- Min Peptide Length -- The minimum allowed peptide length. Peptides in the sample that are shorter than this will be missed or misidentified because their theoretical sequences are not generated. The default is 7 because shorter peptides are difficult to identify confidently and noticeably increase search time.
- Max Peptide Length -- The maximum allowed peptide length. Peptides in the sample that are longer than this will be missed or misidentified. The default is no limit (the field is left empty).
- Precursor Mass Tolerance -- The maximum allowed mass difference between the observed precursor and the theoretical peptide. It is usually specified in ppm but can be given in daltons. This tolerance is also applied around any notches, bins, or mass shifts (such as monoisotopic errors) specified in "Mass Difference Acceptor Criterion" (under "Advanced Options").
- Product Mass Tolerance -- The maximum allowed mass difference between the observed product ions generated by fragmentation and the theoretical fragment ions of the candidate peptide. It can be specified in ppm or daltons; we recommend ppm for Orbitrap data.
- Dissociation Type -- The dissociation type used to fragment intact peptides and produce product ions for the tandem mass spectra.
- Initiator Methionine -- Removal of the initiator methionine is a common in vivo process in the production of functional proteins. We recommend searching for proteins both with and without the initiator methionine, but this can be changed to always cleave or always retain it.
- Fixed -- Fixed modifications are applied to EVERY occurrence of the specified amino acid in the database. The default fixed modification is carbamidomethylation of cysteine, which is common for digested protein samples that have been reduced and alkylated with iodoacetamide. This fixed modification can be deselected, and other fixed modifications can be selected where appropriate. Labels, such as TandemMassTag, are appropriate fixed modifications.
- Variable -- Variable modifications are also considered at EVERY occurrence of the specified amino acid in the database, both with and without the modification. Use variable modifications with caution because they massively increase the sizes of the target and decoy databases, which affects false-positive and false-negative rates. Variable oxidation of methionine is the default option because it is so prevalent in bottom-up proteomics samples.
- Apply protein parsimony and construct protein groups -- Protein parsimony is performed on the set of observed peptide-spectrum matches (PSMs). Once parsimony is performed, the protein assignment for each PSM is updated. Protein parsimony must be selected to use match between runs and protein quantification.
- Require at least two peptides to identify protein -- The user can require two unique peptides to identify a protein, which makes protein parsimony overly stringent. Several papers have debunked this heuristic, but it has been implemented for historical reasons.
- Treat modified peptides as different peptides -- This is appropriate in two situations. First, two or more proteins may share a particular peptide sequence, but the correct parent protein may still be discernible because of specified PTMs. When this box is checked, modified peptides that are not distinguishable by sequence may be used as unique peptides for parsimony if they are distinguishable by their PTMs. Second, proteoforms may be distinguished by the positions of their PTMs.
- No Quantification -- Quantification is skipped. Suitable for qualitative experiments.
- LFQ: Quantify peptides/proteins with FlashLFQ -- Peptides and proteins are quantified using FlashLFQ when this box is checked. FlashLFQ is an ultrafast label-free quantification algorithm for mass-spectrometry proteomics.
- SILAC/SILAM: Quantify peptides/proteins with stable isotope labels -- Good for SILAC experiments. Peptides and proteins are quantified using a modified version of FlashLFQ when this box is checked. If a peptide is identified in any one of its labeled/unlabeled forms, FlashLFQ will attempt to quantify all labeled/unlabeled versions of that peptide, regardless of whether they were fragmented.
- ppm Peakfinding Tolerance -- Specifies the precursor mass tolerance, in ppm, used to find peaks for quantification.
- Match between runs -- Peptides not selected for fragmentation may still be identified using the match between runs feature. Here, any peptide identified in one raw file is sought in other raw files in a small mass and retention time window around the original.
- Normalize Quantification Results -- Quantification values are normalized based on the "Experimental Design" settings.
- Write .mzID -- Outputs an mzID (mzIdentML) file for use in downstream analyses.
- Write decoys -- Outputs decoy matches (annotated as decoys).
- Write contaminants -- Outputs contaminant matches (annotated as contaminants).
- Filter results to q-value -- Outputs only matches with a q-value less than or equal to the specified value.
- Use Provided Precursor -- Specifies that the precursor mass reported in the raw data file should be used as one of the options in the search.
- Deconvolute Precursors -- Searches the isolation window for all possible precursors that could reasonably have been co-fragmented. Depending on the charge state and m/z assigned to the fragmented peak, this can produce large differences in precursor mass. It also enables multiple peptides to be identified from a single MS2 scan.
- Deconvolution Max Assumed Charge State -- Charge states larger than this are discarded or incorrectly identified as harmonics. The default is 12, which suits bottom-up data. Increase this value to something appropriate for top-down searches (e.g., 60).
- Trim MS1 Peaks -- If checked, some MS1 peaks are removed based on the rules below. Removing peaks significantly improves search speed and reduces false-positive matches to baseline noise peaks.
- Trim MS2 Peaks -- If checked, some MS2 peaks are removed based on the rules below. Removing peaks significantly improves search speed and reduces false-positive matches to baseline noise peaks.
- Top N Peaks -- The maximum number of peaks to consider in both MS1 and MS2 spectra.
- Minimum Intensity Ratio -- Trim all peaks whose intensity, when divided by the maximum intensity peak in the scan, is below this ratio. The default is 0.01 (1%).
- Nominal window width Thomsons --
- Number of windows --
- Normalize peaks in each window --
- Classic Search -- In the classic search, each MS2 spectrum is compared against all theoretical target and decoy peptide spectra whose precursor masses fall within the specified precursor mass tolerance. The highest-scoring match, measured by the number of matching MS2 peaks, is reported in the output.
- Modern Search -- In the modern search, all theoretical target and decoy peptide MS2 fragments are placed in a look-up table at the start of the run. Each fragment peak of an experimental MS2 spectrum is then sought in the look-up table. The theoretical target or decoy peptide with the most matching fragment peaks, regardless of precursor mass, is recorded in the results file. The modern search is similar to the approach used in MSFragger. For RAM issues, see "Number of Database Partitions".
- Semi-Specific Search -- In the Semi-Specific Search, a novel digestion and search strategy is used to rapidly identify peptides with up to one non-specific cleavage (In Review). Peptide FDR is stratified by cleavage specificity. For RAM issues, see "Number of Database Partitions".
- Non-Specific Search -- In the Non-Specific Search, a novel digestion and search strategy is used to rapidly identify peptides with non-specific cleavages (In Review). Peptide FDR is stratified by cleavage specificity, so it remains beneficial to specify your protease. If no protease was used, set the protease to "non-specific". For RAM issues, see "Number of Database Partitions".
- Number of Database Partitions -- The Modern, Semi-Specific, and Non-Specific Searches use an index of theoretical target and decoy peptides that can exceed the RAM capacity of even very large modern desktop computers. If you experience memory issues, you may divide the database into more than one partition before searching. Each partition is searched separately, and results are produced only after all partitions have been searched. Search results are unaffected by selecting more than one partition, and the change in performance is negligible.
- Generate target proteins -- Search for peptides originating from the digestion of the proteins in the provided database(s). This can be unchecked for users who prefer to search target and decoy databases separately.
- Generate decoy proteins -- Search for peptides originating from the digestion of decoy proteins informed by the provided database(s).
- Generate reversed decoys -- Generate decoys by reversing the protein sequences in the provided database(s).
- Generate slided decoys -- Generate decoys by shifting amino acids within each protein sequence in the provided database(s).
- Max Modification Isoforms -- The maximum number of theoretical peptidoforms/proteoforms that can be generated from a single theoretical peptide/protein in bottom-up/top-down searches, respectively. Large numbers of variable and annotated modifications produce a combinatorial explosion, so a limit must be set somewhere.
- Min read depth for variants --
- Max heterozygous variants for combinatorics --
- Generate Complementary Ions -- Additional fragment ions are added to the observed MS2 spectrum by subtracting each observed MS2 fragment peak from the observed precursor mass (plus a dissociation type-specific mass shift). This process generates a complementary ion for each observed peak; for example, a y1 ion yields an additional b(n-1) ion, where n is the length of the peptide. Using this parameter effectively doubles the number of matched fragment ions (and thus the MetaMorpheus score) for most peptides. It is particularly useful when the observed precursor mass differs from the theoretical precursor mass (e.g., in a notch search or an open search). Consider a peptide with an N-terminal acetylation that is not present in the theoretical database. When the observed acetylated peptide spectrum is compared with the non-acetylated theoretical spectrum, none of the b- and/or c-ions will match, and only y- and/or z-ions will be counted. However, if "Generate Complementary Ions" is selected, the observed b- and/or c-ions will have complementary products generated that match theoretical y- and/or z-ions. This parameter is thus useful when searching for unknown PTMs through notch and open mass searches. It is unchecked by default, except for the speedy non-specific and semi-specific searches: selecting "Semi-Specific Search" or "Non-Specific Search" automatically checks this box.
- N-Terminal Ions -- Search for ions originating from the N-terminus (e.g. a, b, c).
- C-Terminal Ions -- Search for ions originating from the C-terminus (e.g. x, y, z).
- Max Threads -- Specifies the maximum number of threads that MetaMorpheus will use during the run. The default is the machine's maximum number of threads minus one.
- Mass Difference Acceptor Criterion -- Used to compare the observed and theoretical peptide precursor masses. Notches or bins can be specified to allow matches between observed and theoretical peptides with different precursor masses. You can create your own mass tolerance using the syntax "name dot # ppm #1,#2,...", where the first # is the ppm error and #1, #2, and so on are the allowed numbers of missed monoisotopic peaks (0 is zero missed monoisotopic peaks, 1 is one missed monoisotopic peak, etc.). You can also perform an interval search using the syntax "name interval [#,#],...", where each pair of numbers is the minimum and maximum mass shift in Da, both relative to 0 mass shift (the precursor mass). Any number of intervals is allowed.
- Report PSM Ambiguity -- If multiple theoretical peptide sequences match a given MS2 equally well, all of them are reported (e.g., "PEPTIDE|PEPTLDE", "PROT1|PROT2"). If unchecked, one of the tied peptides is reported at random.
- Minimum score allowed -- This is the minimum number of matching peaks required for a PSM to appear in the results files.
- Use Delta Scores For FDR -- Uses the delta score instead of the Morpheus score for statistical analysis. The delta score for a given spectrum is the Morpheus score of the highest-scoring theoretical peptide minus the score of the second-highest-scoring theoretical peptide. It is a powerful tool for discriminating between similar or ambiguous sequences.
- Construct mass-difference histogram -- This feature is particularly useful in the analysis of open mass tolerance (modern) search results. Mass shifts observed are clustered and analyzed for amino-acid specificity. This can help greatly in the assignment of mass shifts to PTMs and amino acid substitutions.
- Write Pruned Database -- G-PTM-D can add a number of possible modifications to a database. The Write Pruned Database function can be used following the second pass search to create a database with only those modifications observed on peptides detected with FDR < 1%. This eliminates superfluous PTMs.
- Write if in DB -- This is used in conjunction with the Write Pruned Database feature. It keeps UniProt-annotated modifications in the database regardless of whether or not they were observed on a peptide with FDR < 1%.
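The sketch below shows how the Protease, Max Missed Cleavages, Min Peptide Length, and Max Peptide Length settings work together. It is a minimal in silico digestion that assumes the classic trypsin rule: cleave after K or R unless the next residue is P. The function names are hypothetical, and this is not MetaMorpheus's own digestion code.

```python
from typing import List, Optional


def tryptic_boundaries(protein: str) -> List[int]:
    """Return peptide boundaries: 0, every cleavage site, and len(protein).

    Assumes the classic trypsin rule: cut C-terminal to K or R, but not before P.
    """
    sites = [
        i + 1
        for i, aa in enumerate(protein[:-1])
        if aa in "KR" and protein[i + 1] != "P"
    ]
    return [0] + sites + [len(protein)]


def digest(protein: str, max_missed: int = 2, min_len: int = 7,
           max_len: Optional[int] = None) -> List[str]:
    """Enumerate peptides with up to max_missed missed cleavages that fall within
    the length limits (max_len=None means no upper limit)."""
    bounds = tryptic_boundaries(protein)
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break
            peptide = protein[bounds[start]:bounds[end]]
            if len(peptide) < min_len:
                continue  # too short: never generated, so it cannot be identified
            if max_len is not None and len(peptide) > max_len:
                continue  # too long: likewise excluded
            peptides.append(peptide)
    return peptides


protein = "PEPTIDEKAAAAAAARPGGGK"
print(digest(protein, max_missed=0, min_len=1))
print(digest(protein, max_missed=0, min_len=9))
```

The R-P bond is not cleaved, so the example protein yields only two fully tryptic peptides. Raising max_missed to 1 adds the full-length sequence. Raising min_len to 9 silently drops PEPTIDEK, which is how a too-strict Min Peptide Length causes missed identifications.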
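The next sketch illustrates the precursor checks described under Precursor Mass Tolerance and Mass Difference Acceptor Criterion. It tests an observed mass against a plain ppm tolerance, against "dot" notches for missed monoisotopic peaks, and against "interval" windows. It assumes that ppm error is computed relative to the (shifted) theoretical mass and that each missed monoisotopic peak corresponds to one 13C-12C spacing of about 1.00335 Da. The function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

C13_C12_DIFF = 1.00335483  # mass difference between 13C and 12C, in Da


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million, relative to the theoretical mass."""
    return (observed - theoretical) / theoretical * 1e6


def accept_notches(observed: float, theoretical: float, tol_ppm: float = 5.0,
                   missed_monoisotopics: Iterable[int] = (0, 1)) -> bool:
    """Accept if the observed mass matches the theoretical mass plus n missed
    monoisotopic peaks (n * 13C spacing) within tol_ppm, for any allowed n."""
    return any(
        abs(ppm_error(observed, theoretical + n * C13_C12_DIFF)) <= tol_ppm
        for n in missed_monoisotopics
    )


def accept_intervals(observed: float, theoretical: float,
                     intervals: Sequence[Tuple[float, float]]) -> bool:
    """Accept if the mass shift (observed - theoretical, in Da) lies in any [low, high]."""
    shift = observed - theoretical
    return any(low <= shift <= high for low, high in intervals)
```

With notches (0,) only, a precursor picked one isotope peak too high is rejected. Adding 1 to the allowed missed monoisotopics accepts it, at the cost of a larger search space.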
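The Initiator Methionine setting determines which protein forms are passed on to digestion. In the sketch below, the mode names "variable", "cleave", and "retain" are hypothetical stand-ins for the GUI choices; this is not MetaMorpheus code.

```python
from typing import List


def initiator_met_forms(protein: str, mode: str = "variable") -> List[str]:
    """Return the protein forms to digest.

    'variable' keeps both forms, 'cleave' always removes a leading M,
    and 'retain' always keeps it (hypothetical mode names).
    """
    if mode not in ("variable", "cleave", "retain"):
        raise ValueError(f"unknown mode: {mode}")
    if not protein.startswith("M") or mode == "retain":
        return [protein]
    if mode == "cleave":
        return [protein[1:]]
    return [protein, protein[1:]]  # search both with and without the initiator Met
```

The recommended "both" option simply doubles the N-terminal candidates for proteins starting with M. Internal peptides are unaffected.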
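The combinatorics behind Max Mods Per Peptide and Max Modification Isoforms can be made concrete. A peptide with s modifiable sites and at most m modifications has C(s,0) + C(s,1) + ... + C(s,m) forms. The minimal sketch below enumerates them for a single variable modification (for example, oxidation on M) and stops at a cap. The single-modification restriction and the function name are simplifying assumptions; MetaMorpheus also handles several modification types and database-annotated modifications.

```python
from itertools import combinations, islice
from typing import List, Tuple


def peptidoforms(peptide: str, residues: str = "M", max_mods: int = 2,
                 max_isoforms: int = 1024) -> List[Tuple[int, ...]]:
    """Enumerate sets of modified positions for one variable modification.

    Each result is a tuple of 0-based positions carrying the modification;
    the empty tuple is the unmodified form. Enumeration stops after max_isoforms.
    """
    sites = [i for i, aa in enumerate(peptide) if aa in residues]

    def all_forms():
        # 0 mods first, then 1 mod, and so on up to max_mods
        for k in range(min(max_mods, len(sites)) + 1):
            yield from combinations(sites, k)

    return list(islice(all_forms(), max_isoforms))
```

For MSMTMK (three methionines), max_mods=2 gives 1 + 3 + 3 = 7 forms and max_mods=3 gives 8. With max_isoforms=4, only the first four forms are ever generated.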
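Filter results to q-value relies on target-decoy statistics. The minimal sketch below assumes the estimated FDR at each score threshold is (decoys above threshold) / (targets above threshold). It computes the q-value as the lowest FDR at that threshold or any less stringent one. Tied scores are not treated specially, and MetaMorpheus's exact computation may differ. The function names are hypothetical.

```python
from typing import List, Tuple


def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """psms: (score, is_decoy) pairs. Returns (score, is_decoy, q_value), best score first."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # estimated FDR at this score threshold
    # q-value: lowest FDR at this threshold or any less stringent (lower-score) one
    q = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        q[i] = running
    return [(s, d, qi) for (s, d), qi in zip(ranked, q)]


def filter_to_q(psms: List[Tuple[float, bool]], max_q: float = 0.01):
    """Keep PSMs (targets and decoys) whose q-value is <= max_q."""
    return [(s, d, qv) for s, d, qv in q_values(psms) if qv <= max_q]
```

Taking the running minimum from the bottom makes q-values monotone in score. A target scored just above a decoy can therefore still receive a low q-value if enough targets follow it.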
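The Top N Peaks and Minimum Intensity Ratio rules used by Trim MS1/MS2 Peaks can be sketched as two filters on one scan. This assumes the intensity-ratio filter runs before the top-N cut and that both apply to the whole scan. It ignores the window-based options, whose behavior is not described on this page. The function name is hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def trim_peaks(peaks: List[Peak], top_n: int, min_ratio: float = 0.01) -> List[Peak]:
    """Drop peaks below min_ratio of the most intense peak, keep the top_n most
    intense survivors, and return them sorted by m/z."""
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return []
    kept = [p for p in peaks if p[1] / base >= min_ratio]
    kept.sort(key=lambda p: p[1], reverse=True)
    return sorted(kept[:top_n], key=lambda p: p[0])
```

A peak at 0.5% of the base peak is removed by the default 0.01 ratio. The top-N cut then bounds how many peaks each scan contributes to scoring.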
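The Classic Search score ("the number of matching MS2 peaks") and the delta score from Use Delta Scores For FDR can be illustrated together. The minimal sketch below makes several assumptions: only singly charged b and y ions, unmodified residues, a 20 ppm product tolerance, and a plain count of matched fragments as the score. This is a simplification rather than the exact Morpheus score. The function names are hypothetical.

```python
import bisect
from typing import List

PROTON = 1.007276   # proton mass, Da
WATER = 18.010565   # monoisotopic H2O, Da
# Monoisotopic residue masses (Da); no modifications
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_y_ions(peptide: str) -> List[float]:
    """Singly charged b and y ion m/z values (b1..b(n-1), then y1..y(n-1))."""
    masses = [RESIDUE[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y


def count_matches(theoretical: List[float], observed: List[float],
                  tol_ppm: float = 20.0) -> int:
    """Number of theoretical fragments with at least one observed peak within tol_ppm."""
    obs = sorted(observed)
    matched = 0
    for mz in theoretical:
        tol = mz * tol_ppm * 1e-6
        k = bisect.bisect_left(obs, mz - tol)  # first observed peak >= mz - tol
        if k < len(obs) and obs[k] <= mz + tol:
            matched += 1
    return matched


def delta_score(candidates: List[str], observed: List[float], tol_ppm: float = 20.0) -> int:
    """Best candidate score minus second-best score (0 means a tie)."""
    scores = sorted((count_matches(b_y_ions(p), observed, tol_ppm) for p in candidates),
                    reverse=True)
    return scores[0] - (scores[1] if len(scores) > 1 else 0)


spectrum = b_y_ions("PEPTIDE")  # an idealized spectrum containing every b/y ion
print(count_matches(b_y_ions("PEPTIDE"), spectrum))
print(delta_score(["PEPTIDE", "PEPTLDE"], spectrum))
print(delta_score(["PEPTIDE", "PEPTIDK"], spectrum))
```

Leucine and isoleucine have identical masses, so PEPTIDE and PEPTLDE tie with a delta score of 0. This is the situation Report PSM Ambiguity handles. PEPTIDK shares only the six b ions with PEPTIDE, so it trails by 6.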
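The look-up table used by the Modern Search, and the reason Number of Database Partitions leaves results unchanged, can be shown with a toy fragment index. Each fragment mass is binned, and observed peaks "vote" for every peptide that has a fragment in the same or a neighboring bin. Precursor mass is ignored. The 0.01 Da bin width, neighbor-bin tolerance, tie-breaking rule, function names, and example fragment masses are all hypothetical. This is not MetaMorpheus's or MSFragger's actual index layout.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical index: fragment-mass bin -> labels of peptides with a fragment in that bin
FragmentIndex = Dict[int, Set[str]]


def build_index(peptide_fragments: Dict[str, List[float]],
                bin_width: float = 0.01) -> FragmentIndex:
    index: FragmentIndex = defaultdict(set)
    for peptide, fragments in peptide_fragments.items():
        for mz in fragments:
            index[round(mz / bin_width)].add(peptide)
    return index


def search_index(index: FragmentIndex, observed_mzs: List[float],
                 bin_width: float = 0.01) -> Tuple[int, str]:
    """Return (matched peak count, peptide) for the best candidate, ignoring precursor mass."""
    votes: Dict[str, int] = defaultdict(int)
    for mz in observed_mzs:
        center = round(mz / bin_width)
        hits: Set[str] = set()
        for b in (center - 1, center, center + 1):  # neighboring bins absorb edge effects
            hits |= index.get(b, set())
        for peptide in hits:
            votes[peptide] += 1
    # Ties are broken by peptide label so the result is deterministic
    return max(((n, pep) for pep, n in votes.items()), default=(0, ""))


def partitioned_search(peptide_fragments: Dict[str, List[float]], observed_mzs: List[float],
                       n_partitions: int = 2, bin_width: float = 0.01) -> Tuple[int, str]:
    """Build and search one smaller index per partition, then keep the best hit overall."""
    labels = sorted(peptide_fragments)
    results = []
    for k in range(n_partitions):
        part = {p: peptide_fragments[p] for p in labels[k::n_partitions]}
        results.append(search_index(build_index(part, bin_width), observed_mzs, bin_width))
    return max(results)
```

Each peptide's vote count depends only on its own fragments. Splitting the database therefore lowers peak memory (one smaller index at a time) without changing the winning match.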
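The two decoy options, reversed and slided, can be sketched as simple sequence transformations. In this sketch, keeping a leading initiator methionine in place is an assumption. The slide shown is a plain cyclic shift, which is a simplification and not necessarily the shifting scheme MetaMorpheus uses. The function names are hypothetical.

```python
def reverse_decoy(protein: str, keep_initiator_met: bool = True) -> str:
    """Reverse the sequence, optionally leaving a leading M in place."""
    if keep_initiator_met and protein.startswith("M"):
        return "M" + protein[1:][::-1]
    return protein[::-1]


def slide_decoy(protein: str, shift: int = 1, keep_initiator_met: bool = True) -> str:
    """Cyclically shift residues by `shift` positions (simplified 'slide'),
    optionally leaving a leading M in place."""
    head = "M" if keep_initiator_met and protein.startswith("M") else ""
    body = protein[len(head):]
    if not body:
        return protein
    k = shift % len(body)
    return head + body[k:] + body[:k]
```

Both transformations preserve amino acid composition, and therefore total protein mass. This is what makes decoy matches a reasonable model for random target matches.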
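The arithmetic behind Generate Complementary Ions can be checked directly. This sketch assumes singly charged b/y pairs (as in HCD/CID), for which b_i + y_(n-i) = M + 2 x proton, where M is the neutral precursor mass. The dissociation type-specific shift is therefore taken to be 2 x proton. Only the residue masses needed for the example are included, and the function names are hypothetical.

```python
from typing import List

PROTON = 1.007276   # proton mass, Da
WATER = 18.010565   # monoisotopic H2O, Da
# Monoisotopic residue masses (Da) for the residues used in the example
RESIDUE = {"P": 97.05276, "E": 129.04259, "T": 101.04768, "I": 113.08406, "D": 115.02694}


def neutral_mass(peptide: str) -> float:
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def b_ions(peptide: str) -> List[float]:
    return [sum(RESIDUE[aa] for aa in peptide[:i]) + PROTON for i in range(1, len(peptide))]


def y_ions(peptide: str) -> List[float]:
    return [sum(RESIDUE[aa] for aa in peptide[-i:]) + WATER + PROTON
            for i in range(1, len(peptide))]


def complementary_ions(observed_mzs: List[float], precursor_neutral_mass: float,
                       shift: float = 2 * PROTON) -> List[float]:
    """Each observed singly charged fragment implies a partner at M + shift - m/z."""
    return [precursor_neutral_mass + shift - mz for mz in observed_mzs]


# Unannotated N-terminal acetylation: observed b ions and precursor are both shifted
ACETYL = 42.010565
shifted_b = [mz + ACETYL for mz in b_ions("PEPTIDE")]
recovered = complementary_ions(shifted_b, neutral_mass("PEPTIDE") + ACETYL)
print([round(mz, 4) for mz in recovered])
print([round(mz, 4) for mz in reversed(y_ions("PEPTIDE"))])
```

The modification shift appears in both the observed precursor and the observed b ions, so it cancels in the subtraction. The complements land exactly on the unmodified theoretical y ions, which is why this option helps notch and open searches.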
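The first step of Construct mass-difference histogram is to bin the observed-minus-theoretical precursor mass differences and count them. The minimal sketch below does only that. It assumes a fixed 0.01 Da bin width and leaves out the amino acid specificity analysis. The function name and example masses are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def mass_shift_histogram(pairs: Iterable[Tuple[float, float]],
                         bin_width: float = 0.01) -> List[Tuple[float, int]]:
    """pairs: (observed precursor mass, theoretical peptide mass) in Da.
    Returns (bin center in Da, count), most frequent first."""
    counts = Counter(round((obs - theo) / bin_width) for obs, theo in pairs)
    return [(round(b * bin_width, 6), n) for b, n in counts.most_common()]


pairs = [(1015.9949, 1000.0), (1215.9946, 1200.0), (1000.0002, 1000.0), (1079.9663, 1000.0)]
print(mass_shift_histogram(pairs))
```

A recurring peak near +15.99 Da in such a histogram would suggest oxidation. The real feature goes further and checks which residues carry each shift.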