Sipros Ensemble Output

Sipros is a database-searching algorithm for peptide and protein identification in shotgun meta/proteomics. The formats of the final and intermediate files generated by Sipros Ensemble are described below.

Final Result Files

  1. ${output_dir}/${ms2_filename}_${search_name}.psm.txt : This file contains PSM-level results from PSM filtering. The columns in this file are
Filename = Filename of input MS2 file
ScanNumber = Scan number of the PSM
ParentCharge = Charge state of the PSM
MeasuredParentMass = Measured parent mass
CalculatedParentMass = Calculated parent mass from peptide sequence
MassErrorDa = Mass error in Da with 1-Da error correction
MassErrorPPM = Mass error in PPM with 1-Da error correction
ScanType = Scan type of the PSM
SearchName = Sipros search name
ScoringFunction = Scoring function used in the search
Score = Predicted probability of being a true PSM
DeltaZ = Difference between the best PSM score and the next-best PSM score of this scan
DeltaP = Difference between the best modified PSM and its PTM isoform
IdentifiedPeptide = Identified peptide sequence with potential PTMs and mutations
OriginalPeptide = Original peptide sequence in the FASTA file
ProteinNames = Names of proteins of the peptide
ProteinCount = Number of proteins that the peptide can be assigned to
TargetMatch = T for target match and F for decoy match
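
As a rough illustration of how the PSM table might be consumed, the sketch below reads rows into dictionaries and computes the decoy-to-target ratio from the TargetMatch column. It assumes the file is tab-delimited with a single header row and that lines starting with `#` are comments; neither detail is stated above. The helper names `read_table` and `decoy_to_target_ratio` and the sample rows are hypothetical.

```python
import csv
import io


def read_table(handle):
    """Return one dict per data row, keyed by column name."""
    # Assumption: tab-delimited text, '#' lines are comments,
    # and the first remaining line is the header row.
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    return list(csv.DictReader(rows, delimiter="\t"))


def decoy_to_target_ratio(psms):
    """Ratio of decoy (F) to target (T) matches; 0.0 if there are no targets."""
    targets = sum(1 for p in psms if p["TargetMatch"] == "T")
    decoys = sum(1 for p in psms if p["TargetMatch"] == "F")
    return decoys / targets if targets else 0.0


# Hypothetical rows with only a few of the columns, for illustration.
sample = io.StringIO(
    "# comment line\n"
    "Filename\tScanNumber\tScore\tTargetMatch\n"
    "run1.ms2\t101\t0.99\tT\n"
    "run1.ms2\t102\t0.42\tF\n"
    "run1.ms2\t103\t0.97\tT\n"
)
psms = read_table(sample)
print(len(psms), decoy_to_target_ratio(psms))
```

For a file on disk, the same function could be called with `open(path, newline="")` in place of the in-memory sample.
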
  1. ${output_dir}/${ms2_filename}_${search_name}.pep.txt : This file contains peptide-level results from peptide filtering. The columns in this file are
IdentifiedPeptide = Identified peptide sequence with potential PTMs and mutations
ParentCharge = Charge state of identified peptide
OriginalPeptide = Original peptide sequence in the FASTA file
ProteinNames = Names of proteins of the peptide
ProteinCount = Number of proteins that the peptide can be assigned to
TargetMatch = T for target match and F for decoy match
SpectralCount = Number of PSMs in which the peptide is identified
BestScore = The best score of those PSMs
PSMs = List of PSMs for the peptide: MS2_Filename[Scan_Number]
ScanType = Scan type of those PSMs
SearchName = Sipros search name
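
The PSMs column packs several `MS2_Filename[Scan_Number]` references into one field. The sketch below pulls them out with a regular expression so that it does not depend on how the entries are separated, which is not specified above. It assumes filenames contain no whitespace, commas, braces, or square brackets; the function name `parse_psm_refs` and the sample value are hypothetical.

```python
import re

# One reference looks like MS2_Filename[Scan_Number].
# Assumption: filenames contain no whitespace, commas, braces, or brackets.
PSM_REF = re.compile(r"([^\[\],{}\s]+)\[(\d+)\]")


def parse_psm_refs(field):
    """Return a list of (ms2_filename, scan_number) tuples found in the field."""
    return [(name, int(scan)) for name, scan in PSM_REF.findall(field)]


# Hypothetical field value; the real separator and wrapping may differ.
refs = parse_psm_refs("{run1.ms2[101],run1.ms2[245]}")
print(refs)
```
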
  1. ${output_dir}/${ms2_filename}_${search_name}.pro.txt : This file contains protein-level results from protein assembly. The columns in this file are
ProteinID = Names of the protein
Run#_UniquePeptideCounts = Number of unique peptides in a run
Run#_TotalPeptideCounts = Number of all peptides in a run
Run#_UniqueSpectrumCounts = Number of unique PSMs in a run
Run#_TotalSpectrumCounts = Number of all PSMs in a run
Run#_BalancedSpectrumCounts = Balanced spectrum count in a run
Run#_NormalizedBalancedSpectrumCounts = Normalized balanced spectrum count in a run
ProteinDescription = Protein description
TargetMatch = T for target match and F for decoy match
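
Because the count columns repeat once per run, it can be convenient to regroup a protein row by run. The sketch below does this for a row that has already been read into a dictionary. It assumes the `#` in `Run#_` is replaced by a run index in the actual header (for example `Run1_UniquePeptideCounts`) and that the count values parse as numbers; the function name `split_run_columns` and the sample row are hypothetical.

```python
import re

# Assumption: per-run columns are named Run<index>_<Metric>.
RUN_COL = re.compile(r"^(Run\d+)_(\w+)$")


def split_run_columns(row):
    """Split a protein row into fixed columns and per-run metric dicts."""
    fixed, runs = {}, {}
    for name, value in row.items():
        match = RUN_COL.match(name)
        if match:
            run, metric = match.groups()
            runs.setdefault(run, {})[metric] = float(value)
        else:
            fixed[name] = value
    return fixed, runs


# Hypothetical row with a subset of the columns.
row = {
    "ProteinID": "protA",
    "Run1_UniquePeptideCounts": "3",
    "Run1_TotalPeptideCounts": "5",
    "ProteinDescription": "hypothetical protein",
    "TargetMatch": "T",
}
fixed, runs = split_run_columns(row)
print(fixed["ProteinID"], runs)
```
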
  1. ${output_dir}/${ms2_filename}_${search_name}.pro2psm.txt : This file contains the spectrum count and related statistics for each identified protein. The columns in this file are
+ = Marker of a protein line
ProteinID = Names of the protein
Run#_UniquePeptideCounts = Number of unique peptides in a run
Run#_TotalPeptideCounts = Number of all peptides in a run
Run#_UniqueSpectrumCounts = Number of unique PSMs in a run
Run#_TotalSpectrumCounts = Number of all PSMs in a run
Run#_BalancedSpectrumCounts = Balanced spectrum count in a run
Run#_NormalizedBalancedSpectrumCounts = Normalized balanced spectrum count in a run
ProteinDescription = Protein description
TargetMatch = T for target match and F for decoy match

* = Marker of a peptide line
IdentifiedPeptide = Identified peptide sequence with potential PTMs and mutations
ParentCharge = Charge state of identified peptide
OriginalPeptide = Original peptide sequence in the FASTA file
ProteinNames = Names of proteins of the peptide
ProteinCount = Number of proteins that the peptide can be assigned to
TargetMatch = T for target match and F for decoy match
SpectralCount = Number of PSMs in which the peptide is identified
BestScore = The best score of those PSMs
PSMs = List of PSMs for the peptide: MS2_Filename[Scan_Number]
ScanType = Scan type of those PSMs
SearchName = Sipros search name
  1. ${output_dir}/${ms2_filename}_${search_name}.pro2pep.txt : This file contains the peptide count and related statistics for each identified protein. The columns in this file are
+ = Marker of a protein line
ProteinID = Names of the protein
Run#_UniquePeptideCounts = Number of unique peptides in a run
Run#_TotalPeptideCounts = Number of all peptides in a run
Run#_UniqueSpectrumCounts = Number of unique PSMs in a run
Run#_TotalSpectrumCounts = Number of all PSMs in a run
Run#_BalancedSpectrumCounts = Balanced spectrum count in a run
ProteinDescription = Protein description
TargetMatch = T for target match and F for decoy match

* = Marker of a PSM line
Filename = Filename of input MS2 file
ScanNumber = Scan number of the PSM
ParentCharge = Charge state of the PSM
MeasuredParentMass = Measured parent mass
CalculatedParentMass = Calculated parent mass from peptide sequence
MassErrorDa = Mass error in Da with 1-Da error correction
MassErrorPPM = Mass error in PPM with 1-Da error correction
ScanType = Scan type of the PSM
SearchName = Sipros search name
ScoringFunction = Scoring function used in the search
Score = Predicted probability of being a true PSM
DeltaZ = Difference between the best PSM score and the next-best PSM score of this scan
DeltaP = Difference between the best modified PSM and its PTM isoform
IdentifiedPeptide = Identified peptide sequence with potential PTMs and mutations
OriginalPeptide = Original peptide sequence in the FASTA file
ProteinNames = Names of proteins of the peptide
ProteinCount = Number of proteins that the peptide can be assigned to
TargetMatch = T for target match and F for decoy match
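
The pro2psm.txt and pro2pep.txt files interleave two kinds of lines, marked `+` and `*`. A minimal sketch of reading such a file into a nested structure follows. It assumes the fields are tab-separated, that the marker is the first field, that the first line seen for each marker is its header, and that each `*` line belongs to the most recent `+` line. These are assumptions, not details stated above, and the function name `parse_marked_lines` and the sample lines are hypothetical.

```python
def parse_marked_lines(lines):
    """Group '*' lines under the preceding '+' line.

    Returns a list of dicts, one per '+' line, each with a 'children' list.
    """
    headers = {}   # marker -> list of column names
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # assumption: blank and '#' lines carry no data
        marker, *fields = line.split("\t")
        if marker not in ("+", "*"):
            continue
        if marker not in headers:
            headers[marker] = fields  # assumption: first line per marker is its header
            continue
        entry = dict(zip(headers[marker], fields))
        if marker == "+":
            entry["children"] = []
            records.append(entry)
        elif records:
            records[-1]["children"].append(entry)
    return records


# Hypothetical lines with only a few columns per level.
lines = [
    "+\tProteinID\tTargetMatch\n",
    "*\tFilename\tScanNumber\n",
    "+\tprotA\tT\n",
    "*\trun1.ms2\t101\n",
    "*\trun1.ms2\t245\n",
    "+\tprotB\tF\n",
    "*\trun1.ms2\t310\n",
]
records = parse_marked_lines(lines)
for rec in records:
    print(rec["ProteinID"], len(rec["children"]))
```
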

Intermediate Files

  1. ${output_dir}/${ms2_filename}_${search_name}_Spe2Pep.txt : This intermediate file is generated by the database-searching step of Sipros Ensemble. The columns in this file are
+ = Marker of a spectrum line
Filename = Filename of input MS2 file
ScanNumber = Scan number of the PSM
ParentCharge = Charge state of the PSM
MeasuredParentMass = Measured parent mass
ScanType = Scan type of the PSM
SearchName = Sipros search name
TotalIntensity = Sum of all the peak intensities of the PSM
MaxIntensity = Maximum intensity of the PSM
* = Marker of a peptide line
IdentifiedPeptide = Identified peptide sequence with potential PTMs and mutations
OriginalPeptide = Original peptide sequence in the FASTA file
CalculatedParentMass = Calculated parent mass from peptide sequence
MVH = Multivariate hypergeometric (MVH) score
Xcorr = Cross-correlation score
WDP = Weighted dot product score
ProteinNames = Names of proteins of the peptide
  1. ${output_dir}/${ms2_filename}_${search_name}.tab : This intermediate file is generated by summarizing all the Spe2Pep.txt files. Its columns are features that can be used in the ensemble learning steps. The columns in this file are
FileName = Filename of input MS2 file
ScanNumber = Scan number of the PSM
ParentCharge = Charge state of the PSM
MeasuredParentMass = Measured parent mass
ScanType = Scan type of the PSM
SearchName = Sipros search name
IdentifiedPeptide = Identified peptide sequence with potential PTMs and mutations
OriginalPeptide = Original peptide sequence in the FASTA file
CalculatedParentMass = Calculated parent mass from peptide sequence
MVH = Score by using the MVH scoring function
Xcorr = Score by using the Xcorr scoring function
WDP = Score by using the weighted-dot-product scoring function
ProteinNames = Names of proteins of the peptide
ScoreAgreement = Count of scores that rank the current PSM as the top one
DeltaRP1 = Fractional difference between current and best Rank Product based on MVH
DeltaRP2 = Fractional difference between current and best Rank Product based on Xcorr
DeltaRP3 = Fractional difference between current and best Rank Product based on WDP
DeltaRS1 = Fractional difference between current and best MVH
DeltaRS2 = Fractional difference between current and best Xcorr
DeltaRS3 = Fractional difference between current and best WDP
DiffRP1 = Difference between current and next best Rank Product based on MVH
DiffRP2 = Difference between current and next best Rank Product based on Xcorr
DiffRP3 = Difference between current and next best Rank Product based on WDP
DiffRS1 = Difference between current and next best MVH
DiffRS2 = Difference between current and next best Xcorr
DiffRS3 = Difference between current and next best WDP
DiffNorRP1 = Fractional difference between current and next best Rank Product based on MVH
DiffNorRP2 = Fractional difference between current and next best Rank Product based on Xcorr
DiffNorRP3 = Fractional difference between current and next best Rank Product based on WDP
DiffNorRS1 = Fractional difference between current and next best MVH
DiffNorRS2 = Fractional difference between current and next best Xcorr
DiffNorRS3 = Fractional difference between current and next best WDP
RetentionTime = Retention time
LocalRank = Rank by using the rank product
DeltaP = Difference between the best modified PSM and its PTM isoform
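
To make features such as ScoreAgreement and LocalRank more concrete, the sketch below ranks the candidate peptides of one scan under each of the three scores, counts how many scores place a candidate first, and orders candidates by the product of their ranks. The exact definitions used by Sipros Ensemble are not given above, so this is only one plausible reading: it assumes higher is better for MVH, Xcorr, and WDP, that the rank product is the plain product of the three ranks, and that ties are broken by input order. The function name `rank_features` and the sample values are hypothetical.

```python
from math import prod

SCORES = ("MVH", "Xcorr", "WDP")


def rank_features(candidates):
    """Compute per-candidate rank features for the candidates of one scan."""
    ranks = [dict() for _ in candidates]
    for score in SCORES:
        # Assumption: a higher score is better, so rank 1 is the largest value.
        order = sorted(range(len(candidates)),
                       key=lambda i: candidates[i][score], reverse=True)
        for rank, idx in enumerate(order, start=1):
            ranks[idx][score] = rank
    features = [
        {
            "IdentifiedPeptide": cand["IdentifiedPeptide"],
            # Number of scores that rank this candidate first.
            "ScoreAgreement": sum(1 for s in SCORES if r[s] == 1),
            # Assumption: rank product is the plain product of the three ranks.
            "RankProduct": prod(r[s] for s in SCORES),
        }
        for cand, r in zip(candidates, ranks)
    ]
    # LocalRank: position when candidates are ordered by rank product (smaller is better).
    by_rp = sorted(range(len(features)), key=lambda i: features[i]["RankProduct"])
    for local_rank, idx in enumerate(by_rp, start=1):
        features[idx]["LocalRank"] = local_rank
    return features


# Hypothetical candidates for a single scan.
candidates = [
    {"IdentifiedPeptide": "PEPTIDEK", "MVH": 40.0, "Xcorr": 2.5, "WDP": 30.0},
    {"IdentifiedPeptide": "PEPTLDEK", "MVH": 35.0, "Xcorr": 3.0, "WDP": 20.0},
    {"IdentifiedPeptide": "AAAAK", "MVH": 10.0, "Xcorr": 1.0, "WDP": 5.0},
]
features = rank_features(candidates)
for f in features:
    print(f)
```
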