From 02ecb52e8bd0831e987bbccc16a8987d17d570d3 Mon Sep 17 00:00:00 2001 From: Chris Bielow Date: Wed, 7 Feb 2018 15:38:37 +0100 Subject: [PATCH] fix: List-of-Metrics as Vignette --- PTXQC_list-of-metrics.html | 584 ------------------ R/fcn_misc.R | 22 - R/qcMetric_EVD.R | 5 +- R/qcMetric_MSMSScans.R | 3 +- README.md | 13 +- .../PTXQC_list-of-metrics_template.Rmd | 63 -- man/createListOfFunctions.Rd | 11 - man/createListOfPTXQCMetrics.Rd | 20 - vignettes/PTXQC-ListOfMetrics.Rmd | 85 +++ 9 files changed, 93 insertions(+), 713 deletions(-) delete mode 100644 PTXQC_list-of-metrics.html delete mode 100644 inst/reportTemplate/PTXQC_list-of-metrics_template.Rmd delete mode 100644 man/createListOfFunctions.Rd delete mode 100644 man/createListOfPTXQCMetrics.Rd create mode 100644 vignettes/PTXQC-ListOfMetrics.Rmd diff --git a/PTXQC_list-of-metrics.html b/PTXQC_list-of-metrics.html deleted file mode 100644 index ec02251..0000000 --- a/PTXQC_list-of-metrics.html +++ /dev/null @@ -1,584 +0,0 @@ - - - - - - - - - - - - - -PTXQC List of Metrics - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - -
-

1 Overview

-

The following metrics are implemented in PTXQC.

-

Reasons why metrics might not appear in every report

-
    -
  • applicable only to certain types of data, e.g. SILAC or TMT
  • -
  • metric was disabled manually using the YAML config file
  • -
  • missing input data (incomplete tables), e.g. only evidence.txt is present
  • -
-

Some metrics are incompletely named here, since they might contain (hitherto unknown) thresholds.

-
-
-

2 Metrics

-
-

2.1 Average Overall Quality

-
-Internal metric to compute the average quality across all other metrics -
-

-back to top -

-
-
-

2.2 EVD: Charge

-
-

Charge distribution per Raw file. For tryptic digests, peptides of charge 2 (one N-terminal and one at tryptic C-terminal R or K residue) should be dominant. Ionization issues (voltage?), in-source fragmentation, missed cleavages and buffer irregularities can cause a shift (see http://onlinelibrary.wiley.com/doi/10.1002/mas.21544/abstract ). The charge distribution should be similar across Raw files. Consistent charge distribution is paramount for comparable 3D-peak intensities across samples.

-

Heatmap score EVD: Charge: Deviation of the charge 2 proportion from a representative Raw file (‘qualMedianDist’ function).

-
-

-back to top -

-
-
-

2.3 EVD: ID rate over RT

-
-

Judge column occupancy over retention time. Ideally, the LC gradient is chosen such that the number of identifications (here, after FDR filtering) is uniform over time, to ensure consistent instrument duty cycles. Sharp peaks and uneven distribution of identifications over time indicate potential for LC gradient optimization. See http://www.ncbi.nlm.nih.gov/pubmed/24700534 for details.

-

Heatmap score EVD: ID rate over RT: Scored using ‘Uniform’ scoring function, i.e. constant receives good score, extreme shapes are bad.

-
-

-back to top -

-
-
-

2.4 EVD: MBR Align

-
-

MBR Alignment: First of two steps (1=align, 2=transfer) during Match-between-runs. This plot is based purely on real MS/MS ids. Ideally, RTs of identical peptides should be equal (i.e. very small residual RT delta) across Raw files after alignment.

-

MaxQuant’s RT correction is shown in blue – it should be well within the alignment search window (20 min by default) set during MaxQuant configuration. The resulting residual RT delta after RT alignment (compared to a reference Raw file) is shown as green/red dots. One dot represents one peptide (incl. charge). Every dot (peptide) outside an allowed residual delta RT (1 min by default) is colored red. All others are green.

-

If moving ‘red’ dots to the horizontal zero-line (to make them green) requires large RT shifts, then increasing the alignment search window might help MaxQuant to find a better alignment.

-

Heatmap score EVD: MBR Align: fraction of ‘green’ vs. ‘green+red’ peptides.
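The green/red rule above can be sketched in a few lines. This is a minimal Python illustration, not PTXQC's R implementation: the function name `mbr_align_score` and the handling of empty input are assumptions, and only the 1 min default comes from the text.

```python
def mbr_align_score(residual_rt_deltas, max_delta_min=1.0):
    """Fraction of 'green' peptides among all ('green' + 'red') peptides.

    residual_rt_deltas: residual RT deltas (minutes) after alignment,
    one value per peptide (incl. charge) of a single Raw file.
    max_delta_min: allowed residual delta RT (1 min by default in the text).
    """
    if not residual_rt_deltas:
        return None  # assumption: no score without data
    green = sum(1 for d in residual_rt_deltas if abs(d) <= max_delta_min)
    return green / len(residual_rt_deltas)


# Two of four peptides lie within +/- 1 min, so the score is 0.5.
print(mbr_align_score([0.1, -0.5, 2.0, -1.5]))
```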

-
-

-back to top -

-
-
-

2.5 EVD: MBR auxilliary

-
-

Auxiliary plots – experimental – without scores.

-

Heatmap score: none.

-
-

-back to top -

-
-
-

2.6 EVD: MBR ID-Transfer

-
-

MBR Transfer: Last of two steps (1=align, 2=transfer) during Match-between-runs. If MaxQuant only transfers peptide IDs which are not present in the target file, then each Raw file should not have any duplicates of identical peptides (incl. charge). Sometimes, a single or split 3D-peak gets annotated multiple times, that’s ok. However, the same peptide should not be annotated twice (or more) at vastly different points in RT.

-

This plot shows three columns:
  • left: the ‘genuine’ situation (pretending that no MBR was computed)
  • middle: looking only at transferred IDs
  • right: combined picture (a mixture of left+middle, usually)

-

Each peptide falls into one of three categories (the colors):
  • single (good, because it has either one genuine OR a transferred ID)
  • in-group (also good, because all IDs are very close in RT)
  • out-group (bad, spread across the RT gradient – should not be possible; a false ID)

-

Heatmap score EVD: MBR ID-Transfer: The fraction of non-out-group peptides (i.e. good peptides) in the middle column. This score is ‘pessimistic’ because if few IDs were transferred, but all of them are bad, the score is bad, even though the majority of peptides is still ok (because they are genuine). However, in this case MBR provides little (and wrong) additional information, and should be disabled.
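The three categories and the resulting score can be illustrated as follows. This is a hedged Python sketch rather than PTXQC's code: the names `classify_peptide` and `id_transfer_score` are hypothetical, and the 1 min spread that separates ‘in-group’ from ‘out-group’ is an assumed value, since the text only says ‘very close in RT’. The input is assumed to hold only the IDs of the middle column.

```python
def classify_peptide(rts, max_rt_spread_min=1.0):
    """Classify one peptide (incl. charge) of one Raw file by its ID retention times.

    rts: retention times (minutes) of all IDs of this peptide.
    max_rt_spread_min: assumed cutoff for 'very close in RT'.
    """
    if len(rts) == 1:
        return "single"
    spread = max(rts) - min(rts)
    return "in-group" if spread <= max_rt_spread_min else "out-group"


def id_transfer_score(peptide_rts):
    """Fraction of non-out-group peptides; peptide_rts maps peptide -> list of RTs."""
    classes = [classify_peptide(rts) for rts in peptide_rts.values()]
    if not classes:
        return None  # assumption: no score without data
    good = sum(1 for c in classes if c != "out-group")
    return good / len(classes)


example = {
    ("AAK", 2): [10.0],        # single
    ("BBK", 2): [20.0, 20.3],  # in-group
    ("CCR", 3): [5.0, 45.0],   # out-group
    ("DDR", 2): [30.0],        # single
}
print(id_transfer_score(example))  # 3 of 4 peptides are good
```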

-
-

-back to top -

-
-
-

2.7 EVD: Pep Missing Values

-
-

Missing peptide intensities per Raw file from evidence.txt. This metric shows the fraction of missing peptides compared to all peptides seen in the whole experiment. The more Raw files you have, the higher this fraction is going to be (because there is always going to be some exotic [low intensity?] peptide which gets [falsely] identified in only a single Raw file). A second plot shows how many peptides (Y-axis) are covered by at least X Raw files. A third plot shows the density of the observed (line) and the missing (filled area) data. To reconstruct the distribution of missing values, an imputation strategy is required, so the argument is somewhat circular here. If all Raw files are (technical) replicates, i.e. we can expect that missing peptides are indeed present and have an intensity similar to the peptides we do see, then the median is a good estimator. This method performs a global normalization across Raw files (so their observed intensity distributions have the same mean), before computing the imputed values. Afterwards, the distributions are de-normalized again (shifting them back to their original locations) – but this time with imputed peptides.

-

Peptides obtained via Match-between-runs (MBR) are accounted for (i.e. are considered as present = non-missing). Thus, make sure that MBR is working as intended (see MBR metrics).

-

Warning: this metric is meaningless for fractionated data! TODO: compensate for lower scores in large studies (with many Raw files), since peptide FDR is accumulating!?

-

Heatmap score [EVD: Pep Missing]: Linear scale of the fraction of missing peptides.
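The fraction of missing peptides per Raw file, relative to all peptides seen in the experiment, might be computed like this. It is a minimal Python sketch with a hypothetical function name; how PTXQC maps the fraction onto the final heatmap score is not shown here.

```python
def missing_fraction(peptides_by_file):
    """Fraction of missing peptides per Raw file.

    peptides_by_file: dict mapping Raw file name -> collection of peptides
    observed in that file (genuine or transferred via MBR).
    """
    observed = {f: set(p) for f, p in peptides_by_file.items()}
    all_peptides = set().union(*observed.values())
    if not all_peptides:
        return {}  # assumption: nothing to report without peptides
    return {f: 1.0 - len(p) / len(all_peptides) for f, p in observed.items()}


fractions = missing_fraction({
    "file_A": ["p1", "p2", "p3"],
    "file_B": ["p1", "p4"],
})
print(fractions)  # four peptides overall: A misses one, B misses two
```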

-
-

-back to top -

-
-
-

2.8 EVD: MS2 Oversampling

-
-

An oversampled 3D-peak is defined as a peak whose peptide ion (same sequence and same charge state) was identified by at least two distinct MS2 spectra in the same Raw file. For high complexity samples, oversampling of individual 3D-peaks automatically leads to undersampling or even omission of other 3D-peaks, reducing the number of identified peptides. Oversampling occurs in low-complexity samples or long LC gradients, as well as with undersized dynamic exclusion windows for data-dependent acquisitions.

-

Heatmap score [EVD: MS2 Oversampling]: The percentage of non-oversampled 3D-peaks.
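A small Python sketch of the oversampling percentage follows. It simplifies by treating every (sequence, charge) pair as one 3D-peak, which is an assumption made for illustration; the function name is hypothetical and this is not PTXQC's R code.

```python
from collections import Counter


def non_oversampled_percent(ms2_ids):
    """Percentage of peptide ions identified by exactly one MS2 spectrum.

    ms2_ids: one (sequence, charge) tuple per identified MS2 spectrum
    of a single Raw file.
    """
    counts = Counter(ms2_ids)
    if not counts:
        return None  # assumption: no score without identifications
    single = sum(1 for n in counts.values() if n == 1)
    return 100.0 * single / len(counts)


ids = [("PEPTIDEK", 2), ("PEPTIDEK", 2), ("PEPTIDEK", 3), ("ANOTHERR", 2)]
print(non_oversampled_percent(ids))  # two of three peptide ions are sampled once
```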

-
-

-back to top -

-
-
-

2.9 EVD: Pep Count

-
-

Number of unique (i.e. not counted twice) peptide sequences including modifications (after FDR) per Raw file. A configurable target threshold is indicated as dashed line.

-

If MBR was enabled, three categories (‘genuine (exclusive)’, ‘genuine + transferred’, ‘transferred (exclusive)’) are shown, so the user can judge the gain that MBR provides.
-Peptides in the ‘genuine + transferred’ category were identified within the Raw file by MS/MS, but at the same time also transferred to this Raw file using MBR. This ID transfer can be correct (e.g. in case of different charge states), or incorrect – see MBR-related metrics to tell the difference. Ideally, the ‘genuine + transferred’ category should be rather small, the other two should be large.

-

If MBR were switched off, you could expect to see the number of peptides corresponding to ‘genuine (exclusive)’ + ‘genuine + transferred’. In general, if the MBR gain is low and the MBR scores are bad (see the two MBR-related metrics), MBR should be switched off for the Raw files which are affected (could be a few or all).

-

Heatmap score [EVD: Pep Count (>%1.0f)]: Linear scoring from zero. Reaching or exceeding the target threshold gives a score of 100%%.

-
-

-back to top -

-
-
-

2.10 EVD: Pep Intensity

-
-

Peptide precursor intensity per Raw file from evidence.txt. Low peptide intensity usually goes hand in hand with low MS/MS identification rates and unfavourable signal/noise ratios, which makes signal detection harder. Also instrument acquisition time increases for trapping instruments.

-

Failing to reach the intensity threshold is usually due to unfavorable column conditions, inadequate column loading or ionization issues. If the study is not a dilution series or pulsed SILAC experiment, we would expect every condition to have about the same median log-intensity (of 2^%1.1f). The relative standard deviation (RSD) gives an indication about reproducibility across files and should be below 5%%.

-

Depending on your setup, your target thresholds might vary from PTXQC’s defaults. Change the threshold using the YAML configuration file.

-

Heatmap score [EVD: Pep Intensity (>%1.1f)]: Linear scale of the median intensity reaching the threshold, i.e. reaching 2^21 of 2^23 gives score 0.25.
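The worked number in the heatmap score follows from a linear scale on the raw (not log) intensity, reading the example as a median of 2^21 against a threshold of 2^23. A minimal Python sketch, with a hypothetical function name and the threshold passed as a log2 value:

```python
from statistics import median


def pep_intensity_score(intensities, threshold_log2=23.0):
    """Linear score of the median intensity relative to the threshold, capped at 1.

    intensities: raw peptide intensities of one Raw file.
    threshold_log2: target threshold as log2 intensity (23 in the example).
    """
    return min(1.0, median(intensities) / 2.0 ** threshold_log2)


# Median intensity 2^21 against a threshold of 2^23: 2^21 / 2^23 = 0.25
print(pep_intensity_score([2 ** 20, 2 ** 21, 2 ** 22]))
```

Because the scale is linear in raw intensity, falling two log2 units short of the threshold already costs three quarters of the score.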

-
-

-back to top -

-
-
-

2.11 EVD: MS Cal-Post

-
-

Precursor mass accuracy after calibration. Failed samples from precalibration data are still marked here. Ppm errors should be centered on zero and their spread is expected to be significantly smaller than before calibration.

-

Heatmap score EVD: MS Cal-Post: The variance and centeredness around zero of the calibrated distribution (function GaussDev).

-
-

-back to top -

-
-
-

2.12 EVD: MS Cal-Pre

-
-

Mass accuracy before calibration. Outliers are marked as such (‘out-of-search-tol’) using ID rate and standard deviation as additional information (if available). If any Raw file is flagged ‘failed’, increasing MaxQuant’s first-search tolerance (20 ppm by default, here: %1.1f ppm) might help to enable successful recalibration. A bug in MaxQuant sometimes leads to excessively high ppm mass errors (>10^4) reported in the output data. However, this can sometimes be corrected for by re-computing the delta mass error from other data. If this is the case, a warning (‘bugfix applied’) will be shown.

-

Heatmap score [EVD: MS Cal Pre (%1.1f)]: the centeredness (function CenteredRef) of uncalibrated masses in relation to the search window size.

-
-

-back to top -

-
-
-

2.13 EVD: Prot Count

-
-

Number of Protein groups (after FDR) per Raw file. A configurable target threshold is indicated as dashed line.

-

If MBR was enabled, three categories (‘genuine (exclusive)’, ‘genuine + transferred’, ‘transferred (exclusive)’) are shown, so the user can judge the gain that MBR provides. Here, ‘transferred (exclusive)’ means that this protein group has peptide evidence which originates only from transferred peptide IDs. The quantification is (of course) always from the local Raw file. Proteins in the ‘genuine + transferred’ category have peptide evidence from within the Raw file by MS/MS, but at the same time also peptide IDs transferred to this Raw file using MBR were used. It is not unusual to see the ‘genuine + transferred’ category be rather large, since a protein group usually has peptide evidence from both sources. To see if MBR worked, it is better to look at the two MBR-related metrics.

-

If MBR were switched off, you could expect to see the number of protein groups corresponding to ‘genuine (exclusive)’ + ‘genuine + transferred’. In general, if the MBR gain is low and the MBR scores are bad (see the two MBR-related metrics), MBR should be switched off for the Raw files which are affected (could be a few or all).

-

Heatmap score [EVD: Prot Count (>%1.0f)]: Linear scoring from zero. Reaching or exceeding the target threshold gives a score of 100%%.

-
-

-back to top -

-
-
-

2.14 EVD: Reporter intensity

-
-

iTRAQ/TMT reporter intensity boxplots of all PSMs for each channel and Raw file. The opacity (alpha value) of the bar correlates to the number of PSMs with non-zero abundance (1.0 = full labeling; 0.0 = no reporter ions; see heatmap scoring below).

-

There is a similar ‘Experimental Group’ based metric/plot based on proteinGroups.txt.

-

PTXQC uses isotope-corrected intensities (eliminating channel carry-over) to allow for detection of empty channels, e.g. due to mis-labeling. If MaxQuant did no isotope correction (i.e. corrected and uncorrected channels are equal), the plot title will show a warning. The scores are too optimistic in this case (since carry-over will be mistaken for actual signal).

-

Note: global labeling efficiency can only be judged indirectly with this metric, since isobaric reporters were set as fixed modification. Thus, MaxQuant will only identify labeled peptides in the first place. Observing only very few peptides (see peptide count metric) is a good indicator of failed labeling. However, if only the labeling of a few channels failed, this will be noticeable here!

-

Labeling can still be poor, even though identification was successful. In this case, the boxplots will touch the left (0 intensity) side of the plot.

-

A labeling efficiency (LE) is computed per Raw file AND channel as: the percentage of PSMs which have non-zero reporter intensity. Ideally LE reaches 100 percent (all peptides have an intensity in the channel; biological missingness ignored).

-

Heatmap score: minimum labeling efficiency per Raw file across all channels. E.g., for 4-plex iTRAQ and two Raw files, there will be 8 labeling efficiency (LE) values. Each Raw file is then scored by the minimum LE of all its 4 channels.
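The labeling efficiency and its per-file minimum can be sketched as below. This is an illustrative Python example with hypothetical function names and made-up channel labels and intensities, not PTXQC's R code.

```python
def labeling_efficiency(reporter_intensities):
    """Percentage of PSMs with non-zero reporter intensity in one channel."""
    if not reporter_intensities:
        return None  # assumption: no LE without PSMs
    nonzero = sum(1 for x in reporter_intensities if x > 0)
    return 100.0 * nonzero / len(reporter_intensities)


def min_labeling_efficiency(channels):
    """Score of one Raw file: the minimum LE across all its channels.

    channels: dict mapping channel name -> reporter intensities (one per PSM).
    """
    return min(labeling_efficiency(values) for values in channels.values())


# Hypothetical 4-plex data for one Raw file with four PSMs.
raw_file = {
    "114": [10, 20, 0, 5],  # LE 75%
    "115": [1, 2, 3, 4],    # LE 100%
    "116": [0, 0, 7, 9],    # LE 50%
    "117": [3, 0, 2, 1],    # LE 75%
}
print(min_labeling_efficiency(raw_file))  # the weakest channel decides the score
```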

-
-

-back to top -

-
-
-

2.15 EVD: RT Peak Width

-
-

One parameter of optimal and reproducible chromatographic separation is the distribution of widths of peptide elution peaks, derived from the evidence table. Ideally, all Raw files show a similar distribution, e.g. to allow for equal conditions during dynamic precursor exclusion, RT alignment or peptide quantification.

-

Heatmap score EVD: RT Peak Width: Scored using BestKS function, i.e. the D statistic of a Kolmogorov-Smirnov test.
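For readers unfamiliar with the D statistic: it is the largest vertical distance between two empirical cumulative distribution functions. The Python sketch below computes the two-sample D for two sets of peak widths. How BestKS selects the reference distribution and converts D into a score is not covered here, and the function name is hypothetical.

```python
from bisect import bisect_right


def ks_d_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: max distance between the two ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The supremum is attained at one of the observed values.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


print(ks_d_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # partially overlapping samples
```

A D of 0 means identical empirical distributions, while a D of 1 means the two samples do not overlap at all.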

-
-

-back to top -

-
-
-

2.16 EVD: Contaminants

-
-

PTXQC will explicitly show the five most abundant external protein contaminants (as detected via MaxQuant’s contaminants FASTA file) by Raw file, and summarize the remaining contaminants as ‘other’. This allows you to track down exactly which proteins contaminate your sample. Low contamination is obviously better. The ‘Abundance class’ models the average peptide intensity in each Raw file and is visualized using varying degrees of transparency. It is not unusual to see samples with low sample content to have higher contamination. If you see only one abundance class (‘mid’), this means all your Raw files have roughly the same peptide intensity distribution.

-

Heatmap score EVD: Contaminants: as fraction of summed intensity with 0 = sample full of contaminants; 1 = no contaminants

-
-

-back to top -

-
-
-

2.17 EVD:Contaminant

-
-

User defined contaminant plot based on peptide intensities and counts. Usually used for Mycoplasma detection, but can be used for an arbitrary (set of) proteins.

-

All proteins (and their peptides) which contain the search string from the YAML file are considered contaminants. The contaminant’s search string is searched in the full FASTA header in proteinGroups.txt. If proteinGroups.txt is not available/found, only protein identifiers can be considered. The search realm used is given in the plot subtitle. You should choose the contaminant name to be distinctive. Only peptides belonging to a single protein group are considered when computing the fractions (contaminant vs. all), since peptides shared across multiple groups are potentially false positives.

-

Two abundance measures are computed per Raw file:

-
    -
  • fraction of contaminant intensity (used for scoring of the metric)
  • -
  • fraction of contaminant spectral counts (as comparison; both should be similar)
  • -
-

If the intensity fraction exceeds the threshold (indicated by the dashed horizontal line) a contamination is assumed.

-

For each Raw file exceeding the threshold an additional plot giving cumulative Andromeda peptide score distributions is shown. This lets you decide if the contamination is true. Contaminant scores should be equally high (or higher), i.e. to the right, compared to the sample scores. Each graph’s subtitle is augmented with a p-value of the Kolmogorov-Smirnov test of this data (Andromeda scores of contaminant peptides vs. sample peptides). If the p-value is high, there is no score difference between the two peptide populations. In particular, the contaminant peptides are not bad-scoring, random hits. These p-values are also shown in the first figure for each Raw file. Note that the p-value is purely based on Andromeda scores and is independent of intensity or spectral counts.

-

Heatmap score [EVD: Contaminant ]: boolean score, i.e. 0% (fail) if the intensity threshold was exceeded; otherwise 100% (pass).

-
-

-back to top -

-
-
-

2.18 MSMS: s

-
-

Under optimal digestion conditions (high enzyme grade etc.), only few missed cleavages (MC) are expected. In general, increased MC counts also increase the number of peptide signals, thus cluttering the available space and potentially provoking overlapping peptide signals, biasing peptide quantification. Thus, low MC counts should be favored. Interestingly, it has been shown recently that incorporation of peptides with missed cleavages does not negatively influence protein quantification (see http://pubs.acs.org/doi/abs/10.1021/pr500294d ). However this is true only if all samples show the same degree of digestion. High missed cleavage values can indicate for example, either a) failed digestion, b) a high (post-digestion) protein contamination, or c) a sample with high amounts of unspecifically degraded peptides which are not digested by trypsin.

-

In the rare case that ‘no enzyme’ was specified in MaxQuant, neither scores nor plots are shown.

-

Heatmap score [MSMS: MC]: the fraction (0% - 100%) of fully cleaved peptides per Raw file

-Heatmap score [MSMS: MC Var]: each Raw file is scored for its deviation (score: MedianDist) from the ‘average’ digestion state of the current study. -
-

-back to top -

-
-
-

2.19 MSMS: MS2 Cal

-
-

MS/MS decalibration metric. If most of the fragments are within tighter bounds, you can reduce the fragment mass tolerance to obtain more identifications under the same FDR. On the other hand, if the fragment mass errors are not centered on zero, a recalibration of the instrument should be performed. If the (Gaussian-like) distribution is cut too severely on either side by the search tolerance window in MaxQuant, you might be able to increase the number of identifications by allowing for a wider MS/MS search window when re-running MaxQuant. However, the number of decoy identifications will increase as well, potentially offsetting any gain when FDR is applied.

-

Heatmap score [MSMS: MS2 Cal (Analyzer)]: rewards centeredness around 0 ppm/Da (function Centered).

-
-

-back to top -

-
-
-

2.20 MS2 Scans: Dependent Peps

-
-

Prominent dependent peptides are shown, and how they contribute to increase identification numbers. You can use this metric to assess if specifying another variable (or even fixed) modification makes sense to boost your ID rate (remember that dependent peptides are only an add-on in MaxQuant and do not count towards global ID rates or quantification!). You can also use the DP-modifications to compare samples (e.g. with modified sample preparation or biological conditions where you expect drastic changes).

-

Hits with ‘Unmodified’ are removed since they do not provide a lot of additional information. The legend provides the top target sites (AAs) for each modification in percent (e.g. ‘Oxidation - nterm(8) L(9) H(7)’ means that of all dependent peptides with Oxidation, 8% of them are nterm, 9% are on L, 7% on H).

-

Heatmap score [MS2 Scans: DepPep]: No score.

-
-

-back to top -

-
-
-

2.21 MS2 Scans: Ion Inj Time

-
-

Ion injection time score - should be as low as possible to allow fast cycles. Correlated with peptide intensity. Note that this threshold needs customization depending on the instrument used (e.g., ITMS vs. FTMS).

-

Heatmap score [MS2 Scans: Ion Inj Time]: Linear score as fraction of MS/MS below the threshold.

-
-

-back to top -

-
-
-

2.22 MS2 Scans: Intensity

-
-

MS/MS identifications can be ‘bad’ for a couple of reasons. It could be computational, i.e. ID rates are low because you specified the wrong protein database or modifications (not our concern here). Another reason is low/missing signals for fragment ions, e.g. due to bad (quadrupole/optics) ion transmission (charging effects), too small isolation windows, etc.

-

Hence, we plot the TIC and base peak intensity of all MS/MS scans (incl. unidentified ones) per Raw file. Depending on the setup, these intensities can vary, but telling apart good from bad samples should never be a problem. If you only have bad samples, you need to know the intensity a good sample would reach.

-

To automatically score this, we found that the TIC should be 10-100x larger than the base peak (BP), i.e. there should be many other ions which are roughly as high (a good fragmentation ladder). If there are only a few spurious peaks (bad MS/MS), the TIC is much lower. Thus, a scan with TIC > BP * 10 receives a 100% score. If TIC < BP * 3, we say this MS/MS failed (0%). Anything between 3x and 10x gets a score in between. The score for the Raw file is computed as the median score across all its MS/MS scans.

-Heatmap score [MS2 Scans: Intensity]: Linear score (0-100%) between 3 < (TIC / BP) < 10. -
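The linear score between 3 < (TIC / BP) < 10 and the per-file median can be written down directly. This is a Python sketch under stated assumptions: the function names are hypothetical, and scans with a non-positive base peak are scored as failed, which the text does not specify.

```python
from statistics import median


def ms2_scan_score(tic, base_peak):
    """Score one MS/MS scan: 0 at TIC/BP <= 3, 1 at TIC/BP >= 10, linear in between."""
    if base_peak <= 0:
        return 0.0  # assumption: treat scans without a base peak as failed
    ratio = tic / base_peak
    return min(1.0, max(0.0, (ratio - 3.0) / (10.0 - 3.0)))


def ms2_intensity_file_score(scans):
    """Score of one Raw file: median scan score; scans is a list of (TIC, BP) pairs."""
    return median(ms2_scan_score(tic, bp) for tic, bp in scans)


# Ratios of 10, 3 and 6.5 give scan scores of 1.0, 0.0 and 0.5.
print(ms2_intensity_file_score([(100, 10), (30, 10), (65, 10)]))
```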
-

-back to top -

-
-
-

2.23 MS2 Scans: TopN high

-
-

Reaching TopN on a regular basis indicates that all sections of the LC gradient deliver a sufficient number of peptides to keep the instrument busy. This metric somewhat summarizes ‘TopN over RT’.

-

Heatmap score [MS2 Scans: TopN high]: rewards if TopN was reached on a regular basis (function qualHighest)

-
-

-back to top -

-
-
-

2.24 MS2 Scans: TopN ID over N

-
-

Looking at the identification rates per scan event (i.e. the MS/MS scans after a survey scan) can give hints on how well scheduled precursor peaks could be fragmented and identified. If performance drops for the later MS/MS scans, then the LC peaks are probably not wide enough to deliver enough eluent or the intensity threshold to trigger the MS/MS event should be lowered (if LC peak is already over), or increased (if LC peak is still too weak to collect enough ions).

-

Heatmap score [MS2 Scans: TopN ID over N]: Rewards uniform identification performance across all scan events.

-
-

-back to top -

-
-
-

2.25 MS2 Scans: TopN over RT

-
-

TopN over retention time. Similar to ID over RT, this metric reflects the complexity of the sample at any point in time. Ideally complexity should be made roughly equal (constant) by choosing a proper (non-linear) LC gradient. See http://www.ncbi.nlm.nih.gov/pubmed/24700534 for details.

-

Heatmap score [MS2 Scans: TopN over RT]: Rewards uniform (function Uniform) TopN events over time.

-
-

-back to top -

-
-
-

2.26 PAR: MQ Parameters

-
-MaxQuant parameters, extracted from parameters.txt (abbreviated as ‘PAR’), summarizes the settings used for the MaxQuant analysis. Key parameters are MaxQuant version, Re-quantify, Match-between-runs and mass search tolerances. A list of protein database files is also provided, allowing to track database completeness and database version information (if given in the filename). -
-

-back to top -

-
-
-

2.27 PG: Contaminants

-
-

External protein contamination should be controlled for, therefore MaxQuant ships with a comprehensive, yet customizable protein contamination database, which is searched by MaxQuant by default. PTXQC generates a contamination plot derived from the proteinGroups (PG) table showing the fraction of total protein intensity attributable to contaminants. The plot employs transparency to discern differences in the group-wise summed protein abundance. This allows to delineate a high contamination in high complexity samples from a high contamination in low complexity samples (e.g. from in-gel digestion). If you see only one abundance class (‘mid’), this means all your groups have roughly the same summed protein intensity. Note that this plot is based on experimental groups, and therefore may not correspond 1:1 to Raw files.

-

Heatmap score: none (since data source proteinGroups.txt is not related 1:1 to Raw files)

-
-

-back to top -

-
-
-

2.28 PG: LFQ intensity

-
-

Label-free quantification (LFQ) intensity boxplots by experimental groups. Groups are user-defined during MaxQuant configuration. This plot displays a (customizable) threshold line for the desired mean of LFQ intensity of proteins. Raw files which underperform in Raw intensity are likely to show an increased mean here, since only high-abundance proteins are recovered and quantifiable by MaxQuant in this Raw file. The remaining proteins are likely to receive an LFQ value of 0 (i.e. do not contribute to the distribution). The height of the bar correlates to the number of proteins with non-zero abundance.

-

Contaminants are shown as overlaid yellow boxes, whose height corresponds to the number of contaminant proteins. The position of the box gives the intensity distribution of the contaminants.

-

Heatmap score: none (since data source proteinGroups.txt is not related 1:1 to Raw files)

-
-

-back to top -

-
-
-

2.29 PG: Principal Component

-
-

Principal components plots of experimental groups (as defined during MaxQuant configuration).

-

This plot is shown only if more than one experimental group was defined. If LFQ was activated in MaxQuant, an additional PCA plot for LFQ intensities is shown. Similarly, if iTRAQ/TMT reporter intensities are detected.

-

Since experimental groups and Raw files do not necessarily correspond 1:1, this plot cannot use the abbreviated Raw file names, but instead must rely on automatic shortening of group names.

-

Heatmap score: none (since data source proteinGroups.txt is not related 1:1 to Raw files)

-
-

-back to top -

-
-
-

2.30 PG: Ratio

-
-

This plot shows log2 ratios for SILAC-like experiments (whenever MaxQuant reports a set of ’ratio.*’ columns). Useful to spot unequal channel mixing during sample preparation. If equal mixing is expected, the distribution should be unimodal and its mode close to a ratio of 1 (i.e., 1:1, which is 0 on the log2 scale), as indicated by a visual guidance line. Multimodal distributions are flagged as such automatically. If PTXQC detects ratios deviating strongly from 1:1 (parameterized by default beyond the range between 1:4 and 4:1), PTXQC automatically assumes a pulsed experiment and reports the label incorporation in percent for all groups.

-

Heatmap score: none (since data source proteinGroups.txt is not related 1:1 to Raw files)
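The 1:4 to 4:1 range corresponds to a log2 ratio between -2 and +2. The Python sketch below checks whether a ratio leaves that range and converts a ratio into a label incorporation percentage. Both function names are hypothetical, and the incorporation formula H / (H + L) is a commonly used definition assumed here for illustration, not one taken from PTXQC.

```python
import math


def outside_mixing_range(ratio, limit=4.0):
    """True if the ratio lies beyond the range 1:limit .. limit:1 (default 1:4 .. 4:1)."""
    return abs(math.log2(ratio)) > math.log2(limit)


def label_incorporation_percent(ratio_heavy_to_light):
    """Label incorporation in percent, assuming the definition H / (H + L)."""
    r = ratio_heavy_to_light
    return 100.0 * r / (1.0 + r)


print(outside_mixing_range(1.0))         # equal mixing, inside the range
print(outside_mixing_range(8.0))         # log2 ratio of 3, outside the range
print(label_incorporation_percent(3.0))  # H:L of 3:1
```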

-
-

-back to top -

-
-
-

2.31 PG: raw intensity

-
-

Intensity boxplots by experimental groups. Groups are user-defined during MaxQuant configuration. This plot displays a (customizable) threshold line for the desired mean intensity of proteins. Groups which underperform here, are likely to also suffer from a worse MS/MS id rate and higher contamination due to the lack of total protein loaded/detected. If possible, all groups should show a high and consistent amount of total protein. The height of the bar correlates to the number of proteins with non-zero abundance.

-

Contaminants are shown as overlaid yellow boxes, whose height corresponds to the number of contaminant proteins. The position of the box gives the intensity distribution of the contaminants.

-

Heatmap score: none (since data source proteinGroups.txt is not related 1:1 to Raw files)

-
-

-back to top -

-
-
-

2.32 PG: Reporter intensity

-
-

iTRAQ/TMT reporter intensity boxplots by experimental groups. Groups are user-defined during MaxQuant configuration. This plot displays a (customizable) threshold line for the desired mean of reporter ion intensity of proteins. The height of the bar correlates to the number of proteins with non-zero abundance.

-

Contaminants are shown as overlaid yellow boxes, whose height corresponds to the number of contaminant proteins. The position of the box gives the intensity distribution of the contaminants. Contaminants should be lower compared to label-free samples, since all contaminants introduced after the labeling should not be identified by Andromeda (since they lack the isobaric tag).

-

There is a similar ‘Raw file’ based metric/plot based on evidence.txt.

-

Heatmap score: none (since data source proteinGroups.txt is not related 1:1 to Raw files)

-
-

-back to top -

-
-
-

2.33 SM: MS2 ID rate

-
-

MS/MS identification rate per Raw file from summary.txt (SM). Each Raw file is colored according to its ID rate and categorized into performance bins as ‘bad’, ‘ok’ and ‘great’. Raw files below ‘ok’, are listed separately on the next page of the report for convenient follow-up.

-

The thresholds for the bins are

-

%s

-Heatmap score [SM: MS2 IDrate (>%1.0f)]: reaches 1 (=100%%) if the threshold for ‘great’ is reached or exceeded. -
-

-back to top -

-
-
- - - - -
- - - - - - diff --git a/R/fcn_misc.R b/R/fcn_misc.R index 5569d91..6adbe51 100644 --- a/R/fcn_misc.R +++ b/R/fcn_misc.R @@ -1206,28 +1206,6 @@ peakWidthOverTime = function(data, RT_bin_width = 2) return(retLStats) } -#' -#' Create Html file with list of QC metrics -#' -#' @param outdir Target dir where PTXQC_list-of-metrics_template.html is written -#' @param outname Filename (without directory) -#' @return Complete filename of written html file -#' -#' @importFrom rmarkdown render pandoc_available -#' -#' @export -#' -createListOfPTXQCMetrics = function(outdir = getwd(), outname = "PTXQC_list-of-metrics.html") -{ - html_template = system.file("./reportTemplate/PTXQC_list-of-metrics_template.Rmd", package="PTXQC") - cat(paste0("HTML TEMPLATE: ", html_template, "\n")) - ## Rmarkdown: convert to Markdown, and then to HTML (or PDF) ... - lst_qcMetrics_ord = getMetricsObjects() - filename = file.path(outdir, outname) - render(html_template, output_file = filename) - return (filename) -} - #' Get all currently available metrics #' #' @param DEBUG_PTXQC Use qc objects from the package (FALSE) or from environment (TRUE/DEBUG) diff --git a/R/qcMetric_EVD.R b/R/qcMetric_EVD.R index 08361b1..5117ee0 100644 --- a/R/qcMetric_EVD.R +++ b/R/qcMetric_EVD.R @@ -825,7 +825,7 @@ qcMetric_EVD_Charge = setRefClass( "Charge distribution per Raw file. For typtic digests, peptides of charge 2 (one N-terminal and one at tryptic C-terminal R or K residue) should be dominant. Ionization issues (voltage?), in-source fragmentation, missed cleavages and buffer irregularities can -cause a shift (see [http://onlinelibrary.wiley.com/doi/10.1002/mas.21544/abstract](Bittremieux 2017, DOI: 10.1002/mas.21544) ). +cause a shift (see [Bittremieux 2017, DOI: 10.1002/mas.21544](http://onlinelibrary.wiley.com/doi/10.1002/mas.21544/abstract) ). The charge distribution should be similar across Raw files. Consistent charge distribution is paramount for comparable 3D-peak intensities across samples. 
@@ -866,8 +866,7 @@ qcMetric_EVD_IDoverRT = setRefClass( Ideally, the LC gradient is chosen such that the number of identifications (here, after FDR filtering) is uniform over time, to ensure consistent instrument duty cycles. Sharp peaks and uneven distribution of identifications over time indicate potential for LC gradient optimization. -See [http://www.ncbi.nlm.nih.gov/pubmed/24700534](Moruz et al., GradientOptimizer: An open-source graphical environment for calculating optimized gradients in reversed-phase -liquid chromatography, Proteomics, 06/2014; 14) for details. +See [Moruz 2014, DOI: 10.1002/pmic.201400036](http://www.ncbi.nlm.nih.gov/pubmed/24700534) for details. Heatmap score [EVD: ID rate over RT]: Scored using 'Uniform' scoring function, i.e. constant receives good score, extreme shapes are bad. ", diff --git a/R/qcMetric_MSMSScans.R b/R/qcMetric_MSMSScans.R index 04ae60f..02514f2 100644 --- a/R/qcMetric_MSMSScans.R +++ b/R/qcMetric_MSMSScans.R @@ -13,8 +13,7 @@ qcMetric_MSMSScans_TopNoverRT = setRefClass( helpTextTemplate = "TopN over retention time. Similar to ID over RT, this metric reflects the complexity of the sample at any point in time. Ideally complexity should be made roughly equal (constant) by choosing a proper (non-linear) LC gradient. -See [http://www.ncbi.nlm.nih.gov/pubmed/24700534](Moruz et al., GradientOptimizer: An open-source graphical environment for calculating optimized gradients in reversed-phase -liquid chromatography, Proteomics, 06/2014; 14) for details. +See [Moruz 2014, DOI: 10.1002/pmic.201400036](http://www.ncbi.nlm.nih.gov/pubmed/24700534) for details. Heatmap score [MS2 Scans: TopN over RT]: Rewards uniform (function Uniform) TopN events over time. 
", diff --git a/README.md b/README.md index f2cee87..d8197fb 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ PTXQC ### Latest changes / Change log - - v0.92.03 - Feb 2018: [List of Metrics][ListOfMetrics] added + - v0.92.03 - Feb 2018: Full List of Metrics added as vignette - v0.92.02 - Jan 2018: plots and metrics of reporter intensity (iTRAQ, TMT, ...) for labeled MSn experiments - v0.92.01 - Oct 2017: fix issue #41 (partial data error) - v0.92.00 - Oct 2017: cleaner R interface; log file for drag'n'drop; fix boxPlots issue (usually for large experiments only); @@ -47,18 +47,17 @@ you can browse the vignettes using either of these commands within R: help(package="PTXQC") browseVignettes(package = 'PTXQC') -If you do not want to wait that long, have a look at the ['vignettes' subfolder][3]. -The top part contains a small table with technical gibberish, but the rest is identical to the -vignettes you would see in R. +If you do not want to wait that long, you can look at the +[latest online vignette at CRAN](https://cran.r-project.org/web/packages/PTXQC/vignettes/) You will find documentation on + - Full List of Quality Metrics with help text - Input and Output - Report customization - (for MaxQuant users) Usage of Drag'n'drop - (for R users) code examples in R -For a comprehensive overview on the types of metrics available see [List of Metrics/Plots][ListOfMetrics], -which also contains a full description for each metric (as seen in the Help section of a Html report). +The 'List of Metrics' vignette contains a full description for each metric (as seen in the Help section of a Html report). ### Installation @@ -141,10 +140,8 @@ The input data is available in the ['inst/examples' subfolder][2]. We recommend to use the most recent PTXQC for the best user experience. 
- [ListOfMetrics]: http://htmlpreview.github.io/?https://github.com/cbielow/PTXQC/blob/master/PTXQC_list-of-metrics.html [1]: https://github.com/cbielow/PTXQC/tree/master/inst/dragNdrop [2]: https://github.com/cbielow/PTXQC/tree/master/inst/examples - [3]: https://github.com/cbielow/PTXQC/tree/master/vignettes [issuetracker]: https://github.com/cbielow/PTXQC/issues [JPR_PTXQC]: https://github.com/cbielow/PTXQC/releases/tag/v0.69.3 [Ref_VignFAQ]: https://github.com/cbielow/PTXQC/blob/master/vignettes/PTXQC-FAQ.Rmd diff --git a/inst/reportTemplate/PTXQC_list-of-metrics_template.Rmd b/inst/reportTemplate/PTXQC_list-of-metrics_template.Rmd deleted file mode 100644 index 2c4ac45..0000000 --- a/inst/reportTemplate/PTXQC_list-of-metrics_template.Rmd +++ /dev/null @@ -1,63 +0,0 @@ ---- -title: "PTXQC List of Metrics" -output: - html_document: - mathjax: null - number_sections: yes - toc: yes - pdf_document: - toc: yes ---- - - - - - - - -```{r setup, include=FALSE} -## global options -knitr::opts_chunk$set(echo=FALSE, warning=FALSE, error=FALSE, message=FALSE, fig.width=10) -``` - -# Overview - -The following metrics are implemented in PTXQC.
- -Reasons why metrics might not appear in every report - - * applicable only to certain types of data, e.g. SILAC or TMT - * metric was disabled manually using the YAML config file - * missing input data (incomplete tables), e.g. only evidence.txt is present - - -Some metrics are incompletely named here, since they might contain (hithero unknown) thresholds. - -# Metrics -```{r metrics, echo=FALSE, results="asis"} - for (qcm in lst_qcMetrics_ord) - { - newname = gsub("(.*)\\(.*" , "\\1", gsub("[\\*~%]" , " ", gsub("[\\^\">]" , "", qcm$qcName))) - cat(paste0('\n -## ', newname, ' -\n\n -
', qcm$helpTextTemplate, '
-

[back to top](#TOC)

\n\n')) - } -``` - diff --git a/man/createListOfFunctions.Rd b/man/createListOfFunctions.Rd deleted file mode 100644 index 2016ebe..0000000 --- a/man/createListOfFunctions.Rd +++ /dev/null @@ -1,11 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/fcn_misc.R -\name{createListOfFunctions} -\alias{createListOfFunctions} -\title{Create Html file with list of QC metrics} -\usage{ -createListOfFunctions() -} -\description{ -Create Html file with list of QC metrics -} diff --git a/man/createListOfPTXQCMetrics.Rd b/man/createListOfPTXQCMetrics.Rd deleted file mode 100644 index 8eaf0b2..0000000 --- a/man/createListOfPTXQCMetrics.Rd +++ /dev/null @@ -1,20 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/fcn_misc.R -\name{createListOfPTXQCMetrics} -\alias{createListOfPTXQCMetrics} -\title{Create Html file with list of QC metrics} -\usage{ -createListOfPTXQCMetrics(outdir = getwd(), - outname = "PTXQC_list-of-metrics.html") -} -\arguments{ -\item{outdir}{Target dir where PTXQC_list-of-metrics_template.html is written} - -\item{outname}{Filename (without directory)} -} -\value{ -Complete filename of written html file -} -\description{ -Create Html file with list of QC metrics -} diff --git a/vignettes/PTXQC-ListOfMetrics.Rmd b/vignettes/PTXQC-ListOfMetrics.Rmd new file mode 100644 index 0000000..5f36b31 --- /dev/null +++ b/vignettes/PTXQC-ListOfMetrics.Rmd @@ -0,0 +1,85 @@ +--- +title: "List of Metrics" +author: "Chris Bielow " +date: '`r Sys.Date()`' +output: + html_document: + mathjax: null + number_sections: yes + toc: no + pdf_document: + toc: no +vignette: > + %\VignetteIndexEntry{List of Metrics} + %\VignetteEngine{knitr::rmarkdown} + \usepackage[utf8]{inputenc} +--- + + + + + + + + +```{r setup, include=TRUE, echo=FALSE, results="asis"} +## global options +knitr::opts_chunk$set(echo=FALSE, warning=FALSE, error=FALSE, message=FALSE, fig.width=10) + +lst_qcMetrics_ord = 
PTXQC:::getMetricsObjects() + +txt_TOC = "# Table of Contents + - [Overview](#Overview) + - [Metrics](#Metrics) +" +txt_BODY = "" +for (qcm in lst_qcMetrics_ord) +{ + newname = gsub("(.*)\\(.*" , "\\1", gsub("[\\*~%]" , " ", gsub("[\\^\">]" , "", qcm$qcName))) + ## remove weird symbols (to serve as link) + newname_lnk = gsub("[^a-zA-Z0-9]", "", newname) + txt_TOC = paste0(txt_TOC, paste0(" - [", newname, "](#", newname_lnk, ")\n")) + txt_BODY = paste0(txt_BODY, '\n +## ', newname, ' +\n +
', qcm$helpTextTemplate, '
+

-- back to top --

\n\n') +} + +## print the TOC +cat(txt_TOC) + +``` + +# Overview + +The following metrics are implemented in PTXQC.
+ +Reasons why metrics might not appear in every report + + * applicable only to certain types of data, e.g. SILAC or TMT + * metric was disabled manually using the YAML config file + * missing input data (incomplete tables), e.g. only evidence.txt is present + + +Some metrics are incompletely named here, since they might contain (hitherto unknown) thresholds. + +# Metrics
+```{r metrics, echo=FALSE, results="asis"} + cat(txt_BODY) +``` + +If the above list is empty, the vignette was not compiled (e.g. when viewing on GitHub). +Please see https://CRAN.R-project.org/package=PTXQC --> Vignettes for a compiled version. +
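
The setup chunk of the new vignette derives each heading and its table-of-contents link from `qcm$qcName` with a chain of `gsub()` calls. The sketch below isolates that clean-up so it can be tried without installing PTXQC. The helper names `metric_heading` and `metric_anchor` are hypothetical and not part of the package, and the second input string is a made-up example rather than a real `qcName`. Only the regular expressions are taken from the patch.

```r
# Sketch of the name clean-up used when building the table of contents.
# metric_heading() and metric_anchor() are hypothetical helpers; the
# regular expressions are copied from the vignette's setup chunk.
metric_heading = function(qc_name)
{
  no_symbols = gsub("[\\^\">]", "", qc_name)    # drop '^', '"' and '>'
  spaced     = gsub("[\\*~%]", " ", no_symbols) # turn '*', '~' and '%' into spaces
  gsub("(.*)\\(.*", "\\1", spaced)              # cut everything from the last '(' onwards
}

metric_anchor = function(qc_name)
{
  # keep only letters and digits so the result can serve as a link target
  gsub("[^a-zA-Z0-9]", "", metric_heading(qc_name))
}

# the second name is invented for illustration
example_names = c("EVD: Charge", "MS^2*Scans:~TopN~over~RT")
for (n in example_names) cat(metric_heading(n), "->", metric_anchor(n), "\n")
```

Because the anchor keeps only alphanumeric characters, two metrics whose names differ only in punctuation or spacing would end up with the same link target. Whether that can happen with the current set of metrics is not something the patch states.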