
Example

Martin Černý edited this page May 22, 2017 · 5 revisions

Example usage of CyGenexpi

To follow this example, download the example data. It contains a Cytoscape session with a small fragment of the B. subtilis regulatory network (file "example1.cys") from Arrieta-Ortiz et al. 2015 (original source), and expression time series collected during germination (file "B.subtilis-germination (GSE6865).tsv") from Keijser, B.J. et al. 2007 (original source).

The big picture

In a typical use of Genexpi, we have a base network of possible regulations (e.g., from a ChIP-seq experiment or from the literature) and we want to determine which of these regulations are supported by time series expression data (e.g., microarray or RNA-seq). To do so, Genexpi tries to fit a simplified but biologically plausible model of each regulation.

The example presented here is restricted to a small fragment of a regulatory network for brevity, but Genexpi runs quickly (under 1 minute) even on networks with hundreds or thousands of regulations, depending on the hardware used.

Prerequisites

A prerequisite for using Genexpi is an OpenCL-capable device. A modern GPU is recommended, but OpenCL is also supported by all recent processors as well as by Xeon Phi cards. See Configuration for further details. The configuration dialog is also shown the first time you use any OpenCL-dependent feature of Genexpi.

Let's do this

  1. Preparing data
    1. Open the example session (example1.cys) in Cytoscape. The network fragment we will work with looks like this:

    2. Run the Genexpi wizard, which you can find at Apps -> Genexpi -> Genexpi wizard.

    3. The first step of the wizard appears, asking for the time series data. We do not have the data in the session yet (hence the empty choice), so we need to import the time series. We have a .TSV file, so we choose the appropriate option. (SOFT is the format used by the Gene Expression Omnibus.) If you have a SOFT file, the steps are almost the same; for details, see the SOFT-file tutorial of the Cy-dataseries plugin.

    4. This opens the import dialog. Since the data are already nicely formatted (time points in the header, each row starting with the gene name), there is little we need to change: we only set the name of the imported series. We'll go with "Expression":

       [[img/import-dialog.png]]

       More details on importing time series, and on using the other options of the import dialog to import a wide range of file formats, can be found in the [documentation for Cy-dataseries plugin](https://github.com/martincerny/cy-dataseries/wiki). Also note that if the data are log2 ratios, they should be exponentiated before use with Genexpi.

    5. We are now asked to describe how the time series maps to the nodes in the network. Again, we can keep the defaults (create a new column "Expression" in the node table and match the gene names used in the time series with the "name" column in the node table):

       [[img/mapping-dialog.png]]

    6. Genexpi works best on smoothed time series, and it also requires constant time intervals between measurements, which the example data do not have. So we continue with smoothing:

       [[img/step1-smooth.png]]

    7. The "Interactive smoothing" panel is shown in the Results panel. Here we can adjust the smoothing bandwidth until we are happy with how the data are interpolated (we chose 7 for this example). The panel shows a random subset of the time series; to see different genes, click "See different examples". Estimating the series at 100 equidistant points is a good choice for us.

       [[img/smoothing-dialog.png]]

    8. Once we are happy with the parameters, we click "Perform smoothing", and a dialog appears for configuring where the smoothed time series is stored. Again, the defaults are fine (the new time series is mapped the same way as the original; only the mapping column gets a "_smooth" suffix):

       [[img/smoothing-output.png]]
  2. Once the series has been imported and smoothed, we select it and click "Next".

  3. Now we need to specify how much noise we expect in our data. A 20% error (0.2 relative error) has been recommended in the literature. We also know that low measurements are even less reliable, so we add a minimal error of 500 (i.e., the measurement error is always at least 500 units). Usually you set either the minimal error (as we did) or the absolute error (which is added to the relative error). The appropriate magnitude of the minimal or absolute error depends on your data and measurement protocol.

    Genexpi treats values within the error margin as indistinguishable from the measured value.

    We also check "Exclude genes fitted with constant synthesis", since a gene that can be fitted this way can be fitted with any regulator.

  4. After we click "Next", Genexpi sorts the genes in the network into 3 groups:

    • no change: the expression of these genes is not distinguishable from a flat line (given the error provided); they are marked red.
    • constant synthesis: the expression of these genes can be modelled well with a constant regulatory input; they are marked yellow.
    • useful profile: the expression of these genes is non-trivial; these genes are explored further.

    The categorization can be inspected in the results panel, and its decisions can be overridden using the choice next to "Change tag to". Once we are content with the tags of all the displayed genes, we click "Approve tags" to see and check more genes.

  5. Once all genes have been approved (or if you trust Genexpi to do a good job), click "Next" in the wizard panel to run the actual prediction.

  6. Since some of the genes were excluded from prediction (because their profiles are flat or could be fitted by constant synthesis), the main prediction runs for only 2 regulator-target pairs. The results can again be inspected and checked in the results panel, which shows the profile of the regulator, the profile of the target (including error margins) and the modelled target profile assuming the regulation is real, together with the parameters of the model.

    The fits are sorted into 3 categories: no fit (white), questionable fit (light green) and good fit (dark green). As in the previous step, the categorization made by Genexpi can be overridden.

  7. Now we make sure the "Genexpi" visual style is applied to the network

    and the final network looks like this:

    Here, the nodes (genes) shown in orange were excluded from prediction because their profiles are flat; those shown in yellow were excluded because they could be fitted by any regulator (constant synthesis).

    The edges are color-coded as follows:

    • orange: excluded from prediction
    • green/light green: good fit/questionable fit
    • gray, dotted: predicted, but not fitted
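The data preparation in step 1 (reading a TSV with time points in the header and one gene per row, exponentiating log2 data, matching rows to node names, and smoothing onto equidistant time points) can be sketched in plain Python, for example to sanity-check a file before importing it. This is a minimal sketch under stated assumptions, not the Cy-dataseries implementation: the Gaussian (Nadaraya-Watson) kernel smoother, reading `bandwidth` as the kernel's standard deviation in time units, the function names and the toy values are all hypothetical, and Cy-dataseries may smooth differently.

```python
import csv
import io
import math
from typing import Dict, List, Sequence, Tuple


def read_series_tsv(text: str) -> Tuple[List[float], Dict[str, List[float]]]:
    """Parse a TSV whose header row holds the time points and whose
    remaining rows start with a gene name followed by its measurements."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    times = [float(t) for t in rows[0][1:]]  # skip the header cell above the gene names
    series = {row[0]: [float(v) for v in row[1:]] for row in rows[1:]}
    return times, series


def from_log2(values: Sequence[float]) -> List[float]:
    """Exponentiate log2-transformed values back to the linear scale."""
    return [2.0 ** v for v in values]


def map_to_nodes(series: Dict[str, List[float]], node_names: Sequence[str]):
    """Match series rows to network nodes by name (hypothetical helper).
    Returns the matched rows and a sorted list of unmatched gene names."""
    nodes = set(node_names)
    matched = {name: vals for name, vals in series.items() if name in nodes}
    unmatched = sorted(set(series) - nodes)
    return matched, unmatched


def kernel_smooth(times: Sequence[float], values: Sequence[float],
                  bandwidth: float, n_points: int) -> Tuple[List[float], List[float]]:
    """Gaussian-kernel (Nadaraya-Watson) estimate at n_points equidistant times
    spanning the measured range. Assumes times are sorted ascending."""
    start, end = times[0], times[-1]
    grid = [start + (end - start) * i / (n_points - 1) for i in range(n_points)]
    smoothed = []
    for t in grid:
        weights = [math.exp(-0.5 * ((t - ti) / bandwidth) ** 2) for ti in times]
        # If bandwidth is much smaller than the gaps between time points,
        # all weights can underflow to 0 and this division fails.
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return grid, smoothed


# Toy data (hypothetical values), unequally spaced like the example series.
toy = "gene\t0\t10\t30\t60\ngeneA\t100\t200\t400\t800\ngeneX\t1\t1\t1\t1\n"
times, series = read_series_tsv(toy)
matched, unmatched = map_to_nodes(series, ["geneA", "geneB"])
grid, smooth = kernel_smooth(times, matched["geneA"], bandwidth=7.0, n_points=100)
print(len(grid), grid[0], grid[-1])  # 100 0.0 60.0
print(unmatched)  # ['geneX']
```

Every smoothed value is a weighted average of the measurements, so a larger bandwidth flattens the profile more; this is the kind of trade-off adjusted interactively in the smoothing panel.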

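The error settings from step 3 and the "no change" category from step 4 can be illustrated with a short sketch. It assumes the error margin of a measured value v is max(relative·v, minimal) + absolute, which is our reading of the description above (with absolute = 0 it is the setting used in this example), and it assumes a profile counts as "no change" when a single constant value lies within the error margin of every time point. Genexpi's actual test may differ, for example by working on the smoothed series; the function names are hypothetical.

```python
from typing import Sequence


def error_margin(value: float, relative: float = 0.2,
                 minimal: float = 500.0, absolute: float = 0.0) -> float:
    """Assumed error model: relative error floored at `minimal`, plus `absolute`.
    With absolute=0 this is the "minimal error" setting used in the example;
    with minimal=0 it is "relative + absolute"."""
    return max(relative * value, minimal) + absolute


def is_no_change(values: Sequence[float], **error_settings: float) -> bool:
    """True if some constant value lies within the error margin of every point,
    i.e. the intervals [v - e, v + e] share a common intersection."""
    lower = max(v - error_margin(v, **error_settings) for v in values)
    upper = min(v + error_margin(v, **error_settings) for v in values)
    return lower <= upper


print(is_no_change([1000, 1200, 900]))  # True: a flat line at 1000 fits
print(is_no_change([1000, 5000]))       # False
```

Checking for a common intersection of the error intervals avoids searching for the flat line explicitly: such a line exists exactly when the largest lower bound does not exceed the smallest upper bound.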
Interpreting the results

The results, unfortunately, cannot say much about the suspected regulations that were excluded from prediction. A regulation can be excluded for 3 reasons:

  • the regulator had a flat profile (no change). We simply do not have sufficient resolution in the regulator data to say anything.
  • the target had a flat profile (no change). If there is a regulatory effect, its magnitude is smaller than the sensitivity of our measurements.
  • the target could be fitted with constant synthesis. If there is a regulatory effect, even the lowest measured concentration of the regulator is enough to saturate the regulation.
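The three exclusion reasons above can be written down as a small decision function, which may help when reviewing many excluded edges at once. This is an illustrative sketch: the tag strings follow the category names used by the wizard, but the function itself is hypothetical, and the order in which it checks the reasons (when several apply) is our assumption.

```python
from typing import Optional


def exclusion_reason(regulator_tag: str, target_tag: str) -> Optional[str]:
    """Explain why a regulator-target pair was excluded, or return None
    if neither profile gives a reason to exclude it (hypothetical helper)."""
    if regulator_tag == "no change":
        return "regulator has a flat profile: not enough resolution in the regulator data"
    if target_tag == "no change":
        return "target has a flat profile: any effect is below measurement sensitivity"
    if target_tag == "constant synthesis":
        return "target fits constant synthesis: the regulation may be saturated"
    return None


print(exclusion_reason("useful profile", "constant synthesis"))
```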

The strongest conclusions can be drawn when a regulation could not be fitted. This means that the regulation is not supported by the data: either another factor co-regulates the target, or there is no regulatory influence.

If the regulation is well fitted, we can consider it plausible, but no model can give a definitive proof, only experiment can.
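To make "fitted" more concrete, the sketch below simulates one regulation and checks whether the simulated target stays within the error margins at every time point. It assumes a sigmoid synthesis/decay model, dy/dt = k1 / (1 + exp(-(w·x + b))) - k2·y, where x is the regulator profile and y the target; we believe Genexpi's model has this form, but check the Genexpi documentation before relying on it. The parameters are set by hand instead of being fitted, Genexpi's distinction between good and questionable fits is not reproduced, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    """Numerically stable sigmoid 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_target(regulator: Sequence[float], dt: float, y0: float,
                    k1: float, k2: float, w: float, b: float) -> List[float]:
    """Forward-Euler integration of dy/dt = k1 * logistic(w*x + b) - k2*y,
    where x is the regulator profile sampled every dt time units."""
    y = [y0]
    for x in regulator[:-1]:
        y.append(y[-1] + dt * (k1 * logistic(w * x + b) - k2 * y[-1]))
    return y


def fits_within_error(model: Sequence[float], measured: Sequence[float],
                      margins: Sequence[float]) -> bool:
    """True if the modelled profile is within the error margin at every time point."""
    if not (len(model) == len(measured) == len(margins)):
        raise ValueError("profiles must have the same length")
    return all(abs(m - v) <= e for m, v, e in zip(model, measured, margins))


# Hypothetical parameters chosen by hand (Genexpi would fit them to the data).
regulator = [0.0, 1.0, 2.0, 3.0, 4.0]
model = simulate_target(regulator, dt=1.0, y0=0.0, k1=2.0, k2=0.0, w=0.0, b=0.0)
print(model)  # [0.0, 1.0, 2.0, 3.0, 4.0]
print(fits_within_error(model, [0.0, 1.2, 2.1, 2.9, 4.3], [0.5] * 5))  # True
```

With w = 0 the regulator has no influence and the synthesis rate is constant, which illustrates why a target that fits constant synthesis can be fitted with any regulator.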
