Workflow Bakal lab

Biological motivations

Aim to characterise signalling networks that regulate cell shape and proliferation, and understand how deregulation of these networks drives cancer progression.

Biological problems fall into two categories:

Screening to identify candidate genes involved in cell shape regulation, signalling activity, and cell-cycle progression
Live cell imaging to understand the dynamics of signalling networks

The defining technique in all studies is to leverage heterogeneity within populations of cells to solve these problems, for example using heterogeneity in cell shape and crowding to understand NF-κB activation (Sero, 2015).

How we do it

Segmentation and feature extraction

Initial extraction and segmentation steps are specific to the dataset; the selection/transformation principles, however, are used more broadly.

Screening:

Imaging is performed using the Perkin-Elmer Opera microscopes.
Datasets are uploaded to a server running the Perkin-Elmer Columbus data management and analysis system.
Drag and drop style GUI allows analysis pipelines to be constructed for segmentation and feature extraction.
Standard features defined within the Columbus package are extracted; the number of features ranges from ~20 to ~200.
Outputs are either well-averaged vectors or single-cell data.

Note: Columbus is a GUI for the Acapella scripting language. Scripting directly in Acapella provides more flexibility for defining features, e.g. the protrusion features used in (Sailem, 2014).

Live cell analysis:

Imaging is performed using the Perkin-Elmer Opera microscope with live cell chamber.
Datasets are converted into TIFF sequences.
MATLAB-based marker-controlled watershed segmentation:

http://uk.mathworks.com/help/images/examples/marker-controlled-watershed-segmentation.html

Feature extraction using the regionprops and graycomatrix functions.
MATLAB-based probabilistic tracking as defined in (Magnusson, 2015).

Note: Live cell studies have previously been carried out using the Volocity software package, and attempts have also been made to track cells using Columbus. With Volocity, tracking is not automated and extensive manual work is required. Columbus, meanwhile, can only track cells that move very slowly or are imaged frequently.

Initial checks

Following segmentation and feature extraction, initial steps are performed to identify batch effects/plating errors.

After screen-wise Z-score normalisation:

To detect batch effects: the correlation between well-averaged feature vectors is calculated for all wells of a screen. The resulting correlation matrix is plotted as a heat map, in which batch effects, if present, are very clear visually.
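As an illustration of this check, here is a minimal Python sketch using NumPy only. It is not the lab's own implementation. It assumes a hypothetical array `profiles` with one row per well and one column per feature, and the data, plate sizes and offset are all synthetic.

```python
import numpy as np

def zscore_screenwise(profiles):
    """Z-score each feature (column) using all wells of the screen."""
    mean = profiles.mean(axis=0)
    std = profiles.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant features map to zero
    return (profiles - mean) / std

def well_correlation(profiles):
    """Pearson correlation between the feature vectors of every pair of wells."""
    return np.corrcoef(zscore_screenwise(profiles))

# Synthetic screen: 2 plates x 6 wells, 5 features (all values are made up).
rng = np.random.default_rng(0)
profiles = rng.normal(size=(12, 5))
profiles[6:] += np.array([5.0, -5.0, 5.0, -5.0, 5.0])  # simulated batch offset on plate 2

corr = well_correlation(profiles)
within_plate = corr[:6, :6][np.triu_indices(6, k=1)].mean()
between_plates = corr[:6, 6:].mean()
print(f"within plate 1: {within_plate:.2f}, between plates: {between_plates:.2f}")
```

Plotting `corr` as a heat map (for example with matplotlib's `imshow`) would show the batch effect as two blocks along the diagonal, one per plate.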

To detect plating errors: these are generally errors arising from plate preparation procedures. They are identified by visually inspecting the PC scores of well-averaged feature vectors mapped onto the plate layout, looking for errors affecting rows or blocks of a plate.
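The plate-map check can be sketched in Python as follows. This is a rough illustration under stated assumptions: a 384-well plate (16 x 24) with wells stored in row-major order, PCA computed by SVD, and a synthetic plating error injected into one row. The function names `pc_scores` and `plate_map` are hypothetical.

```python
import numpy as np

def pc_scores(profiles, n_components=2):
    """Project row-wise profiles onto their leading principal components via SVD."""
    centred = profiles - profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

def plate_map(values, n_rows=16, n_cols=24):
    """Arrange one value per well into the plate layout (row-major order assumed)."""
    return np.asarray(values).reshape(n_rows, n_cols)

# Synthetic 384-well plate with 10 features per well (all values are made up).
rng = np.random.default_rng(1)
profiles = rng.normal(size=(384, 10))
row_index = np.repeat(np.arange(16), 24)
profiles[row_index == 3] += 4.0  # simulated plating error affecting one row

pc1_map = plate_map(pc_scores(profiles)[:, 0])
row_means = np.abs(pc1_map.mean(axis=1))
print("row with the largest mean |PC1| score:", int(np.argmax(row_means)))
```

In practice `pc1_map` would be displayed as a heat map of the plate, so that affected rows or blocks stand out visually.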

Feature selection/Data transformation

Feature selection and data transformation within the Bakal lab generally seek to cluster single cells into specific shapes.

Shapes are identified within populations using reference shapes in earlier work, e.g. (Bakal, 2007; Yin, 2013), and unsupervised clustering in more recent work, e.g. (Sailem, 2014; Cooper, 2015).

Vectors describing the fraction of cells adopting each shape are used to understand the differences between perturbations.

Prior to shape clustering, data transformations, notably PCA and/or binning, are performed on the single-cell data, so that clustering is applied to a lower-dimensional feature space. To achieve this:

Single cell profiles from across the screen are sampled and pooled
The transformation is defined on the pooled sample
Clustering is performed on the pooled sample (model selection is necessary here for unsupervised clustering)
The transformation matrix and clustering parameters are applied to single cell populations from each well.
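The four steps above can be sketched in Python as follows. This is an illustration only: it uses PCA by SVD and a plain k-means with a deterministic farthest-first initialisation, swapped in for the GMM and hierarchical clustering used in the cited work, and it fixes the number of clusters rather than performing model selection. The well names, data and function names are hypothetical.

```python
import numpy as np

def fit_pca(pooled, n_components):
    """Return the pooled mean and the loading matrix of the leading PCs."""
    mean = pooled.mean(axis=0)
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    return mean, vt[:n_components].T

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

def fit_kmeans(points, k, n_iter=50):
    """Plain k-means (Lloyd's algorithm) with farthest-first initialisation."""
    centroids = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[dist.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids

def shape_fractions(well_cells, mean, loadings, centroids):
    """Fraction of a well's cells assigned to each shape cluster."""
    labels = assign((well_cells - mean) @ loadings, centroids)
    return np.bincount(labels, minlength=len(centroids)) / len(labels)

# Synthetic single-cell data for two wells, 6 features each (values are made up).
rng = np.random.default_rng(2)
wells = {
    "A1": rng.normal(loc=0.0, size=(200, 6)),
    "A2": rng.normal(loc=3.0, size=(200, 6)),
}
# 1. Sample and pool; 2. define the transformation; 3. cluster; 4. apply per well.
pooled = np.vstack([c[rng.choice(len(c), 100, replace=False)] for c in wells.values()])
mean, loadings = fit_pca(pooled, n_components=2)
centroids = fit_kmeans((pooled - mean) @ loadings, k=2)
fractions = {name: shape_fractions(c, mean, loadings, centroids) for name, c in wells.items()}
print(fractions)
```

The point of the design is that the mean, loadings and centroids are fitted once on the pooled sample and then reused unchanged for every well, so the resulting shape-fraction vectors are directly comparable across wells.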

A visual example of this, from (Sailem, 2014):

Figure 2. Single-cell clustering. (a) Average silhouette value for different numbers of clusters using Gaussian mixture modelling (GMM) and hierarchical clustering. Higher averages represent better cluster quality, and the best clustering for this dataset was reached when cells were grouped into seven clusters using hierarchical clustering. (b) Silhouette values of single cells for the best model. (c) Silhouette values of single cells for the best model after correction using KNN. (d) Single-cell data for all TCs are projected in the first three PCs and coloured based on the single-cell hierarchical clustering results, where clusters are denoted by shapes 1–7. Next to each shape cluster is a representative cell shape from that cluster. (e) Qualitative interpretation of PC space.

I have found that the quality of a workflow can be assessed using cost functions that seek to minimise the distance between feature vectors of the same condition and maximise the distance between feature vectors of different conditions, e.g. the Davies-Bouldin index (Cooper, 2015).
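For reference, the standard Davies-Bouldin index can be computed as in the Python sketch below, where `labels` gives the condition (or cluster) of each feature vector. Note that lower values of this index indicate tighter, better-separated groups. The toy data are made up, and this is not claimed to be the exact cost function used in the cited work.

```python
import numpy as np

def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over groups of the worst (s_i + s_j) / d_ij ratio."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in classes])
    # s_i: mean distance of the members of group i to their centroid
    scatter = np.array([
        np.linalg.norm(points[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    worst = []
    for i in range(len(classes)):
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(classes)) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))

labels = [0, 0, 1, 1]
far_apart = [[0, 0], [0, 2], [10, 0], [10, 2]]   # centroids 10 apart, scatter 1 each
close_together = [[0, 0], [0, 2], [4, 0], [4, 2]]  # centroids 4 apart, scatter 1 each
db_far = davies_bouldin(far_apart, labels)      # (1 + 1) / 10 = 0.2
db_close = davies_bouldin(close_together, labels)  # (1 + 1) / 4 = 0.5
```

Comparing such a score across candidate feature spaces, with conditions as the labels, gives one number per workflow to rank them by.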

The workflows that scored best on such measures led to feature spaces that best separated positive from negative controls, and gave results that agreed best with visual inspection of images.

PCA performed on well-averaged feature vectors consistently gives the highest-scoring results. However, a good transformation into shape vectors gives similar results that are more biologically interpretable and, importantly, can easily be used to characterise transitions between shapes (Cooper, 2015).

Optimisation of the feature space directly against such cost functions can work, though effective cross-validation is required in these instances.

Overall, data transformation and binning have a much greater effect on the cost function than feature selection.

Visualisation

We use a wide array of visualisation techniques, which we review in (Sailem, 2015). The ‘workhorse’ visualisations are scatter plots, heat maps and histograms.

Once hits are identified, we analyse such plots and refer back to the original images to draw biological conclusions and identify lines of work to pursue.

Note: Handling Batch Effects

Screen repeats are often performed on different days and occasionally by different experimenters. I have found that this often results in large batch effects between screen repeats.

Plate-wise normalisation and multiple other methods designed to reduce batch effects have consistently reduced final cost function values (here the cost function was measured within single plates and summed).

Canonical correlation analysis (CCA) not only eliminated these batch effects, but also gave lower-dimensional projections that were very structured and led to the identification of a meaningful phenotype, notably G1 arrest, which has since been biologically validated.
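To make the idea concrete, here is a minimal Python sketch of textbook CCA between two repeats of the same wells, computed with QR decompositions and an SVD. It assumes more wells than features and full-rank feature matrices. The data are synthetic, `repeat_1` and `repeat_2` are hypothetical names, and this is not a reconstruction of the analysis described above.

```python
import numpy as np

def cca(x, y, n_components):
    """Canonical correlation analysis via QR and SVD.

    x, y: (n_wells, n_features) arrays with rows matched between repeats.
    Returns projection matrices a, b and the canonical correlations.
    """
    x = x - x.mean(axis=0)  # centring removes additive per-repeat offsets
    y = y - y.mean(axis=0)
    qx, rx = np.linalg.qr(x)
    qy, ry = np.linalg.qr(y)
    u, s, vt = np.linalg.svd(qx.T @ qy)
    a = np.linalg.solve(rx, u[:, :n_components])
    b = np.linalg.solve(ry, vt.T[:, :n_components])
    return a, b, s[:n_components]

# Two synthetic repeats sharing a 2-D latent phenotype but differing in
# feature mixing, noise and offset (all values are made up).
rng = np.random.default_rng(3)
latent = rng.normal(size=(100, 2))
repeat_1 = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(100, 4)) + 5.0
repeat_2 = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(100, 4)) - 3.0

a, b, corrs = cca(repeat_1, repeat_2, n_components=2)
u1 = (repeat_1 - repeat_1.mean(axis=0)) @ a[:, 0]  # first canonical variate, repeat 1
v1 = (repeat_2 - repeat_2.mean(axis=0)) @ b[:, 0]  # first canonical variate, repeat 2
print("canonical correlations:", np.round(corrs, 3))
```

Because the projections are chosen to maximise correlation between repeats, variation that is specific to one repeat contributes little to the leading canonical variates, which is why such projections can be less sensitive to batch effects.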

More complex optimisation procedures, based on generalising CCA to multiple plates and repeats, have also provided insights.

References

Bakal, Chris, et al. "Quantitative morphological signatures define local signaling networks regulating cell morphology." Science 316.5832 (2007): 1753-1756.

Cooper, Sam, et al. "Apolar and polar transitions drive the conversion between amoeboid and mesenchymal shapes in melanoma cells." Molecular Biology of the Cell 26.22 (2015): 4163-4170.

Magnusson, Klas EG, et al. "Global linking of cell tracks using the Viterbi algorithm." IEEE Transactions on Medical Imaging 34.4 (2015): 911-929.

Sailem, Heba, et al. "Cross-talk between Rho and Rac GTPases drives deterministic exploration of cellular shape space and morphological heterogeneity." Open Biology 4.1 (2014): 130132.

Sero, Julia E., et al. "Cell shape and the microenvironment regulate nuclear translocation of NF‐κB in breast epithelial and tumor cells." Molecular Systems Biology 11.3 (2015): 790.

Yin, Zheng, et al. "A screen for morphological complexity identifies regulators of switch-like transitions between discrete cell shapes." Nature Cell Biology 15.7 (2013): 860-871.

Cytomining Hackathon

Implementing profiling workflows
