Workflow Bakal lab
The lab aims to characterise signalling networks that regulate cell shape and proliferation, and to understand how deregulation of these networks drives cancer progression.
Biological problems fall into two categories:
- Screening to identify candidate genes involved in cell shape regulation, signalling activity, and cell-cycle progression.
- Live cell imaging to understand the dynamics of signalling networks.
The defining technique in all studies is to leverage heterogeneity within populations of cells to solve these problems. For example, heterogeneity in cell shape and crowding was used to understand NF-κB activation (Sero, 2015).
Segmentation and feature extraction
The initial extraction and segmentation steps are specific to the dataset; the selection and transformation principles, however, are more broadly applicable.
Screening:
- Imaging is performed using the Perkin-Elmer Opera microscopes.
- Datasets are uploaded to a server running the Perkin-Elmer Columbus data management and analysis system.
- Drag and drop style GUI allows analysis pipelines to be constructed for segmentation and feature extraction.
- Standard features defined within the Columbus package are extracted; the number of features ranges from ~20 to ~200.
- Outputs are either well-averaged vectors or single-cell data.
Note: Columbus is a GUI for the Acapella scripting language. Scripting directly in Acapella provides more flexibility for defining features, e.g. the protrusion features used in (Sailem, 2014).
Live cell analysis:
- Imaging is performed using the Perkin-Elmer Opera microscope with a live cell chamber.
- Datasets are converted into TIFF sequences.
- MATLAB-based marker-controlled watershed segmentation: http://uk.mathworks.com/help/images/examples/marker-controlled-watershed-segmentation.html
- Feature extraction using the MATLAB regionprops and graycomatrix functions.
- MATLAB-based probabilistic tracking as defined in (Magnusson, 2015).
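To illustrate the kind of texture feature a grey-level co-occurrence matrix provides, here is a minimal NumPy sketch. It is a simplified analogue written for this page, not MATLAB's graycomatrix implementation: the function names `glcm_horizontal` and `glcm_contrast`, the single horizontal offset, and the toy image are all assumptions.

```python
import numpy as np

def glcm_horizontal(image, levels):
    """Count how often grey level i has grey level j as its right-hand neighbour."""
    image = np.asarray(image)
    left = image[:, :-1].ravel()    # every pixel except the last column
    right = image[:, 1:].ravel()    # the pixel immediately to the right
    glcm = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(glcm, (left, right), 1)   # accumulate co-occurrence counts
    return glcm

def glcm_contrast(glcm):
    """Contrast: co-occurrence probabilities weighted by squared level difference."""
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

# Toy image already quantised to 3 grey levels (0, 1, 2).
toy = np.array([[0, 0, 1],
                [0, 1, 1],
                [2, 2, 2]])
glcm = glcm_horizontal(toy, levels=3)
contrast = glcm_contrast(glcm)
```

In practice the matrix would be computed per segmented cell, and several summary statistics (contrast, homogeneity, and so on) would be appended to the shape features.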
Note: Live cell studies have previously been carried out using the Volocity software package, and attempts have also been made to track cells using Columbus. Volocity requires extensive manual work because its tracking is not automated, while Columbus can only track cells that move very slowly or are imaged frequently.
Initial checks
Following segmentation and feature extraction, initial checks are performed to identify batch effects and plating errors.
Following screen-wise Z-score normalisation:
To detect batch effects: the correlation between well-averaged feature vectors is calculated for all wells of a screen. The resulting correlation matrix is plotted as a heat map, in which batch effects, if present, are visually very clear.
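The batch-effect check can be sketched as follows. This is a minimal illustration on simulated data, assuming well-averaged features are held in a wells x features NumPy array; the function name `zscore_screenwise` and the two simulated batches are hypothetical.

```python
import numpy as np

def zscore_screenwise(well_features):
    """Z-score each feature using the mean and SD over every well in the screen."""
    mean = well_features.mean(axis=0)
    sd = well_features.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant features at zero rather than dividing by zero
    return (well_features - mean) / sd

rng = np.random.default_rng(0)
# Simulated screen: 2 batches of 6 wells, 10 features; batch 2 carries a shared offset.
batch_pattern = rng.normal(size=10)
batch1 = rng.normal(size=(6, 10))
batch2 = rng.normal(size=(6, 10)) + 5.0 * batch_pattern
wells = np.vstack([batch1, batch2])

z = zscore_screenwise(wells)
corr = np.corrcoef(z)  # each row is a well, so this is a 12 x 12 well-by-well matrix

# Mean correlation within batch 2 versus between the two batches.
within_batch2 = corr[6:, 6:][np.triu_indices(6, k=1)].mean()
between = corr[:6, 6:].mean()
```

Plotting `corr` as a heat map (for example with matplotlib's `imshow`) would show the batches as blocks along the diagonal.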
To detect plating errors: these generally arise from plate preparation procedures and are identified by visually inspecting the PC scores of well-averaged feature vectors mapped onto the plate layout, looking for errors affecting rows or blocks of a plate.
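A sketch of mapping PC scores onto a plate, assuming a 96-well layout with wells stored in row-major order and features that are already Z-scored. The simulated row error, the function name `first_pc_scores`, and the automated `suspect_row` summary are illustrative assumptions; the check described above is a visual one.

```python
import numpy as np

def first_pc_scores(features):
    """Project centred data onto the first principal component via SVD."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[0]

rows, cols, n_features = 8, 12, 5   # hypothetical 96-well plate
rng = np.random.default_rng(1)
wells = rng.normal(size=(rows * cols, n_features))  # wells in row-major plate order

# Simulate a plating error that shifts every feature in plate row 3.
wells[3 * cols:4 * cols] += 4.0

# Reshape the PC1 scores back into the plate geometry for inspection.
plate_map = first_pc_scores(wells).reshape(rows, cols)

# Crude numerical stand-in for visual inspection: the row whose mean score
# deviates most from the plate median.
row_deviation = np.abs(plate_map.mean(axis=1) - np.median(plate_map))
suspect_row = int(np.argmax(row_deviation))
```

Displaying `plate_map` as a heat map reproduces the visual check: an affected row or block stands out as a band of extreme scores.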
Feature selection/Data transformation
Feature selection and data transformation within the Bakal lab generally seek to cluster single cells into specific shapes.
Shapes are identified within populations using reference shapes in earlier work, e.g. (Bakal, 2007; Yin, 2013), and unsupervised clustering in more recent work, e.g. (Sailem, 2014; Cooper, 2015).
Vectors describing the fraction of cells adopting each shape are used to understand the differences between perturbations.
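As a small illustration of such a shape-fraction vector, assume each cell in a well has already been assigned an integer shape label. The function name `shape_fractions`, the toy labels, and the L1 distance used to compare wells are assumptions for this sketch.

```python
import numpy as np

def shape_fractions(cell_labels, n_shapes):
    """Fraction of cells in a well assigned to each shape cluster."""
    counts = np.bincount(cell_labels, minlength=n_shapes)
    return counts / counts.sum()

# Toy shape labels for the cells of two wells (3 possible shapes).
control = shape_fractions(np.array([0, 0, 1, 2, 0, 1]), n_shapes=3)
treated = shape_fractions(np.array([2, 2, 1, 2, 2, 2]), n_shapes=3)

# One simple way to compare perturbations: L1 distance between the vectors.
difference = np.abs(control - treated).sum()
```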
Prior to shape clustering, data transformations, notably PCA and/or binning, are performed on the single-cell data so that clustering is applied to a lower-dimensional feature space. To achieve this:
- Single-cell profiles from across the screen are sampled and pooled.
- The transformation is defined on the pooled sample.
- Clustering is performed on the pooled sample (model selection is necessary here for unsupervised clustering).
- The transformation matrix and clustering parameters are applied to the single-cell populations from each well.
A visual example of this from (Sailem, 2014):
Figure 2. Single-cell clustering. (a) Average silhouette value for different numbers of clusters using Gaussian mixture modelling (GMM) and hierarchical clustering. Higher averages represent better cluster quality, and the best clustering for this dataset was reached when cells were grouped into seven clusters using hierarchical clustering. (b) Silhouette values of single cells for the best model. (c) Silhouette values of single cells for the best model after correction using KNN. (d) Single-cell data for all TCs are projected in the first three PCs and coloured based on the single-cell hierarchical clustering results, where clusters are denoted by shapes 1–7. Next to each shape cluster is a representative cell shape from that cluster. (e) Qualitative interpretation of PC space.
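The pooled-sample procedure listed above can be sketched end to end as follows. This is a simplified illustration on simulated cells, not the published pipeline: plain k-means with a fixed number of clusters stands in for the GMM and hierarchical clustering with model selection described in the figure, and all names (`fit_pca`, `fit_kmeans`, `assign`, `simulate_well`) are hypothetical.

```python
import numpy as np

def fit_pca(pooled, n_components):
    """Return the pooled mean and a features x components projection matrix."""
    mean = pooled.mean(axis=0)
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    return mean, vt[:n_components].T

def assign(points, centres):
    """Label each point with the index of its nearest cluster centre."""
    dist = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    return dist.argmin(axis=1)

def fit_kmeans(points, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(dist)])
    centres = np.array(centres)
    for _ in range(n_iter):
        labels = assign(points, centres)
        for j in range(k):
            if np.any(labels == j):      # keep the old centre if a cluster empties
                centres[j] = points[labels == j].mean(axis=0)
    return centres

rng = np.random.default_rng(2)

def simulate_well(n_a, n_b, n_features=6):
    """Simulated well containing two well-separated cell shapes."""
    a = rng.normal(0.0, 1.0, size=(n_a, n_features))
    b = rng.normal(6.0, 1.0, size=(n_b, n_features))
    return np.vstack([a, b])

wells = {"control": simulate_well(90, 10), "knockdown": simulate_well(20, 80)}

# 1. Sample single-cell profiles from each well and pool them.
pooled = np.vstack([w[rng.choice(len(w), size=50, replace=False)]
                    for w in wells.values()])
# 2. Define the transformation on the pooled sample.
mean, projection = fit_pca(pooled, n_components=2)
# 3. Cluster the pooled sample in the reduced space.
centres = fit_kmeans((pooled - mean) @ projection, k=2)
# 4. Apply the same transformation and cluster centres to every well.
fractions = {}
for name, cells in wells.items():
    labels = assign((cells - mean) @ projection, centres)
    fractions[name] = np.bincount(labels, minlength=2) / len(labels)
```

Fitting on a pooled sample keeps the transformation and cluster definitions identical across wells, so the resulting fraction vectors are directly comparable.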
I have found that the quality of a workflow can be assessed using cost functions that seek to minimise the distance between feature vectors of the same condition and maximise the distance between feature vectors of different conditions, e.g. the Davies-Bouldin index (Cooper, 2015).
The workflows that scored best on such measures led to feature spaces that best separated positive from negative controls, and gave results that agreed best with visual inspection of images.
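For reference, a minimal sketch of the Davies-Bouldin index, for which lower values indicate tighter and better-separated groups. It assumes Euclidean distances and that `labels` marks the condition of each feature vector; the toy data and the function name `davies_bouldin` are illustrative.

```python
import numpy as np

def davies_bouldin(features, labels):
    """Davies-Bouldin index: mean over groups of the worst scatter/separation ratio."""
    groups = np.unique(labels)
    centroids = np.array([features[labels == g].mean(axis=0) for g in groups])
    # Scatter: mean distance of each group's members to its centroid.
    scatter = np.array([
        np.linalg.norm(features[labels == g] - c, axis=1).mean()
        for g, c in zip(groups, centroids)
    ])
    worst_ratios = []
    for i in range(len(groups)):
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(groups)) if j != i
        ]
        worst_ratios.append(max(ratios))
    return float(np.mean(worst_ratios))

labels = np.array([0, 0, 1, 1])
well_separated = davies_bouldin(
    np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]), labels)
overlapping = davies_bouldin(
    np.array([[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]]), labels)
```

Here the same within-group scatter gives a lower (better) index when the two conditions sit further apart.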
PCA performed on well-averaged feature vectors consistently gives the best-scoring results. However, a good transformation into shape vectors gives similar results that are more biologically interpretable and, importantly, can easily be used to characterise transitions between shapes (Cooper, 2015).
Optimising the feature space directly against such cost functions can work, though effective cross-validation is required in these instances.
Overall, data transformation and binning have a much greater effect on the cost function than feature selection.
Visualisation
We use a wide array of visualisation techniques, which we review in (Sailem, 2015). The 'workhorse' visualisations are scatter plots, heat maps and histograms.
Once hits are identified from analysis of such plots, and by referring back to the original images, we draw biological conclusions and identify lines of work to pursue.
Note: Handling Batch Effects
Screen repeats are often performed on different days and occasionally by different experimenters. I have found that this often results in large batch effects between screen repeats.
Plate-wise normalisation and multiple other methods designed to reduce batch effects have consistently reduced final cost function values (here the cost function was measured within single plates and summed).
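Plate-wise normalisation, in its simplest form, Z-scores each feature within each plate separately. The sketch below assumes a wells x features array plus a per-well plate identifier; the function name `zscore_platewise` and the toy values are hypothetical.

```python
import numpy as np

def zscore_platewise(features, plate_ids):
    """Z-score each feature separately within every plate."""
    normalised = np.empty_like(features, dtype=float)
    for plate in np.unique(plate_ids):
        mask = plate_ids == plate
        mean = features[mask].mean(axis=0)
        sd = features[mask].std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant features on a plate
        normalised[mask] = (features[mask] - mean) / sd
    return normalised

# Two plates of two wells each; plate 1 has a large offset in feature 0.
features = np.array([[1.0, 10.0],
                     [3.0, 14.0],
                     [101.0, 20.0],
                     [103.0, 28.0]])
plate_ids = np.array([0, 0, 1, 1])
normalised = zscore_platewise(features, plate_ids)
```

After normalisation the plate offset is gone, but so is any genuine biological difference between plates, which is one reason this approach can discard useful signal.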
Canonical correlation analysis (CCA) not only eliminated these batch effects, but also gave highly structured lower-dimensional projections that led to the identification of a meaningful phenotype, notably G1 arrest, which has since been biologically validated.
More complex optimisation procedures, based on generalising CCA to multiple plates and repeats, have also provided insights.
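How CCA was set up is not specified here, so the following is only a sketch under one plausible assumption: the two views are well-averaged feature vectors for the same wells from two screen repeats. CCA is computed with a standard QR plus SVD approach, and the data, the shared `signal`, and the function name `cca` are simulated or hypothetical.

```python
import numpy as np

def cca(x, y):
    """Canonical correlation analysis via QR decomposition and SVD.

    Assumes more rows (wells) than columns and full column rank in both views.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    qx, rx = np.linalg.qr(xc)
    qy, ry = np.linalg.qr(yc)
    u, corrs, vt = np.linalg.svd(qx.T @ qy)
    wx = np.linalg.solve(rx, u)      # projection weights for the first view
    wy = np.linalg.solve(ry, vt.T)   # projection weights for the second view
    return wx, wy, corrs

rng = np.random.default_rng(3)
n_wells = 60
signal = rng.normal(size=n_wells)    # shared biological phenotype across repeats

# Each repeat mixes the signal into its features differently and adds noise;
# repeat 2 also carries a constant day-to-day offset.
repeat1 = np.outer(signal, [1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=(n_wells, 3))
repeat2 = (np.outer(signal, [0.0, 1.0, -1.0])
           + 0.1 * rng.normal(size=(n_wells, 3)) + 5.0)

wx, wy, corrs = cca(repeat1, repeat2)
variate1 = (repeat1 - repeat1.mean(axis=0)) @ wx[:, 0]
variate2 = (repeat2 - repeat2.mean(axis=0)) @ wy[:, 0]
```

The first pair of canonical variates recovers the component shared between repeats while ignoring the repeat-specific offset, which is the property that makes CCA attractive for handling batch effects.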
References
Bakal, Chris, et al. "Quantitative morphological signatures define local signaling networks regulating cell morphology." Science 316.5832 (2007): 1753-1756.
Cooper, Sam, et al. "Apolar and polar transitions drive the conversion between amoeboid and mesenchymal shapes in melanoma cells." Molecular Biology of the Cell 26.22 (2015): 4163-4170.
Magnusson, Klas E. G., et al. "Global linking of cell tracks using the Viterbi algorithm." IEEE Transactions on Medical Imaging 34.4 (2015): 911-929.
Sailem, Heba, et al. "Cross-talk between Rho and Rac GTPases drives deterministic exploration of cellular shape space and morphological heterogeneity." Open Biology 4.1 (2014): 130132.
Sero, Julia E., et al. "Cell shape and the microenvironment regulate nuclear translocation of NF-κB in breast epithelial and tumor cells." Molecular Systems Biology 11.3 (2015): 790.
Yin, Zheng, et al. "A screen for morphological complexity identifies regulators of switch-like transitions between discrete cell shapes." Nature Cell Biology 15.7 (2013): 860-871.