Workflow Carragher lab

Biological motivations

We are interested in identifying and classifying the mechanism of action of known and unknown compounds, and in target deconvolution by coupling high-content imaging with proteomics. We also screen known and unknown compounds against complex disease models to better predict in vivo efficacy.

Image analysis and feature extraction

We use a mixture of CellProfiler and MetaXpress (Molecular Devices) to segment and extract measurements from cells. The choice of software depends on the lab member, the number of images and the complexity of the assay.

Image quality control

To detect debris and out-of-focus images, we find outliers in the ImageQuality measurements produced by CellProfiler.

PowerLogLogSlope [1] (DAPI/Hoechst channel): remove images with a value above the 3rd quartile + 1.5 * IQR.
Hampel outlier test on all ImageQuality metrics. Remove images that are outliers in a large proportion of metrics.
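
The two rules above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the lab's actual code. The metric names, the 3-MAD Hampel threshold and the 0.5 cut-off for "a large proportion of metrics" are all assumptions made for the example.

```python
import statistics

def upper_tukey_fence(values, k=1.5):
    """Return Q3 + k * IQR for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)

def hampel_outliers(values, n_mads=3.0):
    """Flag values further than n_mads scaled MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scaled_mad = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scaled_mad == 0:
        return [False] * len(values)  # no spread, so nothing can be flagged
    return [abs(v - med) > n_mads * scaled_mad for v in values]

def images_to_remove(metrics, min_fraction=0.5):
    """Flag images that are Hampel outliers in at least min_fraction of metrics.

    metrics maps a metric name to a list with one value per image.
    min_fraction is an assumed threshold, not a value from the workflow.
    """
    n_images = len(next(iter(metrics.values())))
    counts = [0] * n_images
    for values in metrics.values():
        for i, is_outlier in enumerate(hampel_outliers(values)):
            counts[i] += is_outlier
    return [count / len(metrics) >= min_fraction for count in counts]

# Hypothetical measurements for six images and three metrics.
metrics = {
    "power_log_log_slope": [-2.1, -2.0, -2.2, -1.9, -2.05, -0.4],
    "focus_score": [0.80, 0.82, 0.79, 0.81, 0.80, 0.20],
    "percent_saturation": [0.01, 0.01, 0.30, 0.02, 0.01, 0.02],
}
slope = metrics["power_log_log_slope"]
fence = upper_tukey_fence(slope)
above_fence = [v > fence for v in slope]  # rule 1: slope above Q3 + 1.5 * IQR
flagged = images_to_remove(metrics)       # rule 2: outlier in many metrics
```

In this toy data only the last image is above the fence and flagged by the Hampel rule. The third image is an outlier in a single metric and is kept.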

Remove images with no cells or very few cells. This is useful for morphological profiling, as the results from images containing only 3 or so cells can vary wildly. Whether to do this depends on the purpose of the screen, because in simple assays cell count can itself be an important measurement.
Lab members using MetaXpress often run a PCA on the data and gate outliers in the first 3 principal components; they find this works well for obvious image artefacts.

Data cleaning

Identify and drop columns that contain only NA values, since they would otherwise cause every row to be discarded in the next step.
Remove rows containing any NA values, sub-setting the data by complete.cases().
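
The lab does this in R with complete.cases(). As an illustration only, the same two steps can be sketched in Python with the standard library. The sketch assumes the data are a list of row dictionaries with missing values stored as None or NaN, and that all-NA columns are dropped before the row filter. The column names are hypothetical.

```python
import math

def is_na(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def drop_all_na_columns(rows):
    """Remove columns in which every row is missing."""
    columns = list(rows[0].keys())
    keep = [c for c in columns if not all(is_na(r[c]) for r in rows)]
    return [{c: r[c] for c in keep} for r in rows]

def drop_incomplete_rows(rows):
    """Keep only rows with no missing values (like complete.cases() in R)."""
    return [r for r in rows if not any(is_na(v) for v in r.values())]

# Hypothetical table: "empty_feature" is all NA, well A02 has one missing value.
rows = [
    {"well": "A01", "area": 10.0, "empty_feature": None},
    {"well": "A02", "area": float("nan"), "empty_feature": None},
    {"well": "A03", "area": 12.5, "empty_feature": None},
]
cleaned = drop_incomplete_rows(drop_all_na_columns(rows))
```

The order matters: filtering rows first would remove every row here, because each one has a missing value in the all-NA column.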

Normalize features

This varies depending on the assay. A typical approach is to take the median value of the negative control per plate and per feature, and subtract this value from the treated wells. Another approach is to divide by the negative control median; this also centres the values on the negative control, although it skews the data toward positive values.
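
A minimal sketch of the subtraction approach, in Python with the standard library. The field names ("plate", "treatment"), the "DMSO" control label and the feature name are assumptions for the example. For simplicity the sketch subtracts the median from every well on the plate, controls included.

```python
import statistics
from collections import defaultdict

def subtract_negative_control_median(wells, features, control_label="DMSO"):
    """Subtract the per-plate, per-feature negative-control median from each well."""
    control_values = defaultdict(list)
    for well in wells:
        if well["treatment"] == control_label:
            for feature in features:
                control_values[(well["plate"], feature)].append(well[feature])
    medians = {key: statistics.median(vals) for key, vals in control_values.items()}

    normalised = []
    for well in wells:
        new_well = dict(well)  # copy so the input is left untouched
        for feature in features:
            # Raises KeyError if a plate has no negative-control wells.
            new_well[feature] = well[feature] - medians[(well["plate"], feature)]
        normalised.append(new_well)
    return normalised

# Hypothetical wells from two plates with different baselines.
wells = [
    {"plate": "P1", "treatment": "DMSO", "cell_area": 100.0},
    {"plate": "P1", "treatment": "DMSO", "cell_area": 110.0},
    {"plate": "P1", "treatment": "DMSO", "cell_area": 120.0},
    {"plate": "P1", "treatment": "drug_x", "cell_area": 150.0},
    {"plate": "P2", "treatment": "DMSO", "cell_area": 200.0},
    {"plate": "P2", "treatment": "DMSO", "cell_area": 220.0},
    {"plate": "P2", "treatment": "drug_x", "cell_area": 250.0},
]
normalised = subtract_negative_control_median(wells, ["cell_area"])
treated = [w["cell_area"] for w in normalised if w["treatment"] == "drug_x"]
```

Although the raw values for drug_x differ by 100 between the two plates, both become 40.0 once each plate's own control median is subtracted.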

Transform features

We use a robust z-score to scale and centre the values per feature.
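
The text does not define the robust z-score. One common form, assumed here, centres on the median and scales by the median absolute deviation (MAD) multiplied by 1.4826. A sketch in Python:

```python
import statistics

def robust_z_scores(values):
    """Return (x - median) / (1.4826 * MAD) for each value of one feature."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad  # consistent with the standard deviation for normal data
    if scale == 0:
        raise ValueError("MAD is zero; feature has no spread to scale by")
    return [(v - med) / scale for v in values]

# Hypothetical feature with one extreme value.
feature = [1.0, 2.0, 3.0, 4.0, 100.0]
scores = robust_z_scores(feature)
```

Because both the centre and the scale are medians, the single extreme value of 100 barely changes the scores of the other four values. With a mean and standard deviation it would dominate both.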

Correct for systematic effects

We visualise plate maps as a heatmap by cell count and first principal component to detect systematic effects. Plates are removed if systematic effects are observed. Unfortunately we use plate layouts that prevent the use of a median polish.

Here's an example of a systematic effect when our liquid handling system restarted halfway through a plate and aliquoted half the wells with 2x the staining solution (plate 3322).

Select features / reduce dimensionality

Features are selected by removing one feature from each pair of highly correlated features - caret::findCorrelation()
Features are removed that have little to no variance - caret::nearZeroVar()
Features are removed if they have poor correlation between negative and positive control replicates
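
The first two filters can be illustrated with a simplified Python sketch. It does not reproduce the caret algorithms: it uses a plain variance threshold in place of caret::nearZeroVar() and a greedy pairwise filter in place of caret::findCorrelation(). The thresholds and feature names are assumptions.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_features(data, min_variance=1e-8, max_abs_corr=0.9):
    """Drop near-constant features, then greedily drop correlated ones.

    data maps a feature name to a list of values (one per well).
    A feature is kept only if its absolute correlation with every
    already-kept feature is at most max_abs_corr.
    """
    varying = [name for name, vals in data.items()
               if statistics.pvariance(vals) > min_variance]
    kept = []
    for name in varying:
        if all(abs(pearson(data[name], data[k])) <= max_abs_corr for k in kept):
            kept.append(name)
    return kept

# Hypothetical features measured in five wells.
data = {
    "area": [1.0, 2.0, 3.0, 4.0, 5.0],
    "perimeter": [2.1, 4.0, 6.2, 7.9, 10.1],  # nearly proportional to area
    "intensity": [5.0, 1.0, 4.0, 2.0, 3.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],    # no variance
}
selected = select_features(data)
```

The greedy filter keeps whichever feature of a correlated pair comes first, so the result depends on feature order.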

If there is an informative positive control, we use a random forest to classify positive and negative controls, and then select the features whose removal most decreases classification accuracy.
PCA, keeping the number of principal components that captures a chosen proportion x of the variance.
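
For the PCA option, choosing the number of components is a cumulative sum over the variance explained by each component. The sketch below assumes the per-component variances are already available from a PCA, sorted in decreasing order (for example, the squared sdev values from R's prcomp()). The numbers are made up.

```python
def n_components_for_variance(component_variances, proportion):
    """Smallest number of leading components whose variance reaches `proportion`."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    total = sum(component_variances)
    cumulative = 0.0
    for k, variance in enumerate(component_variances, start=1):
        cumulative += variance
        if cumulative / total >= proportion:
            return k
    return len(component_variances)  # only reached through rounding error

# Hypothetical variances of five principal components.
variances = [4.0, 2.0, 1.0, 0.5, 0.5]
k_80 = n_components_for_variance(variances, 0.80)
k_95 = n_components_for_variance(variances, 0.95)
```

Here the first three components explain 87.5% of the variance, so three are enough for x = 0.80, while x = 0.95 needs all five.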

Create per-well profiles

We aggregate each feature by the median for that well.
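
A minimal sketch of this aggregation in Python, assuming single-cell measurements stored as dictionaries with hypothetical "plate" and "well" keys:

```python
import statistics
from collections import defaultdict

def per_well_medians(cells, features):
    """Collapse single-cell rows to one median profile per (plate, well)."""
    grouped = defaultdict(list)
    for cell in cells:
        grouped[(cell["plate"], cell["well"])].append(cell)
    return {
        key: {f: statistics.median(c[f] for c in members) for f in features}
        for key, members in grouped.items()
    }

# Hypothetical single-cell data: three cells in A01, two in B02.
cells = [
    {"plate": "P1", "well": "A01", "area": 10.0},
    {"plate": "P1", "well": "A01", "area": 12.0},
    {"plate": "P1", "well": "A01", "area": 50.0},
    {"plate": "P1", "well": "B02", "area": 20.0},
    {"plate": "P1", "well": "B02", "area": 30.0},
]
profiles = per_well_medians(cells, ["area"])
```

The unusually large cell in A01 (area 50) does not pull the well profile upward, which is the practical reason for using a median here.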

We often screen across multiple concentrations, especially with smaller compound libraries or in secondary screens when following up hits. For this we have tried concatenating the concentration data to create a compound vector of length f*t, where f is the number of features and t is the number of titrations.
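
The concatenation can be sketched as below. The ordering (all features at the lowest concentration first, then the next concentration, and so on) is an assumption, as are the feature names and doses. Any fixed order works as long as it is the same for every compound.

```python
def concatenate_titrations(profiles_by_concentration, features):
    """Build one vector of length f * t from per-concentration profiles.

    profiles_by_concentration maps a concentration to a dict of
    feature name -> value. Concentrations are visited in ascending order.
    """
    vector = []
    for concentration in sorted(profiles_by_concentration):
        profile = profiles_by_concentration[concentration]
        vector.extend(profile[f] for f in features)
    return vector

# Hypothetical compound with f = 2 features at t = 3 concentrations (in uM).
features = ["area", "intensity"]
compound = {
    1.0: {"area": 0.5, "intensity": -0.2},
    0.1: {"area": 0.1, "intensity": 0.0},
    10.0: {"area": 2.0, "intensity": -1.5},
}
compound_vector = concatenate_titrations(compound, features)
```

This only makes sense when every compound was screened at the same set of concentrations, otherwise positions in the vectors are not comparable.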

Measure similarity between profiles

We typically use Manhattan distance (L1 norm) or Spearman correlation between profiles.
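
Both measures are short enough to write out. The sketch below, in Python with the standard library, computes Spearman correlation as the Pearson correlation of average ranks, which is the usual way to handle ties. The profile values are made up.

```python
import math

def manhattan(a, b):
    """Sum of absolute differences between two profiles (L1 norm)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb)

# Hypothetical profiles.
distance = manhattan([1.0, 2.0, 3.0], [2.0, 0.0, 3.0])
rho_same_order = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 400.0])
rho_reversed = spearman([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
```

The second profile in rho_same_order has one very large value, but because only the ordering matters the correlation is still 1 (up to floating-point rounding).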

Downstream analysis / visualization

We normally use principal component analysis and plot the first two principal components, coloured by cell line or compound mechanism of action to get a visualisation of our data.

We also use hierarchical clustering of compounds to determine how compounds are similar to one another, or to predict the mechanism of action of an unknown compound compared to a known reference set.