pqiu edited this page May 10, 2016 · 59 revisions

Biological motivations

Our group develops computational methods for the analysis and visualization of single-cell data generated by flow cytometry, CyTOF, and sequencing. We only recently started to look at image-based single-cell data. The general biological problem is to understand the cellular heterogeneity underlying single-cell data, and to correlate that heterogeneity with overall phenotypic features such as disease progression and treatment response.

Image analysis and feature extraction

We do not have experience working with images directly. We rely on our collaborators to perform image segmentation and feature extraction. We have worked with cell-by-feature data matrices produced by CellProfiler and ImageStream software.

Select features / reduce dimensionality

  • Features from image-based data can be highly correlated. We typically perform agglomerative hierarchical clustering to group highly correlated features. An average feature is then created from each group, so that highly correlated (redundant) features are collapsed into one.
  • Features can be selected or ranked based on training labels of the images. For example, the distribution of an informative feature should be relatively similar across images derived under the same perturbation condition, and more different across images derived under different conditions. Such a univariate criterion can be used to select or rank features.
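The two steps above can be sketched roughly as follows. This is a minimal illustration and not our production code: the function names, the `corr_threshold` parameter, the use of `1 - correlation` as the clustering distance, and the between/within variance ratio used as the ranking score are all assumptions made for the example. It also assumes that no feature is constant, and it summarizes each image by a single value per feature (for example a median), which is a simplification of comparing full distributions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def collapse_correlated_features(X, corr_threshold=0.9):
    """Group highly correlated columns of X (cells x features), average each group.

    Returns (X_collapsed, labels); labels[j] is the group id of feature j.
    """
    # Standardize so that averaging features on different scales is meaningful.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = np.corrcoef(Z, rowvar=False)
    # Assumed distance: 1 - correlation, so only positively correlated
    # features end up in the same group.
    dist = np.clip(1.0 - corr, 0.0, None)
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=1.0 - corr_threshold, criterion="distance")
    groups = np.unique(labels)
    X_collapsed = np.column_stack(
        [Z[:, labels == g].mean(axis=1) for g in groups]
    )
    return X_collapsed, labels


def rank_features_by_condition(image_summaries, conditions):
    """Rank features by between-condition vs. within-condition variation.

    image_summaries: images x features array (one summary value per image).
    conditions: one perturbation label per image.
    Returns (order, scores); order lists feature indices, best first.
    """
    image_summaries = np.asarray(image_summaries, dtype=float)
    conditions = np.asarray(conditions)
    grand_mean = image_summaries.mean(axis=0)
    between = np.zeros(image_summaries.shape[1])
    within = np.zeros(image_summaries.shape[1])
    for c in np.unique(conditions):
        block = image_summaries[conditions == c]
        block_mean = block.mean(axis=0)
        between += len(block) * (block_mean - grand_mean) ** 2
        within += ((block - block_mean) ** 2).sum(axis=0)
    scores = between / (within + 1e-12)  # small constant avoids division by zero
    order = np.argsort(scores)[::-1]
    return order, scores
```

Using a signed correlation distance avoids averaging anti-correlated features, which would otherwise cancel each other out; whether that is the right choice depends on the data.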

Analyze and summarize cellular heterogeneity

We developed the SPADE algorithm to uncover the underlying cellular hierarchy in single-cell data generated by flow cytometry and CyTOF. The algorithm can also be applied to image-based single-cell data. SPADE stands for Spanning-tree Progression Analysis of Density-normalized Events. Below is the SPADE workflow, illustrated using a flow cytometry data set.

  • Input to SPADE is single-cell data matrices for multiple biological samples. Each sample/matrix can be viewed as a point cloud living in a high-dimensional space. The point clouds corresponding to two different samples can be very similar or very different from each other.
  • SPADE first performs density-dependent downsampling and concatenates the samples. This process creates a "union" sample that corresponds to the union of all individual point clouds in the dataset.
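As a rough illustration of the downsampling and pooling step, the sketch below estimates a local density for each cell by counting neighbors within a fixed radius, keeps every cell in sparse regions, and keeps cells in dense regions with probability inversely proportional to their density. It is a simplified stand-in rather than the SPADE implementation: the function names and the `radius`, `target_density`, and `outlier_density` parameters are assumptions for this example, and the published method has its own rules for choosing these thresholds.

```python
import numpy as np
from scipy.spatial import cKDTree


def local_density(X, radius):
    """Count the points within `radius` of each row of X (the point itself included)."""
    tree = cKDTree(X)
    neighbors = tree.query_ball_point(X, r=radius)
    return np.array([len(n) for n in neighbors])


def density_dependent_downsample(X, radius, target_density,
                                 outlier_density=0, rng=None):
    """Thin out dense regions of one sample (cells x features).

    Cells with density <= outlier_density are dropped, cells with density
    <= target_density are always kept, and denser cells are kept with
    probability target_density / density.
    """
    rng = np.random.default_rng(rng)
    density = local_density(X, radius)
    keep_prob = np.where(density <= target_density,
                         1.0, target_density / density)
    keep_prob[density <= outlier_density] = 0.0
    keep = rng.random(len(X)) < keep_prob
    return X[keep]


def pooled_downsample(samples, radius, target_density, rng=None):
    """Downsample each sample separately, then concatenate into a 'union' sample."""
    rng = np.random.default_rng(rng)
    return np.vstack([
        density_dependent_downsample(S, radius, target_density, rng=rng)
        for S in samples
    ])
```

The effect is that abundant and rare cell populations end up more equally represented in the pooled point cloud, so rare populations are less likely to be lost in later steps.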

Create per-well profiles

Aggregate single-cell data from each well to create a per-well morphological profile. This is typically done by computing the median across all cells in the well, per feature. Other approaches include methods to first identify sub-populations, then construct a profile by counting the number of cells in each sub-population.
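A minimal sketch of both aggregation approaches with pandas is shown here. The column names `well` and `subpopulation` and the function names are hypothetical; the sub-population labels are assumed to have been assigned to each cell beforehand by whatever clustering method is in use.

```python
import pandas as pd


def median_profiles(cells, feature_cols, well_col="well"):
    """Per-well profile: the median of each feature across all cells in the well."""
    return cells.groupby(well_col)[feature_cols].median()


def subpopulation_profiles(cells, well_col="well", subpop_col="subpopulation"):
    """Per-well profile: the number of cells in each sub-population.

    Assumes each cell already carries a sub-population label in `subpop_col`.
    """
    return pd.crosstab(cells[well_col], cells[subpop_col])
```

Both functions return one row per well, so their outputs can be fed to the same similarity and downstream analysis steps.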

(How we do it)

Measure similarity between profiles

An appropriate similarity metric is crucial to the downstream analysis. Pearson correlation and Euclidean distance are the most common metrics used.
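To make the difference between the two metrics concrete, the following sketch computes both for a matrix with one profile per row. The function names are chosen for this example only.

```python
import numpy as np


def pearson_similarity(profiles):
    """Pairwise Pearson correlation between rows (profiles x features)."""
    return np.corrcoef(profiles)


def euclidean_distance(profiles):
    """Pairwise Euclidean distance between rows (profiles x features)."""
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

The two metrics can disagree. For the profiles `[1, 2, 3]` and `[2, 4, 6]`, the Pearson correlation is 1 because the profiles have the same shape, while the Euclidean distance is clearly nonzero because their magnitudes differ. Which behavior is preferable depends on whether the magnitude of a response should count as a difference.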

(How we do it)

Downstream analysis / visualization

This covers any analysis or visualization performed after the profiles have been created, for example clustering, classification, or visualization using 2D embeddings.
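As one possible illustration, the sketch below clusters profiles with average-linkage hierarchical clustering and projects them onto their first two principal components for plotting. These are generic choices made for the example, not a description of a specific pipeline, and the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_profiles(profiles, n_clusters):
    """Assign each profile (row) to one of at most `n_clusters` clusters."""
    tree = linkage(profiles, method="average", metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def pca_embedding(profiles, n_components=2):
    """Project profiles onto their top principal components via SVD."""
    centered = profiles - profiles.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * S[:n_components]
```

The embedding can be passed to any plotting library as x/y coordinates, with points colored by cluster label or by known perturbation condition.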


References

  • Qiu P, Simonds EF, Bendall SC, Gibbs KD Jr, Bruggner RV, Linderman MD, Sachs K, Nolan GP, Plevritis SK. Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE. Nat Biotechnol. 2011;29(10):886–891. doi: 10.1038/nbt.1991.
  • Qiu P. Inferring phenotypic properties from single-cell characteristics. PLoS One. 2012;7(5):e37038. doi: 10.1371/journal.pone.0037038.
  • Ljosa V, Caie PD, Ter Horst R, Sokolnicki KL, Jenkins EL, Daya S, et al. Comparison of methods for image-based profiling of cellular morphological responses to small-molecule treatment. J Biomol Screen. 2013;18:1321–1329.