Workflow Qiu lab
In our group, we develop computational methods for the analysis and visualization of single-cell data generated by flow cytometry, CyTOF, and sequencing. We only recently started to look at image-based single-cell data. The general biological problem is to understand the cellular heterogeneity underlying single-cell data, and to correlate that heterogeneity with overall phenotypic features such as disease progression and treatment mechanism.
We do not have experience in dealing with images, and have been relying on our collaborators to perform image segmentation and feature extraction. We work with the cell × feature data matrices derived by CellProfiler and ImageStream software.
- Features from image-based data can be highly correlated. We typically perform agglomerative hierarchical clustering to cluster features into groups containing highly correlated features. After that, an average feature is created from each group, so that highly correlated/redundant features are collapsed.
- Features can be selected or ranked based on the training labels of the images. For example, a more informative feature should have relatively similar distributions across images derived from replicates, and more different distributions across images derived under different conditions. Such a univariate criterion can be used to select or rank features.
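The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the lab's actual code: the function names, the average-linkage choice, the `1 - correlation` dissimilarity, the `corr_threshold` value, and the between/within variance ratio used as the univariate score are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def collapse_correlated_features(X, corr_threshold=0.9):
    """Group highly correlated columns of X and average each group.

    X is a (cells x features) array with no constant columns.
    corr_threshold is a hypothetical cutoff chosen for illustration.
    """
    X = np.asarray(X, dtype=float)
    # z-score each feature so the group average is not dominated by scale
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = np.corrcoef(Z, rowvar=False)
    # Dissimilarity: 0 for perfectly correlated features, 2 for anti-correlated
    dist = np.clip(1.0 - corr, 0.0, 2.0)
    np.fill_diagonal(dist, 0.0)
    # Agglomerative (average-linkage) clustering on the feature dissimilarities
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=1.0 - corr_threshold, criterion="distance")
    groups = [np.flatnonzero(labels == k) for k in np.unique(labels)]
    # One averaged feature per group of correlated features
    collapsed = np.column_stack([Z[:, g].mean(axis=1) for g in groups])
    return collapsed, groups


def rank_features_by_condition_separation(image_summaries, conditions):
    """Score features by between-condition over within-condition variance.

    image_summaries is (images x features), e.g. per-image medians;
    conditions holds one label per image, with replicates sharing a label.
    """
    S = np.asarray(image_summaries, dtype=float)
    conditions = np.asarray(conditions)
    grand_mean = S.mean(axis=0)
    between = np.zeros(S.shape[1])
    within = np.zeros(S.shape[1])
    for c in np.unique(conditions):
        rows = S[conditions == c]
        between += len(rows) * (rows.mean(axis=0) - grand_mean) ** 2
        within += ((rows - rows.mean(axis=0)) ** 2).sum(axis=0)
    scores = between / (within + 1e-12)  # small constant avoids division by zero
    return np.argsort(scores)[::-1], scores
```

A high score means a feature varies little among replicate images and a lot between conditions, which matches the criterion described in the second bullet. Other univariate statistics could be substituted.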
We have developed the SPADE (Spanning-tree Progression Analysis of Density-normalized Events) algorithm for uncovering the cellular hierarchy underlying single-cell data generated by flow cytometry and CyTOF (Qiu et al. 2011). This algorithm can also be applied to image-based single-cell data. Below is the SPADE workflow, illustrated using a flow cytometry data set.
- Input to SPADE is single-cell data matrices for multiple biological samples. Each sample/matrix can be viewed as a point cloud living in a high-dimensional space. The point clouds corresponding to two different samples can be very similar or very different from each other.
Note: this dataset is not published, but the workflow has been published using an AML flow cytometry dataset (Qiu 2012).
We have applied the SPADE pipeline to single-cell data generated by CellProfiler. The data was published in (Ljosa 2013). Data preprocessing included normalization according to DMSO plates, and feature selection based on variances within and across images. 29 features were selected to build the SPADE tree. The per-well (or per-sample) profile is the cell distribution on the SPADE tree. Below are a few examples of cell distributions on the SPADE tree, where we observe that perturbations with the same mechanism lead to similar distributions.
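The per-well profile described above can be illustrated with a short sketch. It assumes that each cell has already been assigned to a SPADE tree node (integer ids `0 .. n_nodes-1`) and carries a well label. The function and argument names are hypothetical, and the SPADE tree construction itself is not shown.

```python
import numpy as np


def per_well_profiles(node_ids, well_ids, n_nodes):
    """Return, for each well, the fraction of its cells in each tree node.

    node_ids: integer tree-node assignment for every cell (assumed given).
    well_ids: well (sample) label for every cell, same length as node_ids.
    """
    node_ids = np.asarray(node_ids)
    well_ids = np.asarray(well_ids)
    wells = np.unique(well_ids)
    profiles = np.zeros((len(wells), n_nodes))
    for i, w in enumerate(wells):
        # Count the cells of this well that fall in each node
        counts = np.bincount(node_ids[well_ids == w], minlength=n_nodes)
        # Normalize so that wells with different cell numbers are comparable
        profiles[i] = counts / counts.sum()
    return wells, profiles
```

Each row of `profiles` sums to 1, so two wells can be compared as distributions over the same set of tree nodes.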
This dataset has a total of 103 compounds (perturbations) to be classified into 13 mechanisms. The heatmap below shows the 103 × 103 matrix of pairwise EMD (earth mover's distance) values, which correlates well with the mechanisms. Based on the EMD matrix, a simple nearest-neighbor classifier achieves 91% accuracy, comparable to that reported in (Ljosa 2013). We can also use SPADE to visualize the EMD matrix, showing which mechanisms are similar to each other and which are far apart.
We have a MATLAB implementation of SPADE with a GUI, available at http://pengqiu.gatech.edu/software/SPADE/index.html.
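As an illustration of nearest-neighbor classification from a precomputed distance matrix, here is a minimal sketch. It assumes a leave-one-out evaluation in which each compound is assigned the mechanism of its closest other compound. The exact evaluation protocol behind the reported accuracy is not specified here, and the function name is hypothetical.

```python
import numpy as np


def leave_one_out_1nn_accuracy(dist, labels):
    """Leave-one-out 1-nearest-neighbor accuracy from a distance matrix.

    dist: symmetric (n x n) matrix of pairwise distances, e.g. EMD values.
    labels: one class label (e.g. mechanism) per row of dist.
    """
    dist = np.asarray(dist, dtype=float).copy()
    labels = np.asarray(labels)
    # Exclude each item from being its own nearest neighbor
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)
    predicted = labels[nearest]
    return float((predicted == labels).mean()), predicted
```

For example, with four items at positions 0, 1, 10, and 11 on a line and labels `["a", "a", "b", "b"]`, every item's nearest neighbor shares its label.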
- Qiu P, Simonds EF, Bendall SC, Gibbs KD Jr, Bruggner RV, Linderman MD, Sachs K, Nolan GP, Plevritis SK. Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE. Nature Biotechnology. 2011 Oct 2;29(10):886-91.
- Qiu P. Inferring phenotypic properties from single-cell characteristics. PLoS One. 2012;7(5):e37038.
- Ljosa V, Caie PD, Ter Horst R, Sokolnicki KL, Jenkins EL, Daya S, et al. Comparison of methods for image-based profiling of cellular morphological responses to small-molecule treatment. J Biomol Screen. 2013;18:1321–1329.