czi-hca-comp-tools · LuyiTian · Apr 27, 2018 · May 1, 2018
diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@ Easy access to a small collection of benchmark datasets for methods development.
 
 <p align="center">
 	<a href="benchmarks.md">Why do we need data benchmarks?</a>&nbsp;&nbsp;&nbsp;
-	<a href="contributing.md">Contribution guide</a>&nbsp;&nbsp;&nbsp;
+	<a href="CONTRIBUTING.md">Contribution guide</a>&nbsp;&nbsp;&nbsp;
 </p>
 
 ## Contents
@@ -25,7 +25,8 @@ Instructions for downloading and loading each dataset are in text files in the `
 - [Tabula Muris](datasets/tabula_muris.md) - 20 different mouse organs, both full transcript (SmartSeq2) and UMI-based droplet counting (10x Genomics). [Code repo](https://github.com/czbiohub/tabula-muris) | [Vignette repo](https://github.com/czbiohub/tabula-muris-vignettes) | [Interactive website](http://tabula-muris.ds.czbiohub.org/) | [Download instructions](datasets/tabula_muris.md) #mouse #aorta #bladder #brain #diaphragm #fat #heart #kidney #large_intestine #muscle #liver #lung #mammary_gland #marrow #pancreas #skin #spleen #thymus #tongue #rnaseq #smartseq2 #10x #umi #droplet
 - [Human cortex development](datasets/ucsc_human_cortex.md) - 4000 SmartSeq2 cells from different locations of the developing human fetus.
 - [Conquer](datasets/conquer.md) - [38 datasets](http://imlspenticton.uzh.ch:3838/conquer/) summarized to a count level available as `R` `MultiAssayExperiment` objects.
-- [CellBench pilot data set](https://github.com/LuyiTian/CellBench_data/blob/master/cellbench.md) - mixture data set from 3 human lung adenocarcinoma cell lines (HCC827, H1975 and H2228) across different platforms.
+- [CellBench pilot data set](https://github.com/LuyiTian/CellBench_data/blob/master/cellbench.md) - mixture data set from 3 human lung adenocarcinoma cell lines (HCC827, H1975 and H2228) across different platforms. #benchmark #celseq2 #dropseq #10x #cellline
+- [HCA preview dataset gene count matrix](datasets/HCA_preview_scPipe.md) - Gene count matrices for the HCA preview data, generated from FASTQ files by [scPipe](https://bioconductor.org/packages/release/bioc/html/scPipe.html). #hca #scPipe
 
 ### Imaging
 

diff --git a/datasets/HCA_preview_scPipe.md b/datasets/HCA_preview_scPipe.md
@@ -0,0 +1,20 @@
+# HCA_Previewdata
+Gene count matrices for the Human Cell Atlas (HCA) preview data.
+
+The data were downloaded from the [HCA data portal](https://preview.data.humancellatlas.org/) and processed in [this repo](https://github.com/LuyiTian/HCA_Previewdata) using scPipe.
+
+## Metadata
+
+Metadata is stored in the [HCA data portal](https://preview.data.humancellatlas.org/) and can be downloaded directly [here](https://preview.data.humancellatlas.org/datasets/melanoma/hca-metadata-melanoma.xlsx).
+
+## Count files for R
+
+`SingleCellExperiment` objects for the dataset, both the [raw data](https://github.com/LuyiTian/HCA_Previewdata/blob/master/rdata/ischaemic_sensitivity_raw.RData?raw=true) and the [processed data](https://github.com/LuyiTian/HCA_Previewdata/blob/master/rdata/ischaemic_sensitivity_QC_norm.RData) after quality control and normalization, are in the `rdata` folder.
+
+## Exploratory data analysis
+
+The R Markdown (Rmd) document can be found in the `script` folder.
+
+## CSV and MTX files
+
+The gene count matrix is in `data/<dataset_name>/gene_count.csv.zip`. Quality control metrics generated by scPipe during data preprocessing are in `data/<dataset_name>/stat`.
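
For readers working outside R, the zipped CSV described in the new file could be read with a sketch like the one below. It uses only the Python standard library. The function name `read_count_matrix` is hypothetical, and the layout it expects (a header row, gene IDs in the first column, one column per cell) is an assumption; the PR does not document the file format, so check the actual file before relying on it.

```python
import csv
import io
import zipfile


def read_count_matrix(zip_path):
    """Read a zipped CSV into (cell_names, {gene_id: [counts]}).

    Hypothetical helper. Assumes a header row, gene IDs in the first
    column, and one column of counts per cell.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive holds one CSV; skip macOS metadata entries.
        csv_names = [
            name for name in archive.namelist()
            if name.endswith(".csv") and not name.startswith("__MACOSX")
        ]
        if not csv_names:
            raise ValueError("no CSV file found in " + str(zip_path))
        with archive.open(csv_names[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            header = next(reader)
            cells = header[1:]  # first header field labels the gene column
            counts = {
                row[0]: [int(float(value)) for value in row[1:]]
                for row in reader
                if row  # ignore blank lines
            }
    return cells, counts
```

Calling `read_count_matrix("data/<dataset_name>/gene_count.csv.zip")` with `<dataset_name>` filled in would return the cell names and a per-gene list of counts. For a large matrix a dataframe library would likely be more practical than plain lists.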