Update HCA preview data gene counting matrix #23

Open: wants to merge 2 commits into base: master
5 changes: 3 additions & 2 deletions README.md
@@ -5,7 +5,7 @@ Easy access to a small collection of benchmark datasets for methods development.

<p align="center">
<a href="benchmarks.md">Why do we need data benchmarks?</a>&nbsp;&nbsp;&nbsp;
-<a href="contributing.md">Contribution guide</a>&nbsp;&nbsp;&nbsp;
+<a href="CONTRIBUTING.md">Contribution guide</a>&nbsp;&nbsp;&nbsp;
</p>

## Contents
@@ -25,7 +25,8 @@ Instructions for downloading and loading each dataset are in text files in the `
- [Tabula Muris](datasets/tabula_muris.md) - 20 different mouse organs, both full transcript (SmartSeq2) and UMI-based droplet counting (10x Genomics). [Code repo](https://github.com/czbiohub/tabula-muris) | [Vignette repo](https://github.com/czbiohub/tabula-muris-vignettes) | [Interactive website](http://tabula-muris.ds.czbiohub.org/) | [Download instructions](datasets/tabula_muris.md) #mouse #aorta #bladder #brain #diaphragm #fat #heart #kidney #large_intestine #muscle #liver #lung #mammary_gland #marrow #pancreas #skin #spleen #thymus #tongue #rnaseq #smartseq2 #10x #umi #droplet
- [Human cortex development](datasets/ucsc_human_cortex.md) - 4000 SmartSeq2 cells from different locations of the developing human fetus.
- [Conquer](datasets/conquer.md) - [38 datasets](http://imlspenticton.uzh.ch:3838/conquer/) summarized to a count level available as `R` `MultiAssayExperiment` objects.
-- [CellBench pilot data set](https://github.com/LuyiTian/CellBench_data/blob/master/cellbench.md) - mixture data set from 3 human lung adenocarcinoma cell lines (HCC827, H1975 and H2228) across different platforms.
+- [CellBench pilot data set](https://github.com/LuyiTian/CellBench_data/blob/master/cellbench.md) - mixture data set from 3 human lung adenocarcinoma cell lines (HCC827, H1975 and H2228) across different platforms. #benchmark #celseq2 #dropseq #10x #cellline
+- [HCA preview dataset gene count matrix](datasets/HCA_preview_scPipe.md) - Gene count matrices for the HCA preview data, generated from the FASTQ files with [scPipe](https://bioconductor.org/packages/release/bioc/html/scPipe.html). #hca #scPipe

### Imaging

20 changes: 20 additions & 0 deletions datasets/HCA_preview_scPipe.md
@@ -0,0 +1,20 @@
# HCA_Previewdata
Mixture data set from 3 human lung adenocarcinoma cell lines (HCC827, H1975 and H2228) across different platforms: celseq2, dropseq, and 10x.

The data were downloaded from the [HCA data portal](https://preview.data.humancellatlas.org/) and processed with scPipe in [this repo](https://github.com/LuyiTian/HCA_Previewdata).

## Metadata

Metadata is stored in the [HCA data portal](https://preview.data.humancellatlas.org/) and can be downloaded directly [here](https://preview.data.humancellatlas.org/datasets/melanoma/hca-metadata-melanoma.xlsx).

## Count files for R

`SingleCellExperiment` objects for the dataset are in the `rdata` folder: the [raw data](https://github.com/LuyiTian/HCA_Previewdata/blob/master/rdata/ischaemic_sensitivity_raw.RData?raw=true) and the [processed data](https://github.com/LuyiTian/HCA_Previewdata/blob/master/rdata/ischaemic_sensitivity_QC_norm.RData) after quality control and normalization.

## Exploratory data analysis

The Rmd document for the exploratory analysis is in the `script` folder.

## CSV and MTX files

The gene count matrix for each dataset is in `data/<dataset_name>/gene_count.csv.zip`. Quality control metrics generated by scPipe during preprocessing are in `data/<dataset_name>/stat`.
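
For readers working outside R, the zipped count matrix can be read with the Python standard library alone. The sketch below is only illustrative: the function name `read_gene_counts` is hypothetical, and the file layout (a header row of cell names after a leading gene-ID column, then one row of integer counts per gene) is an assumption that should be checked against the actual file.

```python
import csv
import io
import zipfile


def read_gene_counts(zip_path):
    """Read the first CSV in a zip archive into (cell_names, {gene_id: counts}).

    Assumed layout (not confirmed by the dataset description): the header row
    holds a gene-ID column label followed by cell names, and every later row
    holds a gene ID followed by one integer count per cell.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Skip macOS resource-fork entries that some zip tools add.
        csv_names = [
            name for name in archive.namelist()
            if name.endswith(".csv") and not name.startswith("__MACOSX/")
        ]
        if not csv_names:
            raise ValueError(f"no CSV file found in {zip_path}")
        with archive.open(csv_names[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            header = next(reader)
            cells = header[1:]
            counts = {}
            for row in reader:
                if not row:
                    continue  # tolerate blank trailing lines
                counts[row[0]] = [int(value) for value in row[1:]]
    return cells, counts
```

A call such as `cells, counts = read_gene_counts("data/<dataset_name>/gene_count.csv.zip")`, with `<dataset_name>` replaced by a real folder name, would then give the cell names and a per-gene list of counts. For large matrices a dedicated data-frame or sparse-matrix library is likely a better fit than plain lists.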