---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# Overview
The `archs4` package provides utility functions to query and explore the
expression profiling data made available through the
[ARCHS4 project][archs4web], which is described in the following publication:
[Massive mining of publicly available RNA-seq data from human and mouse][archs4pub].

Because this package requires the user to download a number of data files that
are external to the package, the [installation instructions](#installation) are
*a bit* more involved than other R packages, and we leave them for
[the end of this document](#installation).
# Usage
After [successful installation](#installation) of this package, you can query
the series and samples included in the ARCHS4 repository, as well as materialize
the expression data into well-known Bioconductor assay containers for downstream
analysis.

To query GEO series and samples, you can use the `sample_info` function:
```{r, message=FALSE, warning=FALSE}
library(archs4)
a4 <- Archs4Repository()
ids <- c('GSE89189', 'GSE29943', "GSM1095128", "GSM1095129", "GSM1095130")
sample.info <- sample_info(a4, ids)
head(sample.info)
```
You can use the `as.DGEList` function to materialize an `edgeR::DGEList` from
an arbitrary number of GEO sample and series identifiers. The only restriction
is that the data from the series/samples must all be from the same species.

The most common use case will likely be to create a `DGEList` for a given study.
For instance, the GEO series identifier [`"GSE89189"`][blurtongeo] refers to the
expression data generated to support the
[Abud et al. iPSC-Derived Human Microglia-like Cells ...][blurtonpub] paper.
Materializing this study as a `DGEList` yields an object with 27,024 genes
across 37 samples in about 1.5 seconds:
```{r, eval = FALSE}
yg <- as.DGEList(a4, "GSE89189", feature_type = "gene")
```
The following command retrieves the transcript-level counts (178,135
transcripts) for this experiment, also in about 1.5 seconds:
```{r, eval = FALSE}
yt <- as.DGEList(a4, "GSE89189", feature_type = "transcript")
```
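Once a `DGEList` has been materialized, it can be handed to the usual edgeR
functions. The chunk below is a minimal sketch of that hand-off, not part of the
package's documented workflow: it assumes the `edgeR` package is installed, that
the ARCHS4 data files are already in place, and that `as.DGEList` returns a
standard `DGEList` holding raw counts. The timing you observe will depend on
your hardware and storage.

```{r, eval = FALSE}
library(archs4)
library(edgeR)

a4 <- Archs4Repository()

# Time the same gene-level retrieval shown above; elapsed time varies by machine.
timing <- system.time(
  yg <- as.DGEList(a4, "GSE89189", feature_type = "gene")
)
timing[["elapsed"]]

# A DGEList is indexed as features x samples, so dim() reports genes, then samples.
dim(yg)

# Assumption: the counts are raw, so we apply edgeR's default (TMM) scaling
# factors and then compute log2 counts-per-million for exploratory analysis.
yg <- calcNormFactors(yg)
lcpm <- cpm(yg, log = TRUE)
```

The resulting `lcpm` matrix has one row per gene and one column per sample, and
can be passed to exploratory tools such as `plotMDS`.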
# Installation
```{r child = "inst/rmdparts/installation.Rmd"}
```
[//]: # (References ===========================================================)
```{r child = "inst/rmdparts/references.Rmd"}
```