README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# Overview

The `archs4` package provides utility functions to query and explore the
expression profiling data made available through the
[ARCHS4 project][archs4web], which is described in the following publication:

[Massive mining of publicly available RNA-seq data from human and mouse][archs4pub]. 

Because this package requires the user to download a number of data files that
are external to the package, the [installation instructions](#installation) are
*a bit* more involved than other R packages, and we leave them for 
[the end of this document](#installation).

# Usage

After [successful installation](#installation) of this package, you can query
the series and samples included in the ARCHS4 repository, as well as materialize
the expresion data into well-known bioconductor assay containers for downstream
analysis.

To query GEO series and samples, you can use the `sample_info` function:

```{r, message=FALSE, warning=FALSE}
library(archs4)

a4 <- Archs4Repository()
ids <- c('GSE89189', 'GSE29943', "GSM1095128", "GSM1095129", "GSM1095130")
sample.info <- sample_info(a4, ids)
head(sample.info)
```

You can use the `as.DGEList` function to materialize an `edgeR::DGEList` from a
an arbitrary number of GEO sample and series identifiers. The only restriction
is that the data from the series/samples must all be from the same species.

The most often use-case will likely be to create a `DGEList` for a given study.
For instance, the GEO series identifier [`"GSE89189"`][blurtongeo] refers to the
expression data generated to support the
[Abud et al. iPSC-Derived Human Microglia-like Cells ...][blurtonpub] paper.

Creating a `DGEList` from this study will create an object with 27,024 genes
across 37 samples in about 1.5 seconds:

```{r, eval = FALSE}
yg <- as.DGEList(a4, "GSE89189", feature_type = "gene")
```

The following command retrieves the 178,135 transcript level counts for this
experiment in about 1.5 seconds, as well:

```{r, eval = FALSE}
yt <- as.DGEList(a4, "GSE89189", feature_type = "transcript")
```

# Installation

```{r child = "inst/rmdparts/installation.Rmd"}
```

[//]: # (References ===========================================================)
```{r child = "inst/rmdparts/references.Rmd"}
```