-
Notifications
You must be signed in to change notification settings - Fork 16
/
index.Rmd
83 lines (54 loc) · 4.91 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
---
output: html_document
bibliography: "bibtexlib.bib"
---
```{r setup, include=FALSE}
source("style.R")
```
# Analysis of Microbiome Community Data in R
Welcome! This primer provides a concise introduction to conducting applied analyses of microbiome data in R.
While this primer does not require extensive knowledge of programming in R, the user is expected to install R and all packages required for this primer.
**Please install the required [software](00--required_software.html) and download the [example data](00--required_data.html) before coming to the workshop.**
## What is covered
This primer provides a concise introduction to conducting the statistical analyses and visualize microbiome data in R based on metabarcoding and high throughput sequencing (HTS).
This primer does not cover "shotgun" metagenomic analysis, which is very different in nature.
The reader is expected to have a very basic understanding of ecological diversity theory and some experience with R.
The techniques presented here assume the raw sequences have been converted to `r gloss$add('exact sequence variants (ESVs)')` or `r gloss$add('operational Taxonomic Units (OTUs)')` and `r gloss$add('Taxonomic classifications', shown = 'classified')` (i.e., assigned a taxonomy) using tools such as QIIME, mothur, or dada2 [@schloss2009introducing; @caporaso2010qiime; @callahan2016dada2].
## Why use R?
R is an open source (free) statistical programming and graphing language that includes
tools for analysis of statistical, ecological diversity and community data, among many other things.
R provides a cohesive environment to analyze data using modular "toolboxes" called `r gloss$add('R package', shown = 'R packages')`.
R runs on all major operating systems including Microsoft Windows, Linux (e.g., Ubuntu), and Apple's OS X.
The general type of analyses done in this workshop could be done in python, Perl, or using command line tools.
We like R for the following reasons:
* We use it (i.e., we are biased).
* R packages are easy to install and not too hard to make.
* The R community is very active and growing. Packages are updated frequently.
* Repositories such as `r gloss$add('the Comprehensive R Archive Network (CRAN)')` and Bioconductor provide some quality control of packages and make them easy to install.
* `r gloss$add('RStudio')` is a great, free graphics user interface.
* R Markdown is well supported, allowing R code to be embedded in documents and output to diverse formats. This website is the output of a set of R Markdown documents. Thus, R markdown can be used as an electronic notebook facilitating for reproducible research [link](http://grunwaldlab.github.io/Reproducible-science-in-R/).
* R has strong graphing and statistical capabilities.
* You can produce publication ready graphics
* R is designed to be an interactive language, providing a natural fit for statistical analyses rather than writing programs.
## The kind of data used in this workshop
This workshop will not start with the raw reads, since the first steps in a metabarcoding workflow are typically done using command line tools such as QIIME or mothur (dada2 is an exception) in the cloud.
Data that can be analysed using techniques presented here is typically the result of the following steps [@comeau2017microbiome]:
1. Sample environments/soil/tissue/water and extract DNA.
1. Perform PCR using standard primers.
1. Sequence using a high-throughput sequencing platform such as the Illumina MiSeq.
1. Call OTUs/ESVs and assign a taxonomic classification by comparing them to a reference database, such as Greengenes.
1. Construct an abundance matrix of read counts for each OTU in each sample.
Here we focus on the statistical analysis and visualizations following OTU calling that include:
- Reading files into R
- Manipulating tabular and taxonomic data
- Heat trees [@foster2017metacoder], stacked bar charts and related visualizations
- Alpha and beta diversity
- Ordination methods
## Help us improve this resource
We hope you enjoy this primer.
Please provide us feedback on any errors you might find or suggestions for improvement.
## Citing this primer
Please cite this primer if you find it useful for your research as: ZSL Foster and NJ Grünwald. 2018. Analysis of Microbiome Community Data in R. DOI: XXX.
*Niklaus J. Grünwald* <a href="https://orcid.org/0000-0003-1656-7602" target="orcid.widget" rel="noopener noreferrer" style="vertical-align:top;"><img src="https://orcid.org/sites/default/files/images/orcid_16x16.png" style="width:1em;margin-right:.5em;" alt="ORCID iD icon">orcid.org/0000-0003-1656-7602</a> and *Zach S. L. Foster* <a href="https://orcid.org/0000-0002-5075-0948" target="orcid.widget" rel="noopener noreferrer" style="vertical-align:top;"><img src="https://orcid.org/sites/default/files/images/orcid_16x16.png" style="width:1em;margin-right:.5em;" alt="ORCID iD icon">orcid.org/0000-0002-5075-0948</a>
© 2018, Corvallis, Oregon, USA
## References