forked from EduardoArle/big_data_biogeography
-
Notifications
You must be signed in to change notification settings - Fork 0
/
wed_diversification_rates_geosse.Rmd
135 lines (102 loc) · 3.98 KB
/
wed_diversification_rates_geosse.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
---
output: html_document
---
```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.width=12, fig.height=8, eval = FALSE,
echo=TRUE, warning=FALSE, message=FALSE,
tidy = TRUE, collapse = TRUE,
results = 'hold')
```
# Joint Bayesian estimation of speciation, dispersal and extinction rates: the GeoSSE model
The GeoSSE model jointly infers area-specific speciation rates, dispersal rates and local extinction rates [(Goldberg et al. 2011)](https://academic.oup.com/sysbio/article/60/4/451/1611105). The `mcmc-SSE` program implements a Bayesian algorithm to infer these rates [(Silvestro et al. 2014)](https://onlinelibrary.wiley.com/doi/full/10.1111/evo.12236).
The analysis requires a tree file, a table with the geographic ranges (species with no geographic data or not present in the tree will be automatically removed).
# Load dependent libraries:
```{r}
library("diversitree")
library("picante")
library("phytools")
```
# Load mcmc-SSE library
You will find the mcmc-SSE-lib file in the "example_file" dropbox folder.
```{r}
source("my_path/mcmc-SSE-lib.R")
```
# Prep data
Since the GoeSSE model only allows for 2 areas the input data might need to be modified by merging some of the areas.
Define which areas should be merged
```{r}
data_SPGC <- read.csv("example_data/bombacoideae_biome_classification_details.csv", row.names = 1)
columns_areaA = c(1, 3)
columns_areaB = c(2, 4, 5)
geosse_areas <- comb_areas(data_SPGC, columns_areaA, columns_areaB)
write.table(geosse_areas, file="example_data/traitGeoSSE_owndataconverted.txt", sep="\t", row.names=F, quote=F)
```
# Setup an analysis
Set working directory:
```{r}
setwd("my_path/example_files/")
```
Define tree object (can contain multiple trees, i.e. class "multiPhylo")
```{r}
tree <- read.tree("bombacoideae_phylogeny.tre")
```
Define geographic ranges. Geographic ranges are defined as 0 (present in area A), 1 (present in area B), 2 (present in area AB)
```{r}
range.data <- read.table(file="traitGeoSSE_owndataconverted.txt", h=T, row.names=1)
```
State-specific taxon sampling (A, B, AB)
```{r}
sampling_fractions = c(0.7,0.8,1)
```
Define output file name
```{r}
mcmc.log = "bombacoideae_geosse.log"
```
Run the analysis (you can increase the number of iterations to run a longer analysis and improve convergence)
```{r}
run_mcmc_SSE(tree,
range.data,
model = "geosse",
outfile = mcmc.log,
iterations = 10000,
rho = sampling_fractions)
```
# Process and plot output
Read output file
```{r}
post = read.table(mcmc.log, header=T)
```
Plot trace of sampled likelihoods
```{r}
plot(post$likelihood, type="l")
```
Plot speciation, dispersal, and extinction rates per area:
```{r}
par(mfrow=c(1,3))
boxplot(post[,c("sA","sB","sAB")], col = "blue", main = "speciation rates")
boxplot(post[,c("dA","dB" )], col = "gray", main = "dispersal rates")
boxplot(post[,c("xA","xB" )], col = "red",main = "extinction rates")
```
Are dispersals rates significantly different?
```{r}
delta_dispersal = post$dA - post$dB
hist(delta_dispersal)
```
Calculate the posterior probability of `dAB > dBA`:
```{r}
length(delta_dispersal[delta_dispersal>0]) / length(delta_dispersal)
```
Are extinction rates significantly different?
```{r}
delta_extinction = post$xA - post$xB
hist(delta_extinction)
```
Calculate the posterior probability of `xA > xB`:
```{r}
length(delta_extinction[delta_extinction>0]) / length(delta_extinction)
```
# Run on multiple trees
If provided with a tree object that includes multiple trees (class "multiPhylo") the MCMC can run sequentially on different trees to account for phylogenetic uncertainty. The `burnin` defines how many initial MCMC iterations should be discarded (i.e. not saved in the log file) for each tree and `nTREES` defines on how many trees the analysis is run.
```{r}
run_mcmc_SSE(trees, range.data, model= "geosse", outfile = "geosse.log", iterations = 10000, nTREES=5, burnin=100)
```