Collecting and Identification the Outbreak Cluster

The goal of Collecting and Identification the Outbreak Cluster (caIRA) is to identify outbreak clusters from a phylogenetic tree and its associated metadata. The package is based on the following papers:

Ragonnet-Cronin, M., Hodcroft, E., Hué, S. et al. Automated analysis of phylogenetic clusters. BMC Bioinformatics 14, 317 (2013). https://doi.org/10.1186/1471-2105-14-317

Hall M, Woolhouse M, Rambaut A (2015) Epidemic Reconstruction in a Phylogenetics Framework: Transmission Trees as Partitions of the Node Set. PLoS Comput Biol 11(12): e1004613. https://doi.org/10.1371/journal.pcbi.1004613

This package is part of Dhihram Tenrisau's MSc Health Data Science summer project, 'Phylodynamic of Norovirus in UK 2003-2023'. The project is supervised by Stéphane Hué.

# install.packages("devtools")
devtools::install_github("Dhihram/caIRA")

This package needs the additional packages tidyverse, ape, treeio, and dplyr:

library(tidyverse)
library(ape)
library(dplyr)
library(treeio)
library(caIRA)

You need two inputs: a tree file (Newick or Nexus) and a metadata file. The metadata file consists of label, location, and date columns. The values in the label column must match the tip labels in the tree file.
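As a quick sanity check before clustering, the sketch below reads a small tree with ape and confirms that every tip label has a matching metadata row. The toy tree, the `meta` data frame, and its values are invented for illustration; only the column names label, location, and date come from the description above.

```r
library(ape)

# Toy Newick tree; in practice use read.tree("file.nwk") or read.nexus("file.nex")
tree <- read.tree(text = "((A:1,B:1)95:1,(C:1,D:1)60:1);")

# Hypothetical metadata with the three required columns
meta <- data.frame(
  label    = c("A", "B", "C", "D"),
  location = c("London", "London", "Leeds", "York"),
  date     = as.Date(c("2020-01-01", "2020-01-10", "2020-03-01", "2020-06-15"))
)

# Tip labels missing from the metadata, and metadata rows missing from the tree
missing_in_meta <- setdiff(tree$tip.label, meta$label)
missing_in_tree <- setdiff(meta$label, tree$tip.label)

# Stop early if the two inputs do not describe the same set of samples
stopifnot(length(missing_in_meta) == 0, length(missing_in_tree) == 0)
```

If either set is non-empty, fixing the labels before clustering is likely easier than debugging a confusing result afterwards.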

The genclus function performs the following steps:

  1. Find and cluster the monophyletic groups in the tree
  2. Apply the cluster parameters: bootstrap_threshold, date_range, and samearea
  3. Keep the maximal monophyletic groups among the identified clusters
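To make the first step concrete, the sketch below lists well-supported clades in a toy tree using ape. It is not the genclus implementation; it assumes support values are stored as node labels, as is typical for Newick trees carrying bootstrap values, and the cutoff of 80 is an illustrative choice.

```r
library(ape)

# Toy tree with support values written as internal node labels
tree <- read.tree(text = "((A:1,B:1)95:1,(C:1,D:1)60:1);")

# Node labels hold support values as text; the unlabeled root becomes NA
support <- suppressWarnings(as.numeric(tree$node.label))

# Internal nodes are numbered after the tips, so offset by the tip count
supported_nodes <- Ntip(tree) + which(!is.na(support) & support >= 80)

# Each supported node defines a monophyletic group: all tips descending from it
clades <- lapply(supported_nodes, function(n) extract.clade(tree, n)$tip.label)
print(clades)
```

Here only the clade containing A and B clears the cutoff; the C and D clade has support 60 and is dropped.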

bootstrap_threshold is the minimum bootstrap value for a group to be considered a cluster. date_range is the range of days within which samples are considered part of the same cluster. samearea is a boolean that controls whether samples must come from the same area to be considered a cluster.
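One way to read the two metadata criteria is sketched below for a single candidate group. This is an interpretation, not the package's code: it assumes the date range is compared against the span in days between the earliest and latest sample, and that samearea = TRUE requires a single shared location. The helper `passes_criteria` and the toy data are hypothetical.

```r
# Hypothetical metadata with label, location, and date columns
meta <- data.frame(
  label    = c("A", "B", "C", "D"),
  location = c("London", "London", "Leeds", "York"),
  date     = as.Date(c("2020-01-01", "2020-01-10", "2020-03-01", "2020-06-15"))
)

# Hypothetical helper: does a group of labels satisfy the date and area rules?
passes_criteria <- function(labels, meta, date_range = 30, samearea = TRUE) {
  rows <- meta[meta$label %in% labels, ]
  # Days between the earliest and latest sample in the group
  span_days <- as.numeric(diff(range(rows$date)), units = "days")
  within_dates <- span_days <= date_range
  one_area <- length(unique(rows$location)) == 1
  # The area rule only applies when samearea is TRUE
  within_dates && (!samearea || one_area)
}

passes_criteria(c("A", "B"), meta)  # TRUE: 9 days apart, same location
passes_criteria(c("C", "D"), meta)  # FALSE: 106 days apart, different locations
```

Under this reading, loosening the rules with samearea = FALSE and a wider date_range would admit groups that span several locations.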

res <- genclus(tree, metat, bootstrap_threshold = 80, date_range = 30, samearea = TRUE)

This package also provides the function beastclus, which finds clusters in a BEAST tree. It uses the genclus function with the additional parameter post_threshold to identify clusters based on posterior probability.

res <- beastclus(beast_tree, metadata, post_threshold = 0.50, date_range = 90, samearea = TRUE)

For the manual, you can see here