mbarbini edited this page Aug 8, 2019 · 15 revisions

Table of Contents

  1. Dependencies
  2. How to source microbial data
  3. How to visualize data

Dependencies

Jupyter, RCurl, rjson, IRkernel, pheatmap, ggplot2, RColorBrewer, XML, foreach, parallel, doParallel, data.table, utils, rlist, crul, jsonlite, R.utils, rvest, colorspace, recommenderlab, RAM

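Install the R dependencies using install.packages(c("RCurl", "rjson", "IRkernel", "pheatmap", "ggplot2", "RColorBrewer", "XML", "foreach", "doParallel", "data.table", "rlist", "crul", "jsonlite", "R.utils", "rvest", "colorspace", "recommenderlab", "RAM")). The parallel and utils packages ship with base R and do not need to be installed. Jupyter is not an R package: install it separately (for example with pip install jupyter), then register the R kernel by running IRkernel::installspec() in R.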
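A minimal sketch that installs only the packages that are not already present. The helper name missing_packages is hypothetical and not part of this repository; base packages (parallel, utils) and Jupyter are left out of the list for the reasons given above.

```r
# Hypothetical helper (not part of this repository): return the required
# packages that are not yet installed.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

required <- c("RCurl", "rjson", "IRkernel", "pheatmap", "ggplot2",
              "RColorBrewer", "XML", "foreach", "doParallel", "data.table",
              "rlist", "crul", "jsonlite", "R.utils", "rvest", "colorspace",
              "recommenderlab", "RAM")

to_install <- missing_packages(required)

# Only install when running interactively, so sourcing this code from a
# script does not trigger downloads unexpectedly.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```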

How to source microbial data

Files

BacDiveApiCrawler.R
CleanProTrait.R
CombineData.R
ParseIJSEM.R
BacMapCrawler.R
CreateDataTable.R
SourceMicrobialData.ipynb

Functions


BacDiveCrawler

Description

BacDiveCrawler() retrieves information from the BacDive API, organizing it into a formatted table

Usage

BacDiveCrawler(usrname, pass, num_requests = 10, save_file = TRUE)

Arguments

usrname the username for a verified BacDive account

pass the password for a corresponding BacDive account

num_requests the number of bacterial entries to asynchronously download

save_file if true, saves a .csv to the working directory containing the information extracted from the BacDive API

Details

Designed to traverse the API provided by BacDive. The BacDive API exposes a database that can be easily queried and returns microbial physiology data in JSON format. Each species has its own ‘page’, which details information such as taxonomy, morphology, strain information, and more. The script currently records only a selected subset of traits, so more data could be extracted if support for it were implemented.
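The selective trait extraction described above can be sketched as following a fixed set of paths through one parsed JSON record. The record below is a made-up nested list standing in for the output of a JSON parser such as jsonlite::fromJSON; all field names are hypothetical and do not reflect BacDive's actual schema or the traits BacDiveApiCrawler.R records.

```r
# Hypothetical stand-in for one parsed BacDive record (illustrative fields only).
record <- list(
  taxonomy   = list(species = "Escherichia coli", genus = "Escherichia"),
  morphology = list(gram_stain = "negative", cell_shape = "rod"),
  culture    = list(oxygen = "facultative anaerobe")
)

# Traits to keep, each given as a path into the nested list.
selected <- list(
  species    = c("taxonomy", "species"),
  gram_stain = c("morphology", "gram_stain"),
  oxygen     = c("culture", "oxygen"),
  motility   = c("morphology", "motility")   # absent from this record
)

# Follow a path through nested lists; return NA if any level is missing.
get_path <- function(x, path) {
  for (key in path) {
    if (!is.list(x) || is.null(x[[key]])) return(NA_character_)
    x <- x[[key]]
  }
  as.character(x)
}

# Build a one-row data.frame with one column per selected trait.
extract_traits <- function(rec, paths) {
  as.data.frame(lapply(paths, function(p) get_path(rec, p)),
                stringsAsFactors = FALSE)
}

row <- extract_traits(record, selected)
```

Rows built this way for many records could then be stacked with rbind; traits missing from a record simply become NA.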

Value

Returns a data.frame containing information extracted from BacDive

Warning

Because this function traverses the site’s API, its speed is limited by the network connection and by how quickly BacDive's server responds, which can make the script slow. In addition, requesting too many bacterial entries at once may place too heavy a load on BacDive's server.
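One common way to limit load on a remote server is to send requests in small batches with a pause between them. This is a hedged sketch of that idea, not a description of what BacDiveApiCrawler.R does; fetch_one is a hypothetical placeholder for a function that performs a single API request.

```r
# Split a vector of entry IDs into batches of at most `size` elements.
make_batches <- function(ids, size) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Call `fetch_one` (hypothetical: one API request per ID) batch by batch,
# pausing `pause` seconds between batches to limit load on the server.
crawl_in_batches <- function(ids, fetch_one, size = 10, pause = 1) {
  batches <- make_batches(ids, size)
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- lapply(batches[[i]], fetch_one)
    if (i < length(batches)) Sys.sleep(pause)
  }
  # Flatten the per-batch lists into one list with one element per ID.
  unlist(results, recursive = FALSE)
}

# Example with a dummy fetcher that just echoes the ID (no network access).
out <- crawl_in_batches(1:25, function(id) list(id = id), size = 10, pause = 0)
```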

CleanProTrait

Description

CleanProTrait() retrieves information from a file downloaded from the ProTrait Atlas, formatting it into a table

Usage

CleanProTrait(save_file = TRUE)

Arguments

save_file if true, saves a .csv to the working directory containing the information extracted from the ProTrait Atlas

Details

Designed to extract information from a table created by ProTrait. That table does not group traits into general categories; instead, each trait value (gram-positive, pathogenic in animals, aerobe, etc.) is its own column. This script therefore reorganizes the table into generalized traits, which makes it easy to use for purposes such as annotation. It first checks whether the ProTrait file already exists in the working directory. If it does not, it downloads the file to the working directory and then formats it.
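The reorganization described above amounts to collapsing a group of indicator columns (one per trait value) into a single categorical column. A minimal base-R sketch, assuming a toy table whose column names and 0/1 encoding are hypothetical; the real ProTrait file uses its own layout.

```r
# Toy wide table: one indicator column per trait value (illustrative only).
wide <- data.frame(
  species       = c("sp_A", "sp_B", "sp_C"),
  gram_positive = c(1, 0, 0),
  gram_negative = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# Collapse a set of indicator columns into one categorical column.
# `value_map` maps indicator column names to the label each represents.
# Rows with no positive indicator get NA.
collapse_trait <- function(df, value_map) {
  cols <- names(value_map)
  hits <- as.matrix(df[, cols, drop = FALSE]) == 1
  apply(hits, 1, function(h) {
    if (any(h)) unname(value_map[cols[which(h)[1]]]) else NA_character_
  })
}

generalized <- data.frame(
  species    = wide$species,
  gram_stain = collapse_trait(wide, c(gram_positive = "positive",
                                      gram_negative = "negative")),
  stringsAsFactors = FALSE
)
```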

Value

Returns a data.frame containing information extracted from ProTrait

parse.ijsem

Description

parse.ijsem() parses the International Journal of Systematic and Evolutionary Microbiology (IJSEM) database, which contains phenotypic information about microbes.

Usage

parse.ijsem()

Details

The function first looks for a local copy of the raw IJSEM data table. If none exists, it downloads a copy locally and then parses it.
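The check-locally-then-download behavior can be sketched as a small caching wrapper. The helper name get_cached_file and the fetch argument are hypothetical; in practice fetch could wrap utils::download.file with the real IJSEM URL, which is not reproduced here. The example uses a dummy fetcher so it runs without network access.

```r
# Return `path`, calling `fetch(path)` first only if the file is not present.
get_cached_file <- function(path, fetch) {
  if (!file.exists(path)) {
    fetch(path)
  }
  path
}

# Dummy fetcher that writes a tiny CSV and counts how often it is called.
calls <- 0
dummy_fetch <- function(dest) {
  calls <<- calls + 1
  writeLines(c("species,gram_stain", "sp_A,positive"), dest)
}

tmp <- file.path(tempdir(), "ijsem_example.csv")
unlink(tmp)
get_cached_file(tmp, dummy_fetch)   # file missing: fetcher runs (calls becomes 1)
get_cached_file(tmp, dummy_fetch)   # file present: local copy reused (calls stays 1)
ijsem <- read.csv(tmp, stringsAsFactors = FALSE)
```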

Value

Returns a data.frame containing information extracted from the IJSEM data table and saves a .csv locally

BacMapCrawler


CombineData

Description

CombineData() combines the given tables into a single, formatted table

Usage

CombineData(protrait, bacdive, save_file = TRUE)

Arguments

protrait a data.frame containing the traits sourced from the ProTrait Atlas

bacdive a data.frame containing the traits sourced from the BacDive database

save_file if true, saves a .csv to the working directory containing the information resulting from the combined table

Details

CombineData.R merges the tables extracted from BacDive and ProTrait. This step is needed because the two tables use different column labels and contain different sets of traits. The script reconciles them into one cohesive table.
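The reconciliation step can be sketched in base R as renaming each table's columns to a shared vocabulary and then stacking the rows, filling traits that one source lacks with NA. The column names and mappings below are hypothetical; CombineData.R defines its own.

```r
protrait <- data.frame(Species = c("sp_A", "sp_B"),
                       Gram    = c("positive", "negative"),
                       stringsAsFactors = FALSE)
bacdive  <- data.frame(species_name = "sp_C",
                       gram_stain   = "negative",
                       motility     = "yes",
                       stringsAsFactors = FALSE)

# Rename columns according to `map` (old name -> shared name).
harmonize <- function(df, map) {
  hit <- names(df) %in% names(map)
  names(df)[hit] <- map[names(df)[hit]]
  df
}

# Stack two tables, adding NA columns for traits missing from either source.
stack_tables <- function(a, b) {
  cols <- union(names(a), names(b))
  for (col in setdiff(cols, names(a))) a[[col]] <- NA
  for (col in setdiff(cols, names(b))) b[[col]] <- NA
  rbind(a[cols], b[cols])
}

combined <- stack_tables(
  harmonize(protrait, c(Species = "species", Gram = "gram_stain")),
  harmonize(bacdive,  c(species_name = "species"))
)
```

Like the original script, this sketch does not resolve duplicate species; duplicated(combined$species) would at least flag them.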

Value

Returns a data.frame containing the information from the combined tables

Warning

This script cannot merge arbitrary tables; it is limited to the tables produced by the ProTrait and BacDive functions. It also does not yet handle duplicate species.

How to visualize data

Files

HeatMap.R

Functions


load.abundance.data

Description

load.abundance.data() loads an abundance table from a .csv file into the format expected by the heat map functions.

Usage

load.abundance.data(path, column = 1)

Arguments

path the path from the working directory to the .csv file containing the abundance table

column the column number containing the feature names

Details

The abundance table needs to be loaded into R so that the feature names are the row names, the sample names are the column names, and all values are numeric.
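A minimal base-R sketch of the loading step described above. It is not the code in HeatMap.R (the _sketch suffix marks the function as illustrative), and the example CSV is made up.

```r
# Read a CSV, use column `column` as the row names (feature names), drop it,
# and return the remaining values as a numeric matrix (samples as columns).
load_abundance_sketch <- function(path, column = 1) {
  df <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
  features <- df[[column]]
  df <- df[, -column, drop = FALSE]
  m <- as.matrix(df)
  storage.mode(m) <- "numeric"
  rownames(m) <- features
  m
}

tmp <- tempfile(fileext = ".csv")
writeLines(c("feature,S1,S2", "taxon_A,10,0", "taxon_B,5,20"), tmp)
abundance <- load_abundance_sketch(tmp)
```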

Value

Returns a numerical matrix created from the abundance table

load.meta.data

Description

load.meta.data() loads metadata from a .csv file into the format expected by the heat map functions.

Usage

load.meta.data(path, tax_column = 1)

Arguments

path the path from the working directory to the .csv file containing the metadata

tax_column the column number containing the taxonomic or sample (i.e., identifying) name for the metadata

Details

This can be used to load feature or sample metadata. Metadata needs to be loaded in such a way that the row names are the identifying names and the traits are the column names.

Value

Returns a data.frame containing information extracted from the metadata

Warning

This eliminates all duplicate entries from the metadata without merging their data, which can result in data loss.
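The loading and deduplication behavior described above can be sketched as follows. Which duplicate load.meta.data keeps is not documented; this sketch assumes the first occurrence wins, and the function name and example data are hypothetical.

```r
# Read metadata, use `tax_column` as row names, and drop duplicate IDs.
# Assumption: the first occurrence of each ID is kept; later rows are lost.
load_meta_sketch <- function(path, tax_column = 1) {
  df <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
  ids <- df[[tax_column]]
  keep <- !duplicated(ids)
  df <- df[keep, , drop = FALSE]
  rownames(df) <- ids[keep]
  df[, -tax_column, drop = FALSE]
}

tmp <- tempfile(fileext = ".csv")
writeLines(c("species,oxygen",
             "sp_A,aerobe",
             "sp_B,anaerobe",
             "sp_A,microaerophile"),   # duplicate ID: this row is dropped
           tmp)
meta <- load_meta_sketch(tmp)
```

To see which IDs would lose data before loading, ids[duplicated(ids)] lists the repeated names.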

create.correlogram

Description

create.correlogram() creates a heat map based on the correlation of features given an abundance table and feature metadata.

Usage

create.correlogram(data, feature_meta, show = TRUE)

Arguments

data abundance data in a numerical matrix

feature_meta a data.frame containing feature metadata

show if true, will display the graph upon completion

Details

The features need to be the rows of the abundance data.
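Because features are the rows of the abundance matrix, a feature-by-feature correlation matrix is cor(t(data)), since cor() correlates columns. A base-R sketch of that core computation, assuming Pearson correlation (the method create.correlogram uses is not stated); the toy matrix is made up, and stats::heatmap stands in for pheatmap only to keep the example dependency-free.

```r
abundance <- matrix(c(1, 2, 3, 4,
                      2, 4, 6, 8,
                      4, 3, 2, 1),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("taxon_A", "taxon_B", "taxon_C"),
                                    paste0("S", 1:4)))

# cor() correlates columns, so transpose to correlate features (rows).
feature_cor <- cor(t(abundance))

# Draw a clustered heat map of the correlations when running interactively.
if (interactive()) heatmap(feature_cor, symm = TRUE)
```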

Value

Returns a pheatmap object: a list with the components tree_row (row hclust), tree_col (column hclust), kmeans, and gtable

create.heatmap

Description

create.heatmap() creates a heat map based on relative abundance, with row and column dendrograms based on given metadata

Usage

create.heatmap(data, sample_meta, feature_meta, percentile = 0.75, show = FALSE, omit_na = TRUE)

Arguments

data abundance data in a numerical matrix

sample_meta a data.frame containing sample metadata

feature_meta a data.frame containing feature metadata

percentile a filter for displaying only entries with a threshold correlation

show if true, will display the graph upon completion

omit_na whether to remove entries that are missing metadata

Details

The features need to be the rows of the abundance data.
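A sketch of two preparation steps implied by the description: converting counts to relative abundance per sample, and aligning feature metadata to the matrix rows while dropping features with missing metadata when omit_na is TRUE. This illustrates the idea under those assumptions and is not the code in HeatMap.R; the data are made up. The aligned data.frame has row names matching the matrix, which is what pheatmap's annotation_row argument expects.

```r
abundance <- matrix(c(10, 0,
                       5, 20,
                       5, 80),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("taxon_A", "taxon_B", "taxon_C"),
                                    c("S1", "S2")))

# Scale each sample (column) so its feature abundances sum to 1.
rel <- sweep(abundance, 2, colSums(abundance), "/")

feature_meta <- data.frame(oxygen = c("aerobe", NA),
                           row.names = c("taxon_A", "taxon_B"),
                           stringsAsFactors = FALSE)

# Align metadata to the matrix rows; features absent from the metadata get NA.
aligned <- feature_meta[rownames(rel), , drop = FALSE]
rownames(aligned) <- rownames(rel)

omit_na <- TRUE
if (omit_na) {
  keep <- complete.cases(aligned)
  rel <- rel[keep, , drop = FALSE]
  aligned <- aligned[keep, , drop = FALSE]
}
```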

Value

Returns a pheatmap object: a list with the components tree_row (row hclust), tree_col (column hclust), kmeans, and gtable

one.v.all

Description

one.v.all() uses the create.heatmap function but filters the metadata so that only a single feature type within one feature category keeps its label; all other types are labeled 'other'.

Usage

one.v.all(data, sample_meta, feature_meta, which = 2, percentile = 0.75, show = FALSE, column, trait)

Arguments

data abundance data in a numerical matrix

sample_meta a data.frame containing sample metadata

feature_meta a data.frame containing feature metadata

which a number representing whether to filter the sample (1) or feature (2) metadata

percentile a filter for displaying only entries with a threshold correlation

show if true, will display the graph upon completion

column the column number with the feature category

trait the specific feature type to use

Details

Compares one feature type against all others in a feature category (e.g., aerobic respiration vs. all other oxygen requirements). The features need to be the rows of the abundance data. The metadata may contain any number of feature categories, but only the one given by column is used.
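The relabeling described above can be sketched as keeping one trait value and replacing every other value in the chosen column with "other". The helper name is hypothetical, and treating missing values as "other" is an assumption; one.v.all may handle NA differently.

```r
feature_meta <- data.frame(
  oxygen = c("aerobe", "anaerobe", "aerobe", "microaerophile", NA),
  row.names = paste0("taxon_", LETTERS[1:5]),
  stringsAsFactors = FALSE
)

# Keep `trait` in the chosen column and relabel every other value as "other".
# Assumption: NA values are also labeled "other".
one_v_all_labels <- function(meta, column, trait) {
  values <- meta[[column]]
  meta[[column]] <- ifelse(!is.na(values) & values == trait, trait, "other")
  meta[, column, drop = FALSE]
}

labels <- one_v_all_labels(feature_meta, column = 1, trait = "aerobe")
```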

Value

Returns a pheatmap object: a list with the components tree_row (row hclust), tree_col (column hclust), kmeans, and gtable

all.one.v.all

Description

all.one.v.all() calls the one.v.all function for every feature type found in the chosen category, creating one heat map per type.

Usage

all.one.v.all(data, sample_meta, feature_meta, which = 2, percentile = 0.75, show = FALSE, column, directory = '')

Arguments

data abundance data in a numerical matrix

sample_meta a data.frame containing sample metadata

feature_meta a data.frame containing feature metadata

which a number representing whether to filter the sample (1) or feature (2) metadata

percentile a filter for displaying only entries with a threshold correlation

show if true, will display the graph upon completion

column the column number with the feature category

directory the path from the working directory to where the file should be saved

Details

Creates a heat map for every feature type found in the chosen category (e.g., one for each of 3 forms of oxygen requirement). The features need to be the rows of the abundance data. The metadata may contain any number of feature categories, but only the one given by column is used. Output files are named automatically after each trait.
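The per-trait loop can be sketched as planning one output file per distinct value in the chosen column. The helper name, the .png extension, and the file-name sanitizing rule are assumptions for illustration; all.one.v.all's actual naming scheme may differ. Note that an empty directory is handled explicitly, since file.path("", "x.png") would produce a path at the filesystem root.

```r
feature_meta <- data.frame(
  oxygen = c("aerobe", "anaerobe", "aerobe", "microaerophile"),
  stringsAsFactors = FALSE
)

# One output file per distinct trait in the chosen column (NA values skipped).
# Assumptions: ".png" extension; unsafe characters replaced with "_".
plan_outputs <- function(meta, column, directory = "") {
  values <- meta[[column]]
  traits <- sort(unique(values[!is.na(values)]))
  files <- paste0(gsub("[^A-Za-z0-9_.-]", "_", traits), ".png")
  if (nzchar(directory)) files <- file.path(directory, files)
  setNames(files, traits)
}

outputs <- plan_outputs(feature_meta, column = 1, directory = "heatmaps")
for (trait in names(outputs)) {
  # Hypothetical: one.v.all(..., column = 1, trait = trait) would be called
  # here and its plot written to outputs[[trait]].
  message("would write ", outputs[[trait]])
}
```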
