kevinsblake/PathotypeR

PathotypeR

DEC pathotype assignment of E. coli genomes in R.

Diagnostic microbiology has developed several schemes to subtype Escherichia coli. These are useful for understanding the epidemiology and pathogenesis of particular strains. Diarrheagenic E. coli (DEC) pathotypes group E. coli strains which possess similar virulence factors (VF) and cause diseases with similar pathology. Whole-genome sequencing can predict the presence of VF genes in an E. coli isolate with high accuracy, permitting the assignment of DEC pathotypes without additional molecular, biochemical, or phenotypic assays.

PathotypeR assigns E. coli genomes a DEC pathotype based on the presence/absence of specific VF genes. It takes the output of AMRFinderPlus as its input.

PathotypeR includes two functions:

  1. pathotypeR(): Quantifies each sample's VFs and assigns a DEC pathotype. Can also return the total VF count per sample, VF presence/absence, the prevalence of each pathotype, and the prevalence of each VF.
  2. amrfinder_process(): Merges AMRFinderPlus output files into a single dataframe. (Called by pathotypeR(), but can also be used on its own.)

Genomes are assigned a DEC pathotype based on the presence/absence of specific VF genes. Namely:

  • Shiga toxin-producing E. coli (STEC): stx1 and/or stx2 (without eae)
  • Enteropathogenic E. coli (EPEC): eae and/or bfpA (without stx1 and/or stx2)
  • Enterohaemorrhagic E. coli (EHEC): stx1 and/or stx2, and eae
  • Enteroinvasive E. coli (EIEC): ipaH
  • Enterotoxigenic E. coli (ETEC): ltcA and/or sta1
  • Enteroaggregative E. coli (EAEC): aatA and/or aaiC and/or aggR
  • Diffusely adherent E. coli (DAEC): afaC and/or afaE
  • none: does not encode any of the above VF genes

Hybrid strains contain genes associated with multiple DEC pathotypes and will be reported as all pathotypes detected (e.g., STEC-EAEC, EPEC-ETEC).
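
The rules above can be sketched as a small R function. This is an illustration only, not the code in pathotype.R: the function name assign_pathotype_sketch() is hypothetical, it assumes the gene symbols match the list above exactly (real AMRFinderPlus symbols may carry subunit or allele suffixes), and the order in which hybrid labels are joined is an assumption.

```r
# Illustrative sketch of the rule set listed above (not the package's implementation).
# `genes` is a character vector of VF gene symbols detected in one genome.
assign_pathotype_sketch <- function(genes) {
  has <- function(x) any(x %in% genes)

  stx <- has(c("stx1", "stx2"))
  eae <- has("eae")

  calls <- character(0)
  # stx and eae together define EHEC; stx alone is STEC; eae/bfpA without stx is EPEC.
  if (stx && eae) {
    calls <- c(calls, "EHEC")
  } else if (stx) {
    calls <- c(calls, "STEC")
  } else if (eae || has("bfpA")) {
    calls <- c(calls, "EPEC")
  }
  if (has("ipaH")) calls <- c(calls, "EIEC")
  if (has(c("ltcA", "sta1"))) calls <- c(calls, "ETEC")
  if (has(c("aatA", "aaiC", "aggR"))) calls <- c(calls, "EAEC")
  if (has(c("afaC", "afaE"))) calls <- c(calls, "DAEC")

  # Hybrids are reported as every pathotype detected, joined with "-".
  if (length(calls) == 0) "none" else paste(calls, collapse = "-")
}

assign_pathotype_sketch(c("stx2", "aggR"))  # STEC plus EAEC markers
assign_pathotype_sketch(c("eae", "stx1"))   # stx with eae
assign_pathotype_sketch(character(0))       # no VF genes
```

Keeping the stx/eae logic in a single if/else chain makes STEC, EHEC, and EPEC mutually exclusive, which is what the "without" clauses in the list imply.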

NOTE: PathotypeR does NOT assign pathotypes that are defined by collection site or association with disease, such as extraintestinal pathogenic E. coli (ExPEC), uropathogenic E. coli (UPEC), neonatal meningitis-associated E. coli (NMEC), and sepsis-associated E. coli (SEPEC).

Content

Install

Pre-processing

Functions

References

Install

Install directly from GitHub:

source("https://raw.github.com/kevinsblake/PathotypeR/main/pathotype.R")

Alternatively, download the script and then source it using its local filepath:

source("dir1/dir2/pathotype.R")

Pre-processing

E. coli genomes of interest must first be run through AMRFinderPlus. See their instructions for recommended usage.

The AMRFinderPlus output must be saved as a .tsv file. This can be done using the output flag: -o ${outdir}/${sample}.tsv. Copy all of these output files into one directory. The filepath of this directory will be the input for PathotypeR.
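
Before running PathotypeR, it can help to confirm that the directory contains the expected .tsv files. The base-R sketch below is one way to do that; the helper name list_amrfinder_samples() is hypothetical and not part of PathotypeR, and it assumes each sample name is simply the filename with the .tsv suffix removed.

```r
# Hypothetical pre-flight check (not part of PathotypeR): list the sample names
# implied by the .tsv files in the AMRFinderPlus output directory.
list_amrfinder_samples <- function(indir, suffix = ".tsv") {
  if (!dir.exists(indir)) stop("Directory not found: ", indir)
  files <- list.files(indir)
  files <- files[endsWith(files, suffix)]
  if (length(files) == 0) stop("No files ending in '", suffix, "' in ", indir)
  # Sample name = filename minus the suffix
  substr(files, 1, nchar(files) - nchar(suffix))
}

# Demo on a temporary directory containing two outputs and one unrelated file
demo_dir <- tempfile("amrfinder_demo")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("s1.tsv", "s2.tsv", "notes.txt"))))
samples <- list_amrfinder_samples(demo_dir)
print(samples)
```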

Functions

pathotypeR()

Description

Assigns a DEC pathotype to each E. coli genome. It first calls amrfinder_process() to merge the AMRFinderPlus output files.

Usage

library(dplyr)

pathotypeR(indir, output=c("patho_pred", "patho_prev", "vf_pres", "vf_prev"))

Arguments

indir Filepath to directory containing AMRFinderPlus output files.

output Specifies which table to return. Default is patho_pred.

  • patho_pred: for each sample, the VF count and pathotype prediction
  • patho_prev: for each pathotype, the count (i.e., number of samples) and overall prevalence
  • vf_pres: for each sample, VF presence/absence (1 = present, 0 = absent)
  • vf_prev: for each VF, the count and overall prevalence

Examples

# Outputs just sample names, VF count, and pathotype prediction
df <- pathotypeR("data/amrfinder")

# Outputs all VFs detected, count, and overall prevalence
df <- pathotypeR("data/amrfinder", output="vf_prev")
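
To show how the vf_pres and vf_prev outputs relate, the sketch below derives per-VF counts and prevalence from a toy presence/absence table. The column names and the toy data are made up for illustration, and expressing prevalence as a fraction of samples is an assumption; the actual pathotypeR() output may be laid out differently.

```r
# Toy presence/absence table (1 = present, 0 = absent); the layout is hypothetical.
vf_pres <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  eae    = c(1, 0, 1, 0),
  stx2   = c(1, 0, 0, 0),
  aggR   = c(0, 0, 1, 1)
)

vf_cols <- setdiff(names(vf_pres), "sample")

# Count = number of samples carrying the VF; prevalence = count / number of samples.
vf_prev <- data.frame(
  vf         = vf_cols,
  count      = unname(colSums(vf_pres[vf_cols])),
  prevalence = unname(colMeans(vf_pres[vf_cols]))
)
print(vf_prev)
```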

amrfinder_process()

Description

Function for merging AMRFinderPlus output files into a single dataframe.

Usage

amrfinder_process(indir, suffix=".tsv")

Arguments

indir Filepath to directory containing AMRFinderPlus output files.

suffix Specifies the suffix added to the AMRFinderPlus output filename. The filename minus this suffix should be the same as the Name column in the AMRFinderPlus output. Default = ".tsv"

Examples

df <- amrfinder_process("data/amrfinder")
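
For readers who want to see what such a merge involves, here is a minimal base-R sketch. It is not the implementation of amrfinder_process(): the function name merge_tsv_sketch(), the added source_file column, and the toy two-column files are all assumptions made for illustration, and it presumes every file shares the same columns.

```r
# Illustrative merge of per-sample .tsv files into one data frame
# (a sketch, not the code used by amrfinder_process()).
merge_tsv_sketch <- function(indir, suffix = ".tsv") {
  files <- list.files(indir, full.names = TRUE)
  files <- files[endsWith(files, suffix)]
  tables <- lapply(files, function(f) {
    tab <- read.delim(f, check.names = FALSE, stringsAsFactors = FALSE)
    if (nrow(tab) == 0) return(NULL)  # skip header-only files
    # Record which file each row came from (filename minus the suffix)
    base <- basename(f)
    tab$source_file <- substr(base, 1, nchar(base) - nchar(suffix))
    tab
  })
  tables <- Filter(Negate(is.null), tables)
  if (length(tables) == 0) return(NULL)
  do.call(rbind, tables)
}

# Demo with two toy files; the headers here are simplified stand-ins.
demo_dir <- tempfile("amrfinder_merge_demo")
dir.create(demo_dir)
writeLines(c("Name\tGene symbol", "s1\teae", "s1\tstx2"), file.path(demo_dir, "s1.tsv"))
writeLines(c("Name\tGene symbol", "s2\taggR"), file.path(demo_dir, "s2.tsv"))
merged <- merge_tsv_sketch(demo_dir)
print(merged)
```

Comparing source_file with the Name column is a quick way to check that the filename-minus-suffix convention described under Arguments holds for a given dataset.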

References

  • Horesh et al. A comprehensive and high-quality collection of Escherichia coli genomes and their genes. Microb Genom. 2021 Feb;7(2):000499. doi: 10.1099/mgen.0.000499. PMID: 33417534.
  • Jesser & Levy. Updates on defining and detecting diarrheagenic Escherichia coli pathotypes. Curr Opin Infect Dis. 2020 Oct; 33(5): 372–380. doi: 10.1097/QCO.0000000000000665. PMID: 32773499.
  • Robins-Browne et al. Are Escherichia coli pathotypes still relevant in the era of whole-genome sequencing? Front Cell Infect Microbiol. 2016 Nov 18;6:141. doi: 10.3389/fcimb.2016.00141. PMID: 27917373.
