Author: Chenxin Li, Ph.D., Assistant Research Scientist at Department of Crop & Soil Sciences and Center for Applied Genetic Technologies, University of Georgia.
Contact: Chenxin.Li@uga.edu
The main goal of this repository is to help R users produce publication-quality chromatograms in R. Examples and explanations are below.
The Scripts/ directory contains .Rmd files that generate the graphics shown below.
Running them requires R, RStudio, and the rmarkdown package.
- R: R Download
- RStudio: RStudio Download
- rmarkdown can be installed using the Install Packages interface in RStudio.
- Dependencies
- Required input
- Functions generated by the workflow
- Example output
- Real datasets
- Getting started
- Example script
- Comparison of perspectives
- Additional features
library(tidyverse)
This is a tidyverse based workflow.
The workflow requires the input data to be in the tidy format (each row is an observation, and each column is a variable).
It requires the following 3 columns:
- A column named x, which will be the x axis.
- A column named y, which will be the y axis.
- A sample column that indicates the sample ID of each of the traces.
Additional required values:
- a vector of sample IDs
- x_offset, default = 0.2
- y_offset, default = 0.4
- number of traces to plot
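As an illustration of the required input, here is a minimal sketch that builds a tidy data frame for two sine waves, together with the additional values listed above. The object names (sine_data, sample_ids, and so on) are made up for this example, and base R is used only to keep the sketch free of dependencies.

```r
# Build a tidy data frame with the three required columns: x, y, sample.
# All object names here are illustrative, not part of the workflow itself.
x_values <- seq(0, 2 * pi, length.out = 100)

sine_data <- rbind(
  data.frame(x = x_values, y = sin(x_values), sample = "wave_1"),
  data.frame(x = x_values, y = sin(x_values + pi / 2), sample = "wave_2")
)

# The additional values the workflow asks for.
sample_ids <- unique(sine_data$sample)   # vector of sample IDs
number_of_traces <- length(sample_ids)   # number of traces to plot
x_offset <- 0.2                          # default stated above
y_offset <- 0.4                          # default stated above

str(sine_data)
```

Each row is one (x, y) observation, and the sample column says which trace the observation belongs to.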
This workflow defines 6 functions, in this order:
- find_xy_ranges() takes the tidy input data frame and finds xmin, xmax, ymin, and ymax.
- make_grid_table() takes the ranges produced by find_xy_ranges() and produces a data frame that will be used to make the coordinate system. Additionally, it requires x_offset, y_offset, and number_of_traces.
- make_axis_table() takes the ranges produced by find_xy_ranges() and produces a data frame that will be used to make the coordinate system.
- make_coord() takes the outputs of find_xy_ranges(), make_grid_table(), and make_axis_table() to make a ggplot object that is a blank coordinate system. It also requires x_offset, y_offset, and number_of_traces.
- map_sample_to_trace() takes a vector of sample IDs and produces a data frame that maps sample IDs to traces (a column of 1 to n).
- plot_traces() takes the output of all the above and produces a ggplot object.
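To make the first and fifth steps more concrete, here is a rough sketch of what a range-finding helper and a sample-to-trace mapping could look like. These are hypothetical stand-ins written for illustration; they are not the functions defined in ggtraces.Rmd, and the output column name trace is an assumption.

```r
# Hypothetical stand-in for find_xy_ranges(): a one-row data frame of ranges.
sketch_xy_ranges <- function(data) {
  data.frame(
    xmin = min(data$x), xmax = max(data$x),
    ymin = min(data$y), ymax = max(data$y)
  )
}

# Hypothetical stand-in for map_sample_to_trace(): sample IDs paired with 1..n.
# The column name "trace" is assumed for this sketch.
sketch_sample_to_trace <- function(sample_ids) {
  data.frame(sample = sample_ids, trace = seq_along(sample_ids))
}

# Toy input with the required x, y, and sample columns.
toy <- data.frame(
  x = c(0, 1, 2, 0, 1, 2),
  y = c(0, 5, 1, 2, 3, 9),
  sample = rep(c("a", "b"), each = 3)
)

sketch_ranges_out <- sketch_xy_ranges(toy)
sketch_mapping_out <- sketch_sample_to_trace(c("a", "b"))
```

The order of the sample IDs determines which trace each sample is drawn on, so the vector passed in controls the front-to-back order of the traces.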
As an example, let's visualize two sine waves.
The workflow first generates a blank coordinate system, which is a ggplot object.
- The coordinate system is defined by the x and y value ranges, as well as the number of traces to graph.
- The perspective of the coordinate system is defined by x_offset and y_offset.
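One way to picture how the two offsets set the perspective is sketched below. It assumes that trace k is shifted by (k - 1) * x_offset along x and (k - 1) * y_offset along y; this placement rule is an assumption made for illustration and is not confirmed by the workflow's source. The helper shift_trace() is hypothetical.

```r
# Assumed placement rule (not confirmed by the source):
# trace k moves by (k - 1) * x_offset horizontally
# and by (k - 1) * y_offset vertically.
shift_trace <- function(data, trace_index, x_offset = 0.2, y_offset = 0.4) {
  data$x_shifted <- data$x + (trace_index - 1) * x_offset
  data$y_shifted <- data$y + (trace_index - 1) * y_offset
  data
}

x_values <- seq(0, 2 * pi, length.out = 5)
wave <- data.frame(x = x_values, y = sin(x_values))

first_trace  <- shift_trace(wave, trace_index = 1)  # stays in place
second_trace <- shift_trace(wave, trace_index = 2)  # moved up and to the right
```

Under this assumption, the baselines of consecutive traces step diagonally across the plot, which is what produces the parallelogram-shaped coordinate system.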
Again, the blank coordinate system is a ggplot object. We can add ggplot layers to it, such as geoms, scales, themes, and so on.
The trace plot, in its most basic form, is the blank coordinate system + geom_line() to plot the line traces.
This shows two sine waves aligned along a parallelogram. Because it is a ggplot object, we can add more ggplot layers to it if needed, such as replacing the default color palette. It usually needs some final touches to look nicer.
The best way to use this tool is to run ggtraces.Rmd in the same environment (the same RStudio window) in a different tab.
Doing so will deposit the functions needed into the environment.
Then you can simply call the functions one by one.
I tried out two real datasets that are very different.
The first one is LC-MS data.
Data from Li et al., 2022
The second one is small RNA metagene (averaged gene) data.
Data from Li et al., 2020 and Li et al., 2022.
Running ggtraces_uses.Rmd in the Scripts/ directory will generate these graphs.
This shows the base peak chromatograms (normalized to the highest peak) of two samples.
This shows the normalized coverage of 24-nt siRNAs (per 1000 24-nt siRNAs) around transcription start sites, averaged across all genes.
- Clone the repository to your machine.
- Run ggtraces.Rmd under Scripts/. You will need to install the rmarkdown package.
- Call each function in order.
- Make final touches (e.g., adjust axis range, axis label, color palette, and so on)
- Done!
metagene <- read_csv("../Data/metagene.csv", col_types = cols())
This is already a tidy data frame. If your data table is not in the tidy format, you'll need to re-format it first.
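If your table is wide (one column per sample), a reshape along the following lines is one way to get it into tidy format. This is a sketch using tidyr::pivot_longer(); the table wide_table and its column names are invented for illustration.

```r
library(tidyr)

# Hypothetical wide table: one column of positions, one column per sample.
wide_table <- data.frame(
  position = c(-100, 0, 100),
  sample_A = c(1.0, 2.0, 1.5),
  sample_B = c(0.5, 3.0, 2.5)
)

# Reshape to tidy format: one row per position-sample observation.
tidy_table <- pivot_longer(
  wide_table,
  cols = c(sample_A, sample_B),
  names_to = "sample",
  values_to = "y"
)

# Rename the position column to x, as the workflow expects.
names(tidy_table)[names(tidy_table) == "position"] <- "x"
```

After reshaping, the table has the x, y, and sample columns and can go through the same steps as the metagene example.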
metagene_2 <- metagene %>%
  dplyr::rename(x = `bin start`,
                sample = sample_type) %>%
  mutate(y = mena_pro_24 * 1000)
The workflow requires x, y, and sample columns.
example3_ranges <- find_xy_ranges(metagene_2)
example3_grid_table <- make_grid_table(example3_ranges, x_offset = 200, y_offset = 150, number_traces = 5)
example3_axis_table <- make_axis_table(example3_ranges)
example3_coord <- make_coord(
  grid_table = example3_grid_table,
  axis_table = example3_axis_table,
  ranges = example3_ranges,
  number_traces = 5,
  x_offset = 200,
  y_offset = 150
)
example3_names <- c("sperm", "egg", "zygote", "seedling")
example3_mapping <- map_sample_to_trace(example3_names)
example3_traces <- plot_traces(
  data = metagene_2,
  coord = example3_coord,
  mapping = example3_mapping,
  x_offset = 200,
  y_offset = 150,
  ranges = example3_ranges,
  x_title = "Position relative to TSS",
  y_title = "Normalized\ncoverage",
  sample_ID_title = "Cell type"
)
- You will need to provide x_offset, y_offset, and number_of_traces (passed as number_traces in the calls above). These values differ across experiments.
- You will need to provide the names of the traces. They are provided via example3_names <- c("sperm", "egg", "zygote", "seedling").
Manually adjust axis breaks, axis range, color palette, and axis title position.
Since example3_traces is a ggplot object, we can easily make additional customizations.
example3_traces +
  geom_segment(x = -Inf, xend = -Inf, y = 0, yend = 800, size = 1.1, color = "grey20") +
  geom_segment(x = -3000, xend = 2000, y = -Inf, yend = -Inf, size = 1.1, color = "grey20") +
  scale_color_manual(values = c("dodgerblue2", "tomato1", "violetred4", "seagreen"),
                     limits = example3_mapping$sample) +
  scale_y_continuous(breaks = c(0, 200, 400, 600, 800)) +
  theme(legend.position = "top",
        axis.title.y = element_text(hjust = 0.4))
Different x_offset and y_offset values change the appearance of the final product.
- High x_offset and low y_offset facilitate comparisons along the y axis. This gives the sensation that we are looking at the graph from the side.
- Low x_offset and high y_offset facilitate comparisons along the x axis. This gives the sensation that we are looking at the graph from the top.
A facet plot is a plot type where each line trace gets its own x and y axes.
library(RColorBrewer)  # provides brewer.pal()
plot_facet(LC_MS_data_2, x_title = "Retention time (min)", y_title = "Relative intensity") +
  scale_color_manual(values = brewer.pal(8, "Set2")[c(1, 4)])
The plot_facet() function requires the tidy data frame as input. x_title and y_title are optional.
Defaults are "x" and "y", respectively.
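For readers who want to see the general idea behind a facet plot in plain ggplot2, here is a minimal sketch. It is not the implementation of plot_facet(); it only shows one way to give each trace its own panel and axes, using toy sine and cosine data invented for the example.

```r
library(ggplot2)

# Toy tidy data with the x, y, and sample columns.
x_values <- seq(0, 2 * pi, length.out = 100)
toy_data <- rbind(
  data.frame(x = x_values, y = sin(x_values), sample = "wave_1"),
  data.frame(x = x_values, y = cos(x_values), sample = "wave_2")
)

# One panel per sample, stacked vertically, each with its own axes.
facet_sketch <- ggplot(toy_data, aes(x = x, y = y, color = sample)) +
  geom_line() +
  facet_wrap(~ sample, ncol = 1, scales = "free") +
  labs(x = "x", y = "y")
```

Stacking the panels in one column keeps the x axes aligned, which makes it easier to compare peak positions across traces.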
Pherogram is short for electropherogram, where we imagine the traces are moving down a gel. The original y value is now represented as color intensity in the heat map.
plot_pherogram(data = metagene_2,
               y_title = "Position relative to TSS",
               legend_title = "Normalized\ncoverage",
               mapping = example3_mapping)
The plot_pherogram() function requires the tidy data frame as input.
The y_title argument controls the y axis title (default = "x"), since it was the x value in the original line traces.
The legend_title argument controls the title of the color scale (default = "y"), since it was the y value in the original line traces.
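The axis swap described above can be sketched in plain ggplot2 with geom_tile(). This is an illustration of the idea only, not the implementation of plot_pherogram(), and the toy data are invented for the example.

```r
library(ggplot2)

# Toy tidy data with the x, y, and sample columns.
x_values <- seq(0, 2 * pi, length.out = 100)
toy_data <- rbind(
  data.frame(x = x_values, y = sin(x_values), sample = "wave_1"),
  data.frame(x = x_values, y = cos(x_values), sample = "wave_2")
)

# Each sample becomes a lane; the original x runs along the y axis,
# and the original y is shown as fill color.
pherogram_sketch <- ggplot(toy_data, aes(x = sample, y = x, fill = y)) +
  geom_tile() +
  labs(x = NULL, y = "x", fill = "y")
```

This is why the default y axis title is "x" and the default legend title is "y": the labels follow the original variables, not the axes they end up on.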