ggtraces - A tidyverse and grammar of graphics powered line traces visualizer

v.1.0.0 release:

Author: Chenxin Li, Ph.D., Assistant Research Scientist at Department of Crop & Soil Sciences and Center for Applied Genetic Technologies, University of Georgia.

Contact: Chenxin.Li@uga.edu

The main goal of this repository is to empower R users to produce publication-quality chromatograms with R. Examples and explanations are below.

The Scripts/ directory contains .Rmd files that generate the graphics shown below. Running them requires R, RStudio, and the rmarkdown package.

R: R Download
RStudio: RStudio Download
rmarkdown can be installed using the install packages interface in RStudio.

Dependencies

library(tidyverse)

This is a tidyverse based workflow.

Required input

The workflow requires the input data to be in the tidy format (each row is an observation, and each column is a variable).

It requires the following 3 columns:

A column named x, which will be the x axis
A column named y, which will be the y axis
A column named sample, which indicates the sample ID of each trace

Additional required values:

a vector of sample IDs
x_offset, default = 0.2
y_offset, default = 0.4
number of traces to plot
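
As a minimal sketch of what this input could look like, the chunk below builds a tidy data frame of two sine waves and sets the additional values. The object names (toy_data, toy_sample_ids) and the sample IDs are made up for illustration, and setting the number of traces to the number of sample IDs is an assumption rather than a requirement stated by the workflow.

```r
library(dplyr)

# Two sine waves in tidy format: one row per observation.
x_values <- seq(0, 2 * pi, length.out = 100)

toy_data <- bind_rows(
  tibble(x = x_values, y = sin(x_values), sample = "wave_1"),
  tibble(x = x_values, y = sin(x_values + pi / 2), sample = "wave_2")
)

# Additional values the workflow asks for.
toy_sample_ids <- c("wave_1", "wave_2")  # vector of sample IDs
x_offset <- 0.2                          # default listed above
y_offset <- 0.4                          # default listed above
number_traces <- length(toy_sample_ids)  # assumption: one trace per sample ID

# Check that the three required columns are present.
stopifnot(all(c("x", "y", "sample") %in% colnames(toy_data)))
```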

Functions defined by the workflow

This workflow defines 6 functions, in this order:

find_xy_ranges() takes the tidy input data frame and finds xmin, xmax, ymin, and ymax.
make_grid_table() takes the ranges produced by find_xy_ranges() and produces a data frame that will be used to make the coordinate system. Additionally, it requires x_offset, y_offset, and number_traces.
make_axis_table() takes the ranges produced by find_xy_ranges() and produces a data frame that will be used to make the coordinate system.
make_coord() takes the outputs of find_xy_ranges(), make_grid_table(), and make_axis_table() to make a ggplot object that is a blank coordinate system. It also requires x_offset, y_offset, and number_traces.
map_sample_to_trace() takes a vector of sample IDs and produces a data frame that maps sample IDs to traces (a column of 1 to n).
plot_traces() takes the output of all the above and produces a ggplot object.
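
To make the first and fifth steps more concrete, here is a rough sketch of what range finding and sample-to-trace mapping could look like in dplyr. These are hypothetical stand-ins written for illustration, not the code in ggtraces.Rmd; the object names and the trace column name are assumptions.

```r
library(dplyr)

# Toy input with the required x, y, and sample columns.
toy_data <- tibble(
  x = rep(c(0, 1, 2), times = 2),
  y = c(0, 1, 0, 0.5, 2, 0.5),
  sample = rep(c("a", "b"), each = 3)
)

# Hypothetical stand-in for the range-finding step:
# a one-row data frame holding the min and max of x and y.
toy_ranges <- toy_data %>%
  summarise(
    xmin = min(x), xmax = max(x),
    ymin = min(y), ymax = max(y)
  )

# Hypothetical stand-in for the mapping step:
# the i-th sample ID is assigned to trace i.
toy_sample_ids <- c("a", "b")
toy_mapping <- tibble(
  sample = toy_sample_ids,
  trace = seq_along(toy_sample_ids)
)
```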

Example output

As an example, let's visualize two sine waves.

The workflow first generates a blank coordinate system, which is a ggplot object.

The coordinate system is defined by the x and y value ranges, as well as the number of traces to graph.
The perspective of the coordinate system is defined by x_offset and y_offset.

Because the blank coordinate system is a ggplot object, we can add ggplot layers to it, such as geoms, scales, themes, and so on.

The trace plot, in its most basic form, is the blank coordinate system + geom_line() to plot the line traces.

This shows two sine waves aligned along a parallelogram. Like the blank coordinate system, it is a ggplot object, so we can add more ggplot layers to it if needed, such as replacing the default color palette. It usually requires some final touches to look nicer.
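
The idea of shifting traces along a parallelogram can be sketched in plain ggplot2. The chunk below assumes that trace i is shifted by (i - 1) times x_offset horizontally and (i - 1) times y_offset vertically; that rule is an assumption made for illustration, and the functions in ggtraces.Rmd may compute the shift differently. All object names are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Two sine waves in tidy format.
x_values <- seq(0, 2 * pi, length.out = 100)
toy_data <- bind_rows(
  tibble(x = x_values, y = sin(x_values), sample = "wave_1"),
  tibble(x = x_values, y = sin(x_values + pi / 2), sample = "wave_2")
)

# Map each sample ID to a trace number.
toy_mapping <- tibble(sample = c("wave_1", "wave_2"), trace = 1:2)

x_offset <- 0.2
y_offset <- 0.4

# Assumed rule: trace i is shifted by (i - 1) offsets in x and in y.
toy_shifted <- toy_data %>%
  inner_join(toy_mapping, by = "sample") %>%
  mutate(
    x_shifted = x + (trace - 1) * x_offset,
    y_shifted = y + (trace - 1) * y_offset
  )

# Draw the shifted traces as lines.
toy_plot <- ggplot(toy_shifted, aes(x = x_shifted, y = y_shifted)) +
  geom_line(aes(color = sample)) +
  theme_classic()

toy_plot
```

Because toy_plot is an ordinary ggplot object, further layers such as scale_color_manual() can be added with +.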

Real datasets

The best way to use this tool is to run ggtraces.Rmd in a separate tab of the same RStudio window, so that it shares your working environment. Doing so will deposit the required functions into the environment. Then you can simply call the functions one by one.

I tried out two real datasets that are very different. The first one is LC-MS data (data from Li et al., 2022).
The second one is small RNA metagene (averaged gene) data (data from Li et al., 2020 and Li et al., 2022).

Running ggtraces_uses.Rmd in the Scripts/ directory will generate these graphs.

LC-MS data

This shows the base peak chromatograms (normalized to the highest peak) of two samples.

Metagene data

This shows the normalized coverage of 24-nt siRNAs (per 1000 24-nt siRNAs) around transcription start sites, averaged across all genes.

Getting started

Clone the repository to your machine.
Run ggtraces.Rmd under Scripts/. You will need to install the rmarkdown package.
Call each function in order.
Make final touches (e.g., adjust axis range, axis label, color palette, and so on)
Done!

Example script

Load data

metagene <- read_csv("../Data/metagene.csv", col_types = cols())

This is already a tidy data frame. If your data table is not in the tidy format, you'll need to re-format it first.
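
If your table is in a wide layout (one column of y values per sample), one common way to re-format it is pivot_longer() from tidyr. The sketch below uses a made-up wide table; wide_data and its column names are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one x column, one column of y values per sample.
wide_data <- tibble(
  x = c(0, 1, 2),
  sample_A = c(0.1, 0.9, 0.2),
  sample_B = c(0.3, 0.5, 0.7)
)

# Gather every column except x into sample/y pairs,
# giving one row per observation.
tidy_data <- wide_data %>%
  pivot_longer(
    cols = -x,
    names_to = "sample",
    values_to = "y"
  )
```

The result has the x, sample, and y columns that the workflow expects.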

Rename columns

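# Rename columns to the x and sample names the workflow expects,
# then scale the coverage values by 1000 to get y.
metagene_2 <- metagene %>%
  dplyr::rename(x = `bin start`,
                sample = sample_type) %>%
  mutate(y = mena_pro_24 * 1000)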

The workflow requires x, y, and sample columns.

Run ggtraces functions one by one

example3_ranges <- find_xy_ranges(metagene_2)
example3_grid_table <- make_grid_table(example3_ranges, x_offset = 200, y_offset = 150, number_traces = 5)
example3_axis_table <- make_axis_table(example3_ranges)

example3_coord <- make_coord(
  grid_table = example3_grid_table, 
  axis_table = example3_axis_table,
  ranges = example3_ranges,
  number_traces = 5,
  x_offset = 200,
  y_offset = 150
)

example3_names <- c("sperm", "egg", "zygote", "seedling")
example3_mapping <- map_sample_to_trace(example3_names)

example3_traces <- plot_traces(
  data = metagene_2,
  coord = example3_coord,
  mapping = example3_mapping,
  x_offset = 200,
  y_offset = 150,
  ranges = example3_ranges,
  x_title = "Position relative to TSS",
  y_title = "Normalized\ncoverage",
  sample_ID_title = "Cell type"
)

You will need to provide x_offset, y_offset, and number_traces. These values differ across experiments.
You will need to provide the names of the traces. They are provided via example3_names <- c("sperm", "egg", "zygote", "seedling").
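
Before building the mapping, it can help to confirm that every name you provide actually occurs in the sample column, since a typo would leave a trace without data. The check below is a small sketch on made-up data; toy_data and toy_names are hypothetical, and the order of the names presumably determines which trace each sample is assigned to.

```r
library(dplyr)

# Toy tidy data with two samples.
toy_data <- tibble(
  x = rep(1:3, times = 2),
  y = c(1, 2, 1, 2, 3, 2),
  sample = rep(c("egg", "sperm"), each = 3)
)

# Sample IDs present in the data, in order of first appearance.
available_ids <- unique(toy_data$sample)

# Names of the traces, in the order they should be mapped.
toy_names <- c("sperm", "egg")

# Any name that does not occur in the data ends up here.
unmatched <- setdiff(toy_names, available_ids)
stopifnot(length(unmatched) == 0)
```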

Final touches

Manually adjust axis breaks, axis range, color palette, and axis title position. Since example3_traces is a ggplot object, we can easily make additional customizations.

# linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0).
example3_traces +
  geom_segment(x = -Inf, xend = -Inf, y = 0, yend = 800, linewidth = 1.1, color = "grey20") +
  geom_segment(x = -3000, xend = 2000, y = -Inf, yend = -Inf, linewidth = 1.1, color = "grey20") +
  scale_color_manual(values = c("dodgerblue2", "tomato1", "violetred4", "seagreen"),
                     limits = example3_mapping$sample) +
  scale_y_continuous(breaks = c(0, 200, 400, 600, 800)) +
  theme(legend.position = "top",
        axis.title.y = element_text(hjust = 0.4))

Done!

Comparison of perspectives

Different x_offset and y_offset values change the appearance of the final product.

High x_offset and low y_offset facilitate comparisons along the y axis. They give the sensation that we are looking at the graph from the side.
Low x_offset and high y_offset facilitate comparisons along the x axis. They give the sensation that we are looking at the graph from the top.

Additional features

Facet plot

A facet plot is a plot type where each line trace gets its own x and y axes.

# brewer.pal() comes from the RColorBrewer package.
plot_facet(LC_MS_data_2, x_title = "Retention time (min)", y_title = "Relative intensity") +
  scale_color_manual(values = RColorBrewer::brewer.pal(8, "Set2")[c(1,4)])

The plot_facet() function requires the tidy data frame as input. x_title and y_title are optional. Defaults are "x" and "y", respectively.
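
For readers who want to see the general idea without the repository's functions, a comparable layout can be sketched with facet_wrap() from ggplot2. This is an illustration on made-up sine waves, not the implementation of plot_facet(); all object names are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Two sine waves in tidy format.
x_values <- seq(0, 2 * pi, length.out = 100)
toy_data <- bind_rows(
  tibble(x = x_values, y = sin(x_values), sample = "wave_1"),
  tibble(x = x_values, y = sin(x_values + pi / 2), sample = "wave_2")
)

# One panel per sample, stacked vertically.
toy_facet <- ggplot(toy_data, aes(x = x, y = y)) +
  geom_line(aes(color = sample)) +
  facet_wrap(~ sample, ncol = 1) +
  labs(x = "x", y = "y") +
  theme_classic() +
  theme(legend.position = "none")

toy_facet
```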

Pherogram

Pherogram is short for electropherogram, where we imagine the traces are moving down a gel. The original y value is now represented as color intensity in the heat map.

plot_pherogram(data = metagene_2, 
               y_title = "Position relative to TSS", 
               legend_title = "Normalized\ncoverage",
               mapping = example3_mapping)

The plot_pherogram() function requires the tidy data frame as input. The y_title argument controls the y axis title (default = "x"), since it was the x value in the original line traces. The legend_title argument controls the title of the color scale (default = "y"), since it was the y value in the original line traces.
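
The same idea can be sketched in plain ggplot2 with geom_tile(): each sample becomes a lane, the original x value runs along the y axis, and the original y value is mapped to fill. This is an illustration on made-up sine waves, not the implementation of plot_pherogram(); the object names and the white-to-black gradient are assumptions.

```r
library(dplyr)
library(ggplot2)

# Two sine waves in tidy format.
x_values <- seq(0, 2 * pi, length.out = 100)
toy_data <- bind_rows(
  tibble(x = x_values, y = sin(x_values), sample = "wave_1"),
  tibble(x = x_values, y = sin(x_values + pi / 2), sample = "wave_2")
)

# Each sample is a lane; the original x runs along the y axis,
# and the original y is shown as color intensity.
toy_pherogram <- ggplot(toy_data, aes(x = sample, y = x)) +
  geom_tile(aes(fill = y)) +
  scale_fill_gradient(low = "white", high = "black") +
  labs(x = NULL, y = "x", fill = "y") +
  theme_classic()

toy_pherogram
```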
