cxli233/ggtraces

ggtraces - A tidyverse and grammar of graphics powered line traces visualizer

v.1.0.0 release: DOI

Author: Chenxin Li, Ph.D., Assistant Research Scientist at Department of Crop & Soil Sciences and Center for Applied Genetic Technologies, University of Georgia.

Contact: Chenxin.Li@uga.edu

The main goal of this repository is to empower R users to produce publication-quality chromatograms in R. Examples and explanations are below.

The Scripts/ directory contains the .Rmd files that generate the graphics shown below. Running them requires R, RStudio, and the rmarkdown package.

Table of contents

  1. Dependencies
  2. Required input
  3. Functions defined by the workflow
  4. Example output
  5. Real datasets
  6. Getting started
  7. Example script
  8. Comparison of perspectives
  9. Additional features

Dependencies

library(tidyverse)

This is a tidyverse-based workflow.

Required input

The workflow requires the input data to be in the tidy format (each row is an observation, and each column is a variable).

It requires the following 3 columns:

  1. The column named x, which will be the x axis
  2. The column named y, which will be the y axis
  3. A sample column that indicates the sample ID of each of the traces.
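As an illustration of the expected shape, here is a minimal sketch that builds a tidy input table for two sine waves. The sample IDs `wave_1` and `wave_2` and the object name `sine_data` are made up for this example and do not come from the repository.

```r
library(tibble)
library(dplyr)

# Two sine waves in tidy (long) format: one row per observation.
# Sample IDs "wave_1" and "wave_2" are hypothetical.
x_values <- seq(0, 2 * pi, length.out = 100)

sine_data <- bind_rows(
  tibble(x = x_values, y = sin(x_values), sample = "wave_1"),
  tibble(x = x_values, y = sin(x_values + pi / 2), sample = "wave_2")
)

# Confirm that the three required columns are present
all(c("x", "y", "sample") %in% colnames(sine_data))
```

Each row holds one (x, y) point, and the sample column says which trace the point belongs to.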

Additional required values:

  1. A vector of sample IDs
  2. x_offset, default = 0.2
  3. y_offset, default = 0.4
  4. The number of traces to plot

Functions defined by the workflow

This workflow defines 6 functions, in this order:

  1. find_xy_ranges() takes the tidy input data frame and finds xmin, xmax, ymin, and ymax.
  2. make_grid_table() takes the ranges produced by find_xy_ranges() and produces a data frame that will be used to make the coordinate system. It also requires x_offset, y_offset, and number_traces.
  3. make_axis_table() takes the ranges produced by find_xy_ranges() and produces a data frame that will be used to make the coordinate system.
  4. make_coord() takes the outputs of find_xy_ranges(), make_grid_table(), and make_axis_table() to make a ggplot object that is a blank coordinate system. It also requires x_offset, y_offset, and number_traces.
  5. map_sample_to_trace() takes a vector of sample IDs and produces a data frame that maps sample IDs to traces (a column of 1 to n).
  6. plot_traces() takes the output of all the above and produces a ggplot object.
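To make steps 1 and 5 more concrete, here is a rough sketch of what such helpers could look like. These are hypothetical stand-ins (note the `_sketch` suffix), written only from the descriptions above; the actual definitions in ggtraces.Rmd may differ in column names and return types.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for find_xy_ranges(): summarise the x and y ranges
find_xy_ranges_sketch <- function(data) {
  data %>%
    summarise(
      xmin = min(x), xmax = max(x),
      ymin = min(y), ymax = max(y)
    )
}

# Hypothetical stand-in for map_sample_to_trace(): number the samples 1 to n
map_sample_to_trace_sketch <- function(sample_ids) {
  tibble(sample = sample_ids, trace = seq_along(sample_ids))
}

# Toy tidy input with two samples (made-up values)
toy <- tibble(
  x = c(0, 1, 2, 0, 1, 2),
  y = c(0, 5, 1, 2, 3, 9),
  sample = rep(c("a", "b"), each = 3)
)

toy_ranges <- find_xy_ranges_sketch(toy)
toy_mapping <- map_sample_to_trace_sketch(c("a", "b"))
```

The order of the sample ID vector decides which sample becomes trace 1, trace 2, and so on.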

Example output

As an example, let's visualize two sine waves.

The workflow first generates a blank coordinate system, which is a ggplot object.

  • The coordinate system is defined by x and y value ranges, as well as the number of traces to graph.
  • The perspective of the coordinate system is defined by x_offset and y_offset.
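One way to picture the effect of the two offsets is to shift each trace by a multiple of x_offset and y_offset before drawing it. The sketch below assumes that trace k is shifted by (k - 1) times each offset, in data units; this is an illustration of the idea and not necessarily how plot_traces() computes positions. The names `trace_map`, `shifted`, and `offset_sketch` are made up.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Two toy sine waves in tidy format (hypothetical sample IDs)
x_values <- seq(0, 2 * pi, length.out = 100)
sine_data <- bind_rows(
  tibble(x = x_values, y = sin(x_values), sample = "wave_1"),
  tibble(x = x_values, y = sin(x_values + pi / 2), sample = "wave_2")
)

# Assumed mapping of samples to trace numbers (1 = front trace)
trace_map <- tibble(sample = c("wave_1", "wave_2"), trace = c(1, 2))

x_offset <- 0.2
y_offset <- 0.4

# Assumption: trace k is moved right and up by (k - 1) offsets
shifted <- sine_data %>%
  inner_join(trace_map, by = "sample") %>%
  mutate(
    x_shifted = x + (trace - 1) * x_offset,
    y_shifted = y + (trace - 1) * y_offset
  )

offset_sketch <- ggplot(shifted, aes(x = x_shifted, y = y_shifted, color = sample)) +
  geom_line() +
  theme_classic()
```

Under this assumption, a larger x_offset slides the back traces further to the right, and a larger y_offset lifts them further up.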

Example blank coord

Again, the blank coordinate system is a ggplot object. We can add ggplot layers to it, such as geoms, scales, themes, and so on.

The trace plot, in its most basic form, is the blank coordinate system + geom_line() to plot the line traces.

Example trace plot

This shows two sine waves aligned along a parallelogram. This is a ggplot object, so we can add more ggplot layers to it if needed, such as replacing the default color palette. It usually requires some final touches to make it look nicer.

Example trace plot, but nicer

Real datasets

The best way to use this tool is to run ggtraces.Rmd in the same environment (the same RStudio window) in a different tab. Doing so will deposit the functions needed into the environment. Then you can simply call the functions one by one.

I tried out two real datasets that are very different. The first one is LC-MS data (data from Li et al., 2022).
The second one is small RNA metagene (averaged gene) data (data from Li et al., 2020 and Li et al., 2022).

Running ggtraces_uses.Rmd in the Scripts/ directory will generate these graphs.

LC-MS data

LC-MS example

This shows the base peak chromatograms (normalized to the highest peak) of two samples.

Metagene data

Metagene example

This shows the normalized coverage of 24-nt siRNAs (per 1000 24-nt siRNAs) around transcription start sites, averaged across all genes.

Getting started

  1. Clone the repository to your machine.
  2. Run ggtraces.Rmd under Scripts/. You will need to install the rmarkdown package.
  3. Call each function in order.
  4. Make final touches (e.g., adjust axis range, axis label, color palette, and so on)
  5. Done!

Example script

Load data

metagene <- read_csv("../Data/metagene.csv", col_types = cols())

This is already a tidy data frame. If your data table is not in the tidy format, you'll need to re-format it first.
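If your table is in a wide layout (one column per sample), tidyr's pivot_longer() is one way to reshape it. The sketch below uses a made-up wide table; the column names `position`, `sperm`, and `egg` and the values are hypothetical and are not taken from metagene.csv.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per position, one column per sample
wide_data <- tibble(
  position = c(-100, 0, 100),
  sperm = c(1.0, 2.0, 1.5),
  egg = c(0.5, 3.0, 2.5)
)

# Reshape to tidy format with the x, y, and sample columns the workflow expects
tidy_data <- wide_data %>%
  pivot_longer(
    cols = c(sperm, egg),
    names_to = "sample",
    values_to = "y"
  ) %>%
  rename(x = position)
```

After reshaping, each row is a single observation, with the former column names stored in the sample column.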

Rename columns

metagene_2 <- metagene %>% 
  dplyr::rename(x = `bin start`,
                sample = sample_type) %>% 
  mutate(y = mena_pro_24 * 1000)

The workflow requires x, y, and sample columns.

Run ggtraces functions one by one

example3_ranges <- find_xy_ranges(metagene_2)
example3_grid_table <- make_grid_table(example3_ranges, x_offset = 200, y_offset = 150, number_traces = 5)
example3_axis_table <- make_axis_table(example3_ranges)

example3_coord <- make_coord(
  grid_table = example3_grid_table, 
  axis_table = example3_axis_table,
  ranges = example3_ranges,
  number_traces = 5,
  x_offset = 200,
  y_offset = 150
)

example3_names <- c("sperm", "egg", "zygote", "seedling")
example3_mapping <- map_sample_to_trace(example3_names)

example3_traces <- plot_traces(
  data = metagene_2,
  coord = example3_coord,
  mapping = example3_mapping,
  x_offset = 200,
  y_offset = 150,
  ranges = example3_ranges,
  x_title = "Position relative to TSS",
  y_title = "Normalized\ncoverage",
  sample_ID_title = "Cell type"
)

  • You will need to provide x_offset, y_offset, and number_traces. These values differ across experiments.
  • You will need to provide the names of the traces. They are provided via example3_names <- c("sperm", "egg", "zygote", "seedling").

Final touches

Manually adjust axis breaks, axis range, color palette, and axis title position. Since example3_traces is a ggplot object, we can easily make additional customizations.

example3_traces +
  # y axis line from 0 to 800, pinned to the left edge of the panel
  geom_segment(x = -Inf, xend = -Inf, y = 0, yend = 800, size = 1.1, color = "grey20") +
  # x axis line from -3000 to 2000, pinned to the bottom edge of the panel
  geom_segment(x = -3000, xend = 2000, y = -Inf, yend = -Inf, size = 1.1, color = "grey20") +
  # one color per sample, in the order given by the mapping table
  scale_color_manual(values = c("dodgerblue2", "tomato1", "violetred4", "seagreen"),
                     limits = example3_mapping$sample) +
  scale_y_continuous(breaks = c(0, 200, 400, 600, 800)) +
  theme(legend.position = "top",
        axis.title.y = element_text(hjust = 0.4))

Metagene example

Done!

Comparison of perspectives

Different x_offset and y_offset values change the appearance of the final product.

LC MS different perspectives

  • High x_offset and low y_offset facilitate comparisons along the y axis. It gives the sensation that we are looking at the graph from the side.
  • Low x_offset and high y_offset facilitate comparisons along the x axis. It gives the sensation that we are looking at the graph from the top.

Additional features

Facet plot

A facet plot is a plot type in which each line trace gets its own x and y axes.

library(RColorBrewer) # provides brewer.pal()

plot_facet(LC_MS_data_2, x_title = "Retention time (min)", y_title = "Relative intensity") +
  scale_color_manual(values = brewer.pal(8, "Set2")[c(1,4)])

LC MS facet plot

The plot_facet() function requires the tidy data frame as input. x_title and y_title are optional. Defaults are "x" and "y", respectively.
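For readers curious about the general idea, a comparable layout can be built directly with ggplot2's facet_wrap(). This is only a sketch on toy data and is not the implementation of plot_facet(); the names `toy_traces` and `facet_sketch` are made up.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Two toy sine waves in tidy format (hypothetical sample IDs)
x_values <- seq(0, 2 * pi, length.out = 100)
toy_traces <- bind_rows(
  tibble(x = x_values, y = sin(x_values), sample = "wave_1"),
  tibble(x = x_values, y = 2 * sin(x_values + pi / 2), sample = "wave_2")
)

# One panel per sample, stacked vertically, each with its own axes
facet_sketch <- ggplot(toy_traces, aes(x = x, y = y, color = sample)) +
  geom_line() +
  facet_wrap(~ sample, ncol = 1, scales = "free") +
  labs(x = "x", y = "y") +
  theme_classic() +
  theme(legend.position = "none")
```

Setting scales = "free" lets each panel choose its own axis ranges, which helps when traces differ greatly in magnitude.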

Pherogram

Pherogram is short for electropherogram, where we imagine the traces are moving down a gel. The original y value is now represented as color intensity in the heat map.

plot_pherogram(data = metagene_2, 
               y_title = "Position relative to TSS", 
               legend_title = "Normalized\ncoverage",
               mapping = example3_mapping)

Metagene pherogram

The plot_pherogram() function requires the tidy data frame as input. The y_title argument controls the y axis title (default = "x"), since it was the x value in the original line traces. The legend_title argument controls the title of the color scale (default = "y"), since it was the y value in the original line traces.
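The same idea can be approximated with a plain ggplot2 heat map: samples go on the horizontal axis, the original x value goes on the vertical axis, and the original y value is mapped to fill color. The sketch below uses toy data and geom_tile(); it is an assumption about the general approach, not the code behind plot_pherogram(), and the names `toy_traces` and `pherogram_sketch` are made up.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Two toy sine waves in tidy format (hypothetical sample IDs)
x_values <- seq(0, 2 * pi, length.out = 100)
toy_traces <- bind_rows(
  tibble(x = x_values, y = sin(x_values), sample = "wave_1"),
  tibble(x = x_values, y = sin(x_values + pi / 2), sample = "wave_2")
)

# One lane per sample; the original x runs along the lane, y becomes color
pherogram_sketch <- ggplot(toy_traces, aes(x = sample, y = x, fill = y)) +
  geom_tile() +
  labs(x = NULL, y = "x", fill = "y") +
  theme_classic()
```

Because the original x value now sits on the vertical axis, the axis title and legend title swap roles relative to the line trace plot.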