RNA-seq data visualization in the UCSC Genome Browser

The UC Santa Cruz Genome Browser provides a great way to visualize genomic data in context.

Note	ENSEMBL is a similar tool, however we’ll focus on the UCSC Browser in this tutorial.

Note	UCSC has a European mirror here

The UCSC Browser contains many publicly available datasets (ENCODE data, ESTs, RefSeq, GENCODE gene models, etc.) organized in tracks. This is its "core" set of tracks, which can be turned on and off at will (see Track display below).

You can also add your own data to it, via Custom Tracks and Track Hubs.

In this part of the tutorial, we’ll learn how to use the UCSC Browser to visualize the data you’ve just generated with GRAPE (BAMs and bigWigs).

Basics

Tip	A lot of UCSC training resources are available here.

Accessing a locus / genome position

We’ll be looking at the top two differentially expressed genes in our dataset:

ENSEMBL id	Genome position (mm9)	Differential expression
`ENSMUSG00000052187`	`chr7:111,000,259-111,001,754`	E14 > E18
`ENSMUSG00000032936`	`chr9:107,838,251-107,852,022`	E14 < E18

You can input ENSEMBL ids, gene symbols or genome positions into the search box at the top of the page. Hit enter and you’ll be taken to the region of your gene of interest, which will be highlighted in the image.

Track display

Each track can be displayed with different levels of detail. These are, in ascending order:

hide
dense
squish
pack
full

You can access these settings either through individual drop-down track menus below the genome image:

or by right-clicking the corresponding area on the image:

Some track types (bigWigs, BAMs) have much more detailed configuration options:

Visualizing your own data

If not done already, set custom shell environment:
```
source ~ngs00/env/.ngsenv
```

Custom Tracks

The easiest way to load your data into the UCSC Browser is through a Custom Track.

First, we need to make this data accessible from the web, so that UCSC can download it. In your home directory you will find a public_docs/ folder, which is reachable through HTTP at this address: http://public-docs.crg.es/NGS/$USER (replace $USER with your ngsXX username, or type

echo http://public-docs.crg.es/NGS/$USER

in your terminal, and paste the output in your Web browser).

Make Custom Track directory (web-accessible through http://public-docs.crg.es/NGS/$USER/custom_tracks/)
```
mkdir -p $customTrackDir
```

Copy GRAPE output files there (bigWigs + BAMs)

awk '$5~/GenomeAlignment|^PlusRawSignal|^MinusRawSignal/{print $3}' $grapeDb | while read f; do
# copy data files:
rsync -av $f $customTrackDir/
# copy BAM indices as well:
[[ "$f" =~ bam$ ]] && rsync -av $f.bai $customTrackDir/
done

Can you see the files in your Web Browser?

Open the Genome Browser
Make sure you’re using the correct genome assembly (mouse/mm9)
Click on "add custom tracks"

Go back to you terminal and convert local datafile paths to global web URLs:

cd $customTrackDir
for file in `ls . |grep -v .bai`; do
echo "http://public-docs.crg.es/NGS/$USER/custom_tracks/$file"
done

Copy the output

Switch to your Web Browser, paste the URLs into the "Paste URLs or data:" text box and clisk "Submit". Your data will then be fetched by UCSC servers.
Check out our two gene examples:

ENSEMBL id	Genome position (mm9)	Differential expression
`ENSMUSG00000052187`	`chr7:111,000,259-111,001,754`	E14 > E18
`ENSMUSG00000032936`	`chr9:107,838,251-107,852,022`	E14 < E18

Custom tracks are viewable only on the machine from which they were uploaded and are automatically discarded 48 hours after the last time they are accessed, unless they are saved in a Session (in which case UCSC will erase them after 4 months). For a permanent solution, use Track Hubs instead.

Another important limitation is that the track display options need to be configured individually, which is cumbersome if you have multiple datasets.

Track Hubs

Overview

Track Hubs are Custom Tracks on steroids:

	Custom Tracks	Track Hubs
Configure tracks by groups	No	Yes
Where is the data?	Uploaded to UCSC servers (except binary indexed files)	Stays on your server
Accepted file types	All most common (BED, GTF, etc.)	Only binary indexed (bigWig, bigBed, BAM+BAI)
How long will it live?	48h	"Forever"
On exotic genome assemblies?	No	Yes (Assembly hubs)

Although originally developed at UCSC, they are also supported by ENSEMBL.

Track Hubs are very powerful: they allow you to reach the same level of sophistication as some "core" ENCODE tracks such as this one:

They are relatively complex to set up, though.

Introduction to quickTrackHub

Here we will use the quickTrackHub framework to make this task easier.

The idea is to group similar tracks together, based on their associated metadata (represented in their file names). Let’s see what our grouping options are:

We can organize our tracks the following way:
- One superTrack per file type :
  - BAM: ReadAligns
  - bigWig: ReadSignal
- Split each superTrack into composite dimensions:
  - (tissue , lifeStage) (matrix’s X dimension)
  - replicate (matrix’s Y dimension)
  - strand (for bigWigs only)

quickTrackHub will:

Read a Track Hub Definition File (JSON) that contain:

Basic track settings (genome assembly, URL, name, visibility, etc.)
Track grouping instructions

Filename parsing instructions (i.e. how to extract metadata from filenames)

trackHubDefinition.json example

{
	"longLabel" : "ENCODE GRAPE sample data track hub, user ngs00",
	"track" : "crgGrapeSample-ngs00",
	"trackHubAssociatedEmail" : "your.email@yourinstitution.org",
	"webPublicDir" : "http://public-docs.crg.es/NGS/ngs00/track_hub",
	"superTracks" : [
		{
			"track" : "ENCODE_GRAPE_sample",
			"longLabel" : "ENCODE GRAPE sample superTrack",
			"visibility": "dense"
		},
		{
			"track" : "ReadAligns",
			"parent" : "ENCODE_GRAPE_sample",
			"longLabel" : "Read alignments (BAMs)",
			"visibility" : "dense",
			"type" : "bam",
			"fileNameMatch" : {
				"fileExtension" : "bam"
			},
			"compositeDimensions" : {
				"x" : [
					"lifeStage",
					"tissue"
				],
				"y" : [
					"replicate"
				]
			}
		},
		{
			"track" : "ReadSignal",
			"parent" : "ENCODE_GRAPE_sample",
			"longLabel" : "Read signal (BigWigs)",
			"visibility" : "dense",
			"type" : "bigWig",
			"autoScale" : "on",
			"alwaysZero" : "on",
			"maxHeightPixels" : "128:28:11",
			"fileNameMatch" : {
				"fileExtension" : "bw"
			},
			"compositeDimensions" : {
				"x" : [
					"lifeStage",
					"tissue"
				],
				"y" : [
					"replicate"
				],
				"a" : [
					"strand"
				]
			}
		}
	],
	"dataFilesList" : "/users/ngs00/public_docs/track_hub/dataFiles.list",
	"dataFileNameParsingInstructions" :	{
		"fieldSeparator" : "_",
		"fields" : {
			"genome" : 0,
			"tissue" : 1,
			"lifeStage" : 2,
			"replicate" : 3,
			"strand" : 5,
			"fileExtension" : -1
		}
	}
}

Output the corresponding Track Hub file and directory structure that will be parsed by UCSC.

quickTrackHub in practice

First, create a new public subdirectory for the Track Hub
```
mkdir -p $trackHubDir
```

Copy the Custom Track data files there and rename them.

Note	GRAPE’s native output filenames are not (yet) `quickTrackHub`-compliant, this is why we need this renaming extra step.

for f in `find $customTrackDir/ -type f`; do
# perform some string substitution magic to rename the files
outFile=$(basename $f)
outFile=${outFile/mouse/mm9}
outFile=${outFile//.Unique./_Unique_}
# copy/rename data files:
rsync -av $f $trackHubDir/$outFile
# copy/rename BAM indices as well:
[[ "$f" =~ bam$ ]] && rsync -av $f.bai $trackHubDir/$outFile.bai
done

Download quickTrackHub from its github repository to your home directory:
```
cd $HOME
git clone https://github.com/julienlag/quickTrackHub.git
```

Make the script executable:

chmod u+x $HOME/quickTrackHub/quickTrackHub.pl

Download the hubCheck utility from UCSC (somewhat useful for Track Hub debugging purposes), and place it into $HOME/bin/
```
mkdir -p $HOME/bin/
```
```
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/hubCheck -O $HOME/bin/hubCheck
```
Make it executable
```
chmod u+x $HOME/bin/hubCheck
```
cd to public Track Hub directory
```
cd $trackHubDir
```
Copy the template Track Hub Definition JSON file to your public Track Hub directory
```
cp $HOME/quickTrackHub/trackHubDefinition.json .
```
Open and edit the JSON file:
```
gedit trackHubDefinition.json &
```
- Find and replace all instances of ngsXX in the file with your username.
- Replace your.email@yourinstitution.org with your email address (Optional).
- Save

Generate the list of files (BAMS + bigWigs) to include in the Track Hub:

find . -type f | grep "\.bam\|\.bw" | grep -v "\.bai" > dataFiles.list

Make the Track Hub:

quickTrackHub.pl trackHubDefinition.json

Load the Track Hub in the UCSC Browser

Your hub’s URL is output by the following command:
```
echo http://public-docs.crg.es/NGS/$USER/track_hub/hub.txt
```
There are two ways to load your Track Hub:
- Load manually:
  - Click on the "track hub" button below the genome image in the UCSC Browser
  - Select the "My Hubs" tab
  - In the "URL" box, paste the URL of your hub (http://public-docs.crg.es/NGS/$USER/track_hub/hub.txt)
  - Click on "Add Hub"
  - You should be redirected to the mm9 Browser Gateway
- Load directly through URL:
  
  Get the direct link via:
  echo "http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm9&hubUrl=http://public-docs.crg.es/NGS/$USER/track_hub/hub.txt"
  And copy/paste the output in your browser.
  
  Tip
  Use this direct link to share your Track Hub with collaborators.
  
  The settings of your Track Hub are accessible here (below the genome image):
Look at our two favorite differentially expressed genes:

ENSEMBL id Genome position (mm9) Differential expression

ENSMUSG00000052187

chr7:111,000,259-111,001,754

E14 > E18

ENSMUSG00000032936

chr9:107,838,251-107,852,022

E14 < E18
Tune the track display parameters so as to visualize better the differential expression.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data-visualization.adoc

data-visualization.adoc

RNA-seq data visualization in the UCSC Genome Browser

Basics

Accessing a locus / genome position

Track display

Visualizing your own data

Custom Tracks

Track Hubs

Overview

Introduction to quickTrackHub

quickTrackHub in practice

Files

data-visualization.adoc

Latest commit

History

data-visualization.adoc

File metadata and controls

RNA-seq data visualization in the UCSC Genome Browser

Basics

Accessing a locus / genome position

Track display

Visualizing your own data

Custom Tracks

Track Hubs

Overview

Introduction to quickTrackHub

quickTrackHub in practice