CiaranOMara/BedgraphFiles.jl

BedgraphFiles.jl

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

This project follows semantic versioning (semver) and uses the git-flow branching model.

Overview

This package provides load and save support for bedGraph files under the FileIO package. It also implements the IterableTables interface, which makes it easy to convert between tabular data structures.

Installation

You can install BedgraphFiles from the Julia REPL. Press ] to enter pkg mode, then enter the following:

add BedgraphFiles

To try new features before they are released, check out the develop branch.

Usage

Loading bedGraph files

To load a bedGraph file into a Vector{Bedgraph.Record}, use the following Julia code:

using FileIO, BedgraphFiles, Bedgraph

# Either of the following forms works.
records = Vector{Bedgraph.Record}(load("data.bedgraph"))
records = collect(Bedgraph.Record, load("data.bedgraph"))
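
The loading examples assume that a bedGraph file already exists. As a minimal sketch, the following snippet writes a small one using only Base Julia, so there is something to load. The track line, coordinates, and values are made-up sample data, and the file is written to a temporary directory rather than to data.bedgraph in the working directory.

```julia
# Write a small bedGraph file using only Base Julia.
# The track line and the values below are made-up sample data.
path = joinpath(mktempdir(), "data.bedgraph")

open(path, "w") do io
    println(io, "track type=bedGraph name=\"example\"")
    println(io, "chr19\t49302000\t49302300\t-1.0")
    println(io, "chr19\t49302300\t49302600\t-0.75")
    println(io, "chr19\t49302600\t49302900\t-0.50")
end

# Read the data lines back to confirm the layout: chrom, start, end, value.
data_lines = filter(line -> !startswith(line, "track"), readlines(path))
fields = split.(data_lines, '\t')
```

The resulting path can then be passed to load in place of "data.bedgraph".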

Saving bedGraph files

Note: saving on top of an existing file will overwrite metadata/header information with a minimal working header.

The following example saves a Vector{Bedgraph.Record} to a bedGraph file:

using FileIO, BedgraphFiles, Bedgraph

records = [Bedgraph.Record("chr", i, i + 99, rand()) for i in 1:100:1000]

save("output.bedgraph", records)
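
Because saving over an existing file replaces its header, it may be worth copying the file aside first. The sketch below shows one way to do that with Base Julia. Note that backup_before_save is a hypothetical helper written for this illustration, not part of BedgraphFiles, and the ".bak" suffix is an arbitrary choice.

```julia
# Copy an existing file aside before it is overwritten.
# `backup_before_save` is a hypothetical helper, not part of BedgraphFiles.
function backup_before_save(path::AbstractString)
    backup = path * ".bak"
    # Only copy when there is something to preserve.
    isfile(path) && cp(path, backup; force=true)
    return backup
end

# Demonstration with a throwaway file that carries a custom header.
dir = mktempdir()
output = joinpath(dir, "output.bedgraph")
write(output, "track type=bedGraph name=\"original\"\nchr1\t0\t100\t0.5\n")

backup = backup_before_save(output)
# save(output, records) could now run; the original header is kept in `backup`.
```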

IterableTables

Calling load returns a struct that adheres to the IterableTables interface. It can be passed to any function that also implements the interface, i.e. all the sinks in IterableTables.jl.

The following code shows an example of loading a bedGraph file into a DataFrame:

using FileIO, BedgraphFiles, DataFrames

df = DataFrame(load("data.bedgraph"))

Here are some more examples of materialising a bedGraph file into other data structures:

using FileIO, BedgraphFiles, DataTables, IndexedTables, Gadfly

# Load into a DataTable
dt = DataTable(load("data.bedgraph"))

# Load into an IndexedTable
it = table(load("data.bedgraph"))

# Plot directly with Gadfly
plot(load("data.bedgraph"), xmin=:leftposition, xmax=:rightposition, y=:value, Geom.bar)

The following code saves any compatible source to a bedGraph file:

using FileIO, BedgraphFiles

# `data` is any IterableTables-compatible source, e.g. a DataFrame.
it = getiterator(data)

save("output.bedgraph", it)

Using the pipe syntax

Both load and save also support the pipe syntax. For example, to load a bedGraph file into a DataFrame, one can use the following code:

using FileIO, BedgraphFiles, DataFrames

df = load("data.bedgraph") |> DataFrame

To save an iterable table, one can use the following form:

using FileIO, BedgraphFiles, DataFrames

df = load("data.bedgraph") |> DataFrame  # Or acquire a DataFrame some other way.

df |> save("output.bedgraph")

The save method returns either the data provided or a Vector{Bedgraph.Record}. This is useful for periodically saving your work during a sequence of operations.

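# Any sequence of operations can precede the save; loading is used here as a stand-in.
records = Vector{Bedgraph.Record}(load("data.bedgraph")) |> save("output.bedgraph")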

The pipe syntax is especially useful when combining it with Query.jl queries. For example, one can easily load a bedGraph file, pipe its data into a query, and then store the query result by piping it to the save function.

using FileIO, BedgraphFiles, Query
load("data.bedgraph") |> @filter(_.chrom == "chr19") |> save("data-chr19.bedgraph")

Acknowledgements

This package is largely -- if not completely -- inspired by the work of David Anthoff. Other influences are from the BioJulia community.