Apparently this never really rendered correctly
dpryan79 authored May 1, 2017
1 parent a4c7f18 commit 62bf61f
Showing 1 changed file with 32 additions and 31 deletions.
63 changes: 32 additions & 31 deletions README.md
@@ -1,11 +1,11 @@
-#Bison: bisulfite alignment on nodes of a cluster.
+# Bison: bisulfite alignment on nodes of a cluster.

**N.B.: There is now a tutorial available [here](http://sourceforge.net/projects/dna-bison/files/bison_tutorial.tar.gz/download). This tutorial largely replaces this README file and users are encouraged to read it.**

If you use Bison in your work, please cite the following:
[Ryan D.P. and Ehninger D. **Bison: bisulfite alignment on nodes of a cluster.** *BMC Bioinformatics* 2014, Oct 18;**15**(1):337](http://www.biomedcentral.com/1471-2105/15/337)

-##Usage
+## Usage

One can index all fasta files (files with extension .fa or .fasta) in a
directory as follows:
@@ -66,11 +66,12 @@ judgement).

See the "Auxiliary files" section, below, for additional files.

-##Auxiliary files
+## Auxiliary files

The following programs and scripts will be available if you type "make auxiliary":

-###bedGraph2BSseq.py
+### bedGraph2BSseq.py

This Python script accepts a filename prefix and the names of at least 2
bedGraph files and outputs 3 files for input into BSseq. A single chromosome can
be processed at a time, if desired, by using the -chr option. The output files
@@ -94,18 +95,18 @@ BS1 <- BSseq(M=M, Cov=Cov, gr=gr, pData=groups, sampleNames=colnames(M)) #You'll
```


-###`bedGraph2methylKit`
+### `bedGraph2methylKit`
As above, but each bedGraph file is converted to a .methylKit file. The
bedGraphs should be of CpGs and not have had the strands merged (i.e., don't run
the merge_CpGs command below).

-###`bedGraph2MOABS`
+### `bedGraph2MOABS`
Like `bedGraph2methylKit`, but each bedGraph file is converted to a .moabs file.
The bedGraph files should ideally contain single-C metrics rather than having
been merged to form CpG metrics, though both are supported. The resulting .moabs
files can then be used by `mcomp` in the MOABS package.

-###`bedGraph2MethylSeekR`
+### `bedGraph2MethylSeekR`
As above, but each bedGraph file is converted into a .MethylSeekR file. The
bedGraphs MUST be merged before-hand with bison_merge_CpGs to create per-CpG
metrics, as this is what MethylSeekR is expecting. Input is performed with the
@@ -121,17 +122,17 @@ names(chromosome_lengths) <- fai$V1
d <- readMethylome("file.MethylSeekR", chromosome_lengths)
```

-###`make_reduced_genome`
+### `make_reduced_genome`
Create a reduced representation genome appropriate for reads of a given size
($size, default is 36bp). MspI and TaqI libraries are supported. Nucleotides
greater than $size+10% are converted to N.

-###`merge_bedGraphs.py`
+### `merge_bedGraphs.py`
This will merge bedGraphs from technical replicates of a single sample into a
single bedGraph file, summing the methylation metrics as it goes. The output,
like the input, is coordinate sorted.
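
The merging described for `merge_bedGraphs.py` can be sketched roughly as follows. This is not the script's actual code. The function name `merge_replicates` is hypothetical, and the column layout (chromosome, start, end, a methylation value, then methylated and unmethylated counts in columns 5 and 6) is an assumption inferred from the awk example for BEAT elsewhere in this README. The sketch also sorts lexicographically by chromosome name, which may differ from the ordering the real script preserves.

```python
from collections import defaultdict


def merge_replicates(replicates):
    """Sum per-position counts across technical replicates.

    `replicates` is a list of iterables of bedGraph lines. Assumed columns:
    chrom, start, end, methylation value, methylated count, unmethylated count.
    Returns a sorted list of (chrom, start, end, meth, unmeth) tuples.
    """
    totals = defaultdict(lambda: [0, 0])
    for lines in replicates:
        for line in lines:
            # Skip blank lines and the "track" header line.
            if not line.strip() or line.startswith("track"):
                continue
            chrom, start, end, _value, meth, unmeth = line.split()[:6]
            key = (chrom, int(start), int(end))
            totals[key][0] += int(meth)
            totals[key][1] += int(unmeth)
    # Assumption: a simple lexicographic sort stands in for coordinate sorting.
    return [
        (chrom, start, end, meth, unmeth)
        for (chrom, start, end), (meth, unmeth) in sorted(totals.items())
    ]
```

Summing the raw counts, rather than averaging the per-replicate methylation values, keeps the merged result weighted by coverage.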

-###`bison_merge_CpGs`
+### `bison_merge_CpGs`
Methylation is usually symmetric at CpG sites. While the output bedGraph files
have a single-C resolution, this will convert that to single-CpG resolution by
summing Cs in the same CpG from opposite strands. This saves space and will
@@ -143,7 +144,7 @@ packages either do not require a helper script or can use one of the
aforementioned scripts. Import instructions for such packages are mentioned
below.

-###BiSeq
+### BiSeq
BiSeq requires input in the same format as BSseq. Consequently, just use the
bedGraph2BSseq.py helper script. The following example commands should then
suffice to load everything into R:
@@ -159,7 +160,7 @@ groups <- DataFrame(row.names=colnames(M),
d <- BSraw(exptData=exptData, rowData=gr, colData=groups, totalReads=Cov, methReads=M)
```

-###BEAT
+### BEAT
The BEAT Bioconductor package conveniently expects per-sample position and
methylation information in a format already present in bedGraph files. However,
this information is in a slightly different format than bedGraph, so the
@@ -169,7 +170,7 @@ sample_name.positions.csv.
awk '{if(NR>1){printf("%s,%i,%i,%i\n",$1,$2+1,$5,$6)}else{printf("chr,pos,meth,unmeth\n")}}' sample.bedGraph > sample.positions.csv
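
For readers more comfortable with Python than awk, the same conversion can be sketched as below. The function name `bedgraph_to_beat_csv` is hypothetical and is not part of Bison. As in the awk command, the first input line (presumably the track line) is replaced by the CSV header, the start coordinate is shifted by one, and columns 5 and 6 are written as the `meth` and `unmeth` fields. The sketch assumes every remaining line has at least six whitespace-separated fields.

```python
def bedgraph_to_beat_csv(lines):
    """Convert bedGraph lines to BEAT-style CSV rows, mirroring the awk command."""
    rows = []
    for number, line in enumerate(lines, start=1):
        if number == 1:
            # awk's NR == 1 branch: emit the header in place of the first line.
            rows.append("chr,pos,meth,unmeth")
            continue
        fields = line.split()
        # $1, $2+1, $5, $6 in awk correspond to indexes 0, 1, 4, 5 here.
        rows.append(
            f"{fields[0]},{int(fields[1]) + 1},{int(fields[4])},{int(fields[5])}"
        )
    return rows
```

Writing the result to disk is then a matter of joining the rows with newlines, for example `open("sample.positions.csv", "w").write("\n".join(rows) + "\n")`.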


-##Advanced bison_herd usage
+## Advanced bison_herd usage

`bison_herd` has the ability to use a semi-arbitrary number of nodes. In practice,
if bison is given N nodes, it will effectively use `2*((N-1)/2)+1` or
@@ -222,7 +223,7 @@ Even when --reorder is used, if there is >1 second between these, then you may
benefit from increasing the number of compression threads. For those curious,
this option is identical to that used in samtools.

-##Throttling
+## Throttling

`bison_herd` generally uses blocking, but not synchronous, sends. What this means
in practice is that many reads will be queued by the master node for sending to
@@ -245,7 +246,7 @@ Throttling is not always required, particularly as an increasing number of nodes
are used. Throttling can be disabled altogether by compiling with -DNOTHROTTLE,
which will remove all related components.

-##Debug mode
+## Debug mode

For debugging, a special debug mode is available for both bison and `bison_herd`
by compiling with -DDEBUG. Instead of needing multiple nodes, both
@@ -266,24 +267,24 @@ non-directional reads.
In general, this mode should not be used unless you are running into extremely
odd bugs.

-##Compatibility with Bismark
+## Compatibility with Bismark

Bison is generally similar to Bismark; however, the indexes are incompatible
because Bismark renames contigs. The two will also not produce identical
output, due to algorithmic differences. Running `bison_methylation_extractor`
on the output of Bismark will also produce different results, again due to
algorithmic differences. In addition, Bison always outputs BAM files directly.

-##Other details
+## Other details

Bison needn't be run on multiple computers. You can also use a single
computer for all compute nodes (e.g., `mpiexec -n 5 bison ...`). The same holds
true for `bison_herd`. Both bison and `bison_herd` seem to be faster than Bismark,
even when limited to the same resources.

-##Changes
+## Changes

-###0.4.0
+### 0.4.0

* Allow lower case reads in fastq files (previously, this would result in
corrupt BAM files).
@@ -311,15 +312,15 @@ even when limited to the same resources.
* Fixed a bug in bison_CpG_coverage, where previously only the first
chromosome was used.

-###0.3.3
+### 0.3.3

* Allow mixed and discordant alignments.

-###0.3.2b
+### 0.3.2b

* Fix the Makefile to use the static htslib library.

-###0.3.2
+### 0.3.2
* Added bedGraph2MOABS to convert bedGraph files for use by MOABS. See usage
above.

@@ -335,7 +336,7 @@ even when limited to the same resources.
* The default minimum MAPQ and Phred scores used by `bison_mbias` have been
updated to match `bison_methylation_extractor`.

-###0.3.1
+### 0.3.1
* The various bedGraph files didn't previously have a "track" line. The UCSC
Genome Browser requires this, so bedGraph files produced will now contain
it. It should be noted that this is the very minimal line required. Bison
@@ -355,7 +356,7 @@ even when limited to the same resources.
MAPQ, this one will do that for the read/pair with the highest summed phred
score (a la picard).

-###0.3.0
+### 0.3.0
* Note: The indices produced by previous versions are not guaranteed to be
compatible unless you used a multi-fasta file. There was a serious
implementation problem with how `bison_index` worked when given multiple
@@ -387,7 +388,7 @@ even when limited to the same resources.

* A number of small bug fixes, such as when "genome_dir" doesn't end in a /.

-###0.2.4
+### 0.2.4
* Fixed an off-by-one error in bison_mbias. Also, at some point 1-methylation
percentage started getting calculated. That's been fixed.

@@ -399,7 +400,7 @@ even when limited to the same resources.
single-C bedGraph files before (if they were merged, then they were being
handled correctly).

-###0.2.3
+### 0.2.3
* Fix how hard and soft-clipped bases are dealt with (previously, soft-
clipped bases resulted in an error and hard-clipped bases in incorrect
position assignments!).
@@ -429,15 +430,15 @@ even when limited to the same resources.
(effectively the more verbose version of the MD tag) contains soft-clipped
sequences. I could probably have these removed if someone would like.

-###0.2.2
+### 0.2.2
* Properly fixed some wording on the textual output (i.e., removed the word
"unique").

* Lowered the default MAPQ and Phred thresholds used by the methylation
extractor to 10 each. That the MAPQ threshold was originally
20 was an error on my part.

-###0.2.1
+### 0.2.1
* Added support for file globbing in bison_herd. You may now input multiple
files using a combination of wild-cards (*, ?, etc.) and commas. Remember
to put these in quotes (e.g., "foo/*1.fq.gz","bar/*1.fq.gz") so the shell
@@ -484,7 +485,7 @@ even when limited to the same resources.

* Fixed a bug in bison_herd that allowed early termination without warning.

-###0.2.0
+### 0.2.0
* Added a note to the methylation summary statistics output at the end of a
run that the numbers will include double counting of any site covered by
both mates in a pair. These metrics are only meant for general information
@@ -512,7 +513,7 @@ even when limited to the same resources.
actually supports the level of thread support requested (previously, this
was just assumed).

-###0.1.1
+### 0.1.1
* Fixed a number of minor bugs.

* Added support for uncompressed fastq files, as well as bzipped files
@@ -545,5 +546,5 @@ even when limited to the same resources.
an RRBS genome and other possibly useful functions.


-###0.1.0
+### 0.1.0
Initial release
