(Note, this won't work with amplified data.)
Extra resources:
---
At the command line, create a new directory and extract some data:
cd /mnt mkdir slice cd slice
We're going to work with half the read data set for speed reasons --
gunzip -c ../mapping/SRR1976948.abundtrim.subset.pe.fq.gz | \ head -6000000 > SRR1976948.half.fq
In a Jupyter Notebook (go to 'http://' + machine name + ':8000'), password 'davis', create new Python notebook "conda root", run:
cd /mnt/slice
and then in another cell:
!load-into-counting.py -M 4e9 -k 31 SRR1976948.kh SRR1976948.half.fq
and in another cell:
!abundance-dist.py SRR1976948.kh SRR1976948.half.fq SRR1976948.dist
and in yet another cell:
%matplotlib inline import numpy from pylab import * dist1 = numpy.loadtxt('SRR1976948.dist', skiprows=1, delimiter=',') plot(dist1[:,0], dist1[:,1]) axis(ymax=10000, xmax=1000)
Then:
python2 ~/khmer/sandbox/calc-median-distribution.py SRR1976948.kh \ SRR1976948.half.fq SRR1976948.readdist
And:
python2 ~/khmer/sandbox/slice-reads-by-coverage.py SRR1976948.kh SRR1976948.half.fq slice.fq -m 0 -M 60
(Re)install megahit:
cd git clone https://github.com/voutcn/megahit.git cd megahit make
Go back to the slice directory and extract paired ends:
cd /mnt/slice extract-paired-ends.py slice.fq
Assemble!
~/megahit/megahit --12 slice.fq.pe -o slice
The contigs will be in slice/final.contigs.fa
.