Integrative transcript library for Branchiostoma floridae (www.bio-add.org/InTrans/)
Copyright (C) 2017 ZhiLiang Ji (appo@xmu.edu.cn)
This software runs on any Unix-like system with Python (version 2.7.7) installed.
One Python module is required before use: configparser 3.5.0.
In addition, three previously published programs must be correctly installed in advance and
added to your system environment variables. The three programs are:
(1) IDBA (version 1.1.1) https://github.com/loneknightpy/idba
(2) CD-HIT (version 4.5.4) http://www.bioinformatics.org/cd-hit/
(3) CAP3 (version 12/21/07) http://seq.cs.iastate.edu/cap3.html
Other versions of the programs listed above are also allowed. However, the pipeline runs
stably with the recommended versions.
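Before running the pipeline, it can help to confirm that the three external programs are reachable from your environment. The sketch below searches PATH for a list of executable names. The names used here ("idba_ud", "cd-hit-est", "cap3") are assumptions about how the tools are commonly installed and are not taken from InTrans itself, so adjust them to match your installation. The helper names find_on_path and missing_tools are hypothetical.

```python
import os


def find_on_path(name, path=None):
    """Return the full path of executable `name` found on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(names, path=None):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if find_on_path(name, path) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust to your installation.
    tools = ["idba_ud", "cd-hit-est", "cap3"]
    missing = missing_tools(tools)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All external programs were found.")
```

If a program is reported as missing, add its installation directory to PATH and run the check again.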
To install, simply extract the software package.
The extracted package folder contains three files and one directory: "InTrans.py", "run.cfg", "__init__.py" and "test_data"
(1) "InTrans.py" is the executable script of the software
(2) "run.cfg" is the configuration file, which contains a series of important parameters. To run the pipeline correctly on your data, set the appropriate parameter values in "run.cfg". Each parameter is described in "run.cfg"; if anything is unclear, please see the manual of the corresponding software.
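As an illustration of how a configuration file can be inspected before a run, the sketch below parses INI-style text with the configparser module and reports options that were left empty. It assumes that "run.cfg" uses the INI syntax that configparser reads. The section and option names ("input", "read_1", "threads") are hypothetical placeholders, not the actual keys of "run.cfg", and the helper name empty_options is also hypothetical.

```python
import configparser

# Hypothetical example content; the real keys are documented in "run.cfg".
EXAMPLE_CFG = u"""
[input]
read_1 = reads_1.fq
read_2 =
[assembly]
threads = 4
"""


def empty_options(cfg_text):
    """Return (section, option) pairs whose value is empty."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    empty = []
    for section in parser.sections():
        for option, value in parser.items(section):
            if not value.strip():
                empty.append((section, option))
    return empty


print(empty_options(EXAMPLE_CFG))
```

To check a file on disk instead of a string, the same parser can load it with parser.read("run.cfg").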
(1) The default maximum read length of IDBA is 128 bp. If your reads are longer than that, change
the value of 128 to a larger one (e.g. 250) in "xx/idba-xxx/src/sequence/short_sequence.h":
"static const uint32_t kMaxShortSequence = 128;"
->
"static const uint32_t kMaxShortSequence = 250;"
(2) Correspondingly, you should also change the default kmer unit to a larger one (e.g. 8) in "xx/idba-xxx/src/basic/kmer.h":
"static const uint32_t kNumUint64 = 4;"
->
"static const uint32_t kNumUint64 = 8;"
(3) Recompile IDBA after the modification so that the new read length and kmer settings take effect.
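The two header edits above can also be applied with a small script instead of by hand. The following is a minimal sketch that rewrites the value in a "static const uint32_t NAME = VALUE;" declaration using a regular expression. The helper name set_constant is hypothetical, and the file paths are left for you to supply because they depend on where IDBA was unpacked.

```python
import re


def set_constant(source, name, value):
    """Replace the value in 'static const uint32_t <name> = <n>;'.

    Returns the new text; raises ValueError if the declaration is absent.
    """
    pattern = re.compile(
        r"(static\s+const\s+uint32_t\s+" + re.escape(name) + r"\s*=\s*)\d+(\s*;)"
    )
    # A function is used as the replacement so the new number cannot be
    # misread as part of a group reference.
    new_source, count = pattern.subn(
        lambda m: m.group(1) + str(value) + m.group(2), source
    )
    if count == 0:
        raise ValueError("declaration of %s not found" % name)
    return new_source


header = "static const uint32_t kMaxShortSequence = 128;\n"
print(set_constant(header, "kMaxShortSequence", 250))
```

In practice you would read the header file into a string, pass it through set_constant, write the result back, and then recompile IDBA as described in step (3).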
Once the parameter values have been set in the "run.cfg" file, run the pipeline with:
$ python InTrans.py run.cfg
For example, you can make a test run with the data in "test_data":
(1) Run without heterogeneous data; the corresponding configuration file is run_fq.cfg:
$ cd ./test_data/
$ python ../InTrans.py run_fq.cfg
(2) Run with heterogeneous data; the corresponding configuration file is run_fq_heterogen.cfg:
$ cd ./test_data/
$ python ../InTrans.py run_fq_heterogen.cfg
Two folders and one log file are generated after the program finishes:
(1) The "output" folder contains the final transcript file, which is in FASTA format.
(2) The "temp_output" folder contains the temporary files produced during the run, including the output of IDBA, CD-HIT, and CAP3.
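To get a quick overview of the assembled transcripts, a FASTA file can be summarized with a few lines of Python. This sketch counts the records and sums the sequence lengths from any iterable of lines. The function name fasta_stats is hypothetical, and the name of the file inside "output" is not specified here, so pass whichever file your run produced.

```python
def fasta_stats(lines):
    """Return (number of records, total sequence length) for FASTA lines."""
    records = 0
    total_length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records += 1  # a header line starts a new record
        else:
            total_length += len(line)  # sequence lines may be wrapped
    return records, total_length


example = [">t1", "ACGT", "AC", ">t2", "GGG"]
print(fasta_stats(example))
```

For a real file, open it and pass the file handle directly, since iterating over a handle yields its lines one at a time.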