-
Notifications
You must be signed in to change notification settings - Fork 6
mtw/Bio-ViennaNGS
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
# Bio::ViennaNGS 0.19.2 Bio::ViennaNGS is a distribution of Perl modules and utilities for building efficient Next-Generation Sequencing (NGS) analysis pipelines. It covers various aspects of NGS data analysis, including (but not limited to) conversion of sequence annotation, evaluation of mapped data, expression quantification and visualization. Bio::ViennaNGS is shipped with a complementary set of modules, classes and roles: - Bio::ViennaNGS::AnnoC A Moose interface for storage and conversion of sequence annotation data. - Bio::ViennaNGS::Bam Routines for high-level manipulation of BAM files. - Bio::ViennaNGS::Bed A Moose interface for manipulation of genomic interval data in BED format. - Bio::ViennaNGS::BedGraphEntry A Moose interface for storing genomic cvoerage and interval data in bedGraph format. - Bio::ViennaNGS::Expression A Moose interface for computation of normalized gene / transcript expression based on read counts. - Bio::ViennaNGS::ExtFeature A Moose wrapper for extended BED6 elements. - Bio::ViennaNGS::Fasta Routines for accessing genomic sequences implemented through a Moose interface to Bio::DB::Fasta. - Bio::ViennaNGS::Feature A Moose-based BED6 wrapper. - Bio::ViennaNGS::FeatureChain Yet another Moose class for chaining gene annotation features. - Bio::ViennaNGS::FeatureInterval A Moose interface for handling elementary genomic intervals, corresponding to BED3. - Bio::ViennaNGS::FeatureIO A Moose interface for easy input/output operations on common feature annotation file formats. - Bio::ViennaNGS::FeatureLine An abstract Moose class for combining several Bio::ViennaNGS::FeatureChain objects. - Bio::ViennaNGS::MinimalFeature A Moose interface for handling elementary gene annotation, corresponding to BED4. - Bio::ViennaNGS::Peak A Moose interface for identification and characterization of (coverage / expression) peaks in RNA-seq data. - Bio::ViennaNGS::SpliceJunc A collection of routines for alternative splicing analysis. - Bio::ViennaNGS::Subtypes Moose subtypes for internal usage. - Bio::ViennaNGS::Tutorial A comprehensive tutorial of the Bio::ViennaNGS core routines with real-world NGS data. - Bio::ViennaNGS::UCSC Routines for visualization of genomics data with the UCSC genome browser. - Bio::ViennaNGS::Util A collection of wrapper routines for commonly used third-party NGS utilities as well as a set of utility functions. ## UTILITIES In addition, Bio::ViennaNGS comes with a collection of utility programs for accomplishing routine tasks often required in NGS data processing. These utilities serve as reference implementation of the routines implemented in the (sub)modules: assembly_hub_constructor.pl: The UCSC genome browser offers the possibility to visualize any organism (including organisms that are not included in the standard UCSC browser bundle) through hso called 'Assembly Hubs'. This script constructs Assembly Hubs from genomic sequence and annotation data. bam_split.pl: Split (paired-end and single-end) BAM alignment files by strand and compute statistics. Optionally create BED output, as well as normalized bedGraph and bigWig files for coverage visualization in genome browsers (see dependencies on third-patry tools below). bam_to_bigWig.pl: Produce bigWig coverage profiles from (aligned) BAM files, explicitly considering strandedness. The most natural use case of this tool is to create strand-aware coverage profiles in bigWig format for genome browser visualization. bam_uniq.pl: Extract unique and multi mapping reads from BAM alignment files and create a separate BAM file for both uniqe (.uniq.) and multi (.mult.) mappers. bed2bedGraph.pl: Convert BED files to (strand specific) bedGraph files, allowing additional annotation and automatic generation of bedGraph files which can easily be converted to big-type files for easy UCSC visualization. bed2nt2aa.pl: Extract sequence intervals from Fasta files, provided as BED6 records. Optionally translate nucleotide into amino acid sequences. bed62bed12.pl: A simple BED6 -> BED12 converter fasta_multigrep.pl: Extract individual (whole) sequences from a multi Fasta file fasta_subgrep.pl: Extract subsequences from (multi) Fasta files. fasta_regex.pl: Extract sequence motifs from (multi) Fasta files provided as regular expression. gff2bed.pl: Convert RefSeq GFF3 annotation files to BED12 format. Individual BED12 files are created for each feature type (CDS/tRNA/rRNA/etc.). Tested with RefSeq bacterial GFF3 annotation. kmer_analysis.pl: Count k-mers of predefined length in FastQ and Fasta files newUCSCdb.pl: Create a new genome database to a locally installed instance of the UCSC genome browser in order to add a novel organism for visualization. normalize_multicov.pl: Compute normalized expression data in TPM and RPKM from (raw) read counts in bedtools multicov format. TPM reference: Wagner et al, Theory Biosci. 131(4), pp 281-85 (2012). rnaseq_peakfinder.pl: Find and characterize peaks/enriched regions of certain size and coverage in RNA-seq data. sj_visualizer.pl: Convert splice junctions from mapped RNA-seq data in segemehl BED6 splice junction format to BED12 for easy visualization in genome browsers. splice_site_summary.pl: Identify and characterize splice junctions from RNA-seq data by intersecting them with annotated splice junctions. track_hub_constructor.pl: Analogous to assembly_hub_constructor.pl, construct a Track Hub for an organism listed in the UCSC Genome Browser. trim_fastq.pl: Trim sequence and quality string fields in a Fastq file by user defined length. ## TUTORIAL See Bio::ViennaNGS::Tutorial ## INSTALLATION We have compiled a comprehensive `ViennaNGS Installation HOWTO` which covers the setup of third party and Perl dependencies in detail: http://rna-seq.at/blog/2015/03/03/viennangs-installation-howto/ If you have already installed all dependencies, the ViennaNGS distribution can be installed with the following commands: perl Makefile.PL make make test make install However, we highly recommend using *cpanminus* for installing ViennaNGS. If you have cpanm available on your system, simply type cpanm Bio::ViennaNGS This command will grab the lastest ViennaNGS release from CPAN together with all Perl dependencies and install the suite automatically (provided you have samtools <= 0.1.19 installed, see the HOWTO link above). ## SUPPORT AND DOCUMENTATION After installing, you can find documentation for this module with the perldoc command. perldoc Bio::ViennaNGS You can also look for information at: RT, CPAN's request tracker (report bugs here) http://rt.cpan.org/NoAuth/Bugs.html?Dist=Bio-ViennaNGS AnnoCPAN, Annotated CPAN documentation http://annocpan.org/dist/Bio-ViennaNGS CPAN Ratings http://cpanratings.perl.org/d/Bio-ViennaNGS Search CPAN http://search.cpan.org/dist/Bio-ViennaNGS ## DEPENDENCIES Bio::ViennaNGS depends on a set of third-party tools and libraries which are required for specific filtering and file format conversion tasks as well as for building internally used Perl modules: * bedtools2 >= 2.17 (https://github.com/arq5x/bedtools2) * bedGraphToBigWig, fetchChromSizes, faToTwoBit from the UCSC Genome Browser applications (http://hgdownload.cse.ucsc.edu/admin/exe/) * the R Statistics software (http://www.r-project.org/) * samtools <= v0.1.19 for building Bio::DB::Sam. Please note that more recent HTSlib-based versions of samtools will not work with Bio::DB::Sam Please ensure that all third-party utilities are available on your system and accessible to the Perl interpreter. Moreover, Bio::ViennaNGS depends on a ste of Perl packages that are not shipped with the Perl core distribution, most notably Bio::Perl >= 1.00690001 and Bio::DB::Sam >= 1.37. ## SOURCE AVAILABILITY This source is available on Github: https://github.com/mtw/Bio-ViennaNGS ## VIENNANGS PAPER If the Bio::ViennaNGS suite is useful for your work or if you use any component of Bio::ViennaNGS in a custom pipeline, please cite the following publication: "ViennaNGS - A toolbox for building efficient next-generation sequencing analysis pipelines" Michael T. Wolfinger, Joerg Fallmann, Florian Eggenhofer and Fabian Amman F1000Research 4:50 (2015) DOI: http://dx.doi.org/10.12688/f1000research.6157.2 ## NOTES The Bio::ViennaNGS suite is actively developed and tested on different flavours of Linux and Mac OS X. We have taken care of writing platform-independent code that should run out of the box on most UNIX-based systems, however we do not have access to machines running Microsoft Windows. As such Microsoft Windows compatibility is not tested. ## BUGS Please report bugs through the Github issue tracker at https://github.com/mtw/Bio-ViennaNGS/issues ## AUTHORS Michael T. Wolfinger <michael@wolfinger.eu> Joerg Fallmann <fall@tbi.univie.ac.at> Florian Eggenhofer <florian.eggenhofer@tbi.univie.ac.at> Fabian Amman <fabian@tbi.univie.ac.at> ## COPYRIGHT AND LICENCE Copyright (C) 2014-2017 Michael T. Wolfinger <michael@wolfinger.eu> This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
About
A Perl extension and collection of utilities for Next-Generation Sequencing (NGS) data analysis
Resources
Stars
Watchers
Forks
Packages 0
No packages published