From e24d6a6dd7708c018d8b2ef5aa9fec1fce2c3345 Mon Sep 17 00:00:00 2001
From: ndombrowski
Date: Wed, 7 Feb 2024 12:02:41 +0100
Subject: [PATCH] add docu for featurecounts

---
 .quarto/idx/index.qmd.json                |    2 +-
 .quarto/xref/7fa5e22e                     |    2 +-
 .quarto/xref/INDEX                        |    3 +
 _quarto.yml                               |    1 +
 docs/index.html                           |    7 +
 docs/search.json                          |   30 +-
 docs/sitemap.xml                          |    8 +-
 docs/source/core_tools/featurecounts.html | 1160 +++++++++++++++++++++
 docs/source/core_tools/samtools.html      |    4 +-
 index.qmd                                 |    1 +
 source/core_tools/featurecounts.qmd       |   57 +
 source/core_tools/references.bib          |   15 +
 source/core_tools/samtools.qmd            |    4 +-
 13 files changed, 1283 insertions(+), 11 deletions(-)
 create mode 100644 docs/source/core_tools/featurecounts.html
 create mode 100644 source/core_tools/featurecounts.qmd

diff --git a/.quarto/idx/index.qmd.json b/.quarto/idx/index.qmd.json
index c2e7669..19e6717 100644
--- a/.quarto/idx/index.qmd.json
+++ b/.quarto/idx/index.qmd.json
@@ -1 +1 @@
-{"title":"Useful tutorials","markdown":{"yaml":{"toc-depth":2},"headingText":"Useful tutorials","containsRefs":false,"markdown":"\n\n![](img/banner-min.png)\n\n
\n\nOn this website you can find documentation about software commonly used in bioinformatic data analyses as well as tutorials about various bioinformatic subjects. On this webpage you can find software organized by topic and for each topic you find a list of commonly used software tools.\n\nIf you are working at the University of Amsterdam (UvA) Institute for Biodiversity and Ecosystem Dynamics (IBED) and want to know more about what computational resources are available, please also visit the [computational support teams website](https://ibed.uva.nl/facilities/computational-facilities/ibed-computational-support-team/ibed-computational-support-team.html).\n\nPlease, be aware that this page is a work in progress and will be slowly updated over time. If you want to add additional information or feel that something is missing feel free to send an email to [n.dombrowski\\@uva.nl](mailto:n.dombrowski@uva.nl){.email}.\n\n\n### Getting started with bash\n\n- [A tutorial on using bash and an HPC](https://ndombrowski.github.io/cli_workshop/)\n- [Version control with git](https://github.com/fkariminejadasl/ml-notebooks/blob/main/tutorial/git.md)\n- [A tutorial on using AWK](https://ndombrowski.github.io/AWK_tutorial/), a command line tool for filtering tables, extracting patterns, etc... 
If you want to follow this tutorial then you can download the required input files from [here](https://github.com/ndombrowski/AWK_tutorial/tree/main/1_Inputfiles)\n\n### Using R\n\n- [An R cookbook](https://ndombrowski.github.io/R_cookbook/) including some [example files](https://github.com/ndombrowski/R_cookbook/tree/main/data) if you want to code along\n- [Tutorial on data manipulation with dplyr](https://ndombrowski.github.io/Tidyverse_tutorial/)\n- [Tutorial on data visualization with ggplot2](https://ndombrowski.github.io/Ggplot_tutorial/)\n\n### Bioinformatic workflows\n\n- [From sequence file to OTU table with Qiime](source/Qiime/3_evelyn_tutorial_notes.qmd)\n- [Analysing an OTU table with R](source/Qiime/OTU_table_analysis.qmd)\n- [Assembling a metagenome](https://ndombrowski.github.io/Assembly_tutorial/)\n- [Metagenomic binning](https://ndombrowski.github.io/Binning_tutorial//)\n- [Annotating microbial genomes](https://github.com/ndombrowski/Annotation_workflow)\n- [How to do generate a species tree](https://ndombrowski.github.io/Phylogeny_tutorial/)\n- [Accessing data from NCBI](source/core_tools/ncbi.qmd)\n\n## Bioinformatic tools A-Z\n\n- [Bowtie 2](source/core_tools/bowtie.qmd): A tool for aligning sequencing reads to genomes and other reference sequences\n- [Chopper](source/nanopore/chopper.qmd): A tool for quality filtering of long read data\n- [FAMA](source/metagenomics/fama_readme.qmd): A fast pipeline for functional and taxonomic analysis of metagenomic sequences\n- [FastP](source/metatranscriptomics/fastp.qmd): A tool for fast all-in-one preprocessing of FastQ files\n- [FastQC](source/metagenomics/fastqc_readme.qmd): A quality control tool for read sequencing data\n- [Interproscan](source/metagenomics/interproscan_readme.qmd): A tool to scan protein and nucleic sequences against InterPro signatures\n- [ITSx](source/ITSx/itsx_readme.qmd): A tool to extract ITS1 and ITS2 subregions from ITS sequences\n- [Kraken2](source/classification/kraken2.qmd): 
A taxonomic sequence classifier using kmers\n- [METABOLIC](source/metagenomics/metabolic.qmd): A tool to predict functional trait profiles in genome datasets\n- [Minimap2](source/classification/minimap2.qmd): A program to align DNA or mRNA sequences against a reference database\n- [NanoClass2](source/nanopore/nanoclass.qmd): A taxonomic meta-classifier for long-read 16S/18S rRNA gene sequencing data\n- [NanoITS](source/nanopore/nanoITS.qmd): A taxonomic meta-classifier for long-read ITS operon sequencing data\n- [NanoPlot](source/nanopore/nanoplot_readme.qmd): Plotting tool for long read sequencing data\n- [NanoQC](source/nanopore/nanoqc_readme.qmd): A quality control tool for long read sequencing data\n- [Porechop](source/nanopore/porechop_readme.qmd): A tool for finding and removing adapters from Nanopore reads\n- [Prokka](source/core_tools/prokka.qmd): A tool to annotate bacterial, archaeal and viral genomes\n- [Samtools](source/core_tools/samtools.qmd): A tool to manipulating alignments in SAM/BAM format\n- [SeqKit](source/core_tools/seqkit.qmd): A tool for FASTA/Q file manipulation\n- [SortMerNa](source/metatranscriptomics/sortmerna.qmd): A tool to filter ribosomal RNAs in metatranscriptomic data\n- [Trinity](source/metatranscriptomics/trinity.qmd): A tool to assemble transcript sequences from Illumina RNA-Seq data","srcMarkdownNoYaml":"\n\n![](img/banner-min.png)\n\n
\n\nOn this website you can find documentation about software commonly used in bioinformatic data analyses as well as tutorials about various bioinformatic subjects. On this webpage you can find software organized by topic and for each topic you find a list of commonly used software tools.\n\nIf you are working at the University of Amsterdam (UvA) Institute for Biodiversity and Ecosystem Dynamics (IBED) and want to know more about what computational resources are available, please also visit the [computational support teams website](https://ibed.uva.nl/facilities/computational-facilities/ibed-computational-support-team/ibed-computational-support-team.html).\n\nPlease, be aware that this page is a work in progress and will be slowly updated over time. If you want to add additional information or feel that something is missing feel free to send an email to [n.dombrowski\\@uva.nl](mailto:n.dombrowski@uva.nl){.email}.\n\n## Useful tutorials\n\n### Getting started with bash\n\n- [A tutorial on using bash and an HPC](https://ndombrowski.github.io/cli_workshop/)\n- [Version control with git](https://github.com/fkariminejadasl/ml-notebooks/blob/main/tutorial/git.md)\n- [A tutorial on using AWK](https://ndombrowski.github.io/AWK_tutorial/), a command line tool for filtering tables, extracting patterns, etc... 
If you want to follow this tutorial then you can download the required input files from [here](https://github.com/ndombrowski/AWK_tutorial/tree/main/1_Inputfiles)\n\n### Using R\n\n- [An R cookbook](https://ndombrowski.github.io/R_cookbook/) including some [example files](https://github.com/ndombrowski/R_cookbook/tree/main/data) if you want to code along\n- [Tutorial on data manipulation with dplyr](https://ndombrowski.github.io/Tidyverse_tutorial/)\n- [Tutorial on data visualization with ggplot2](https://ndombrowski.github.io/Ggplot_tutorial/)\n\n### Bioinformatic workflows\n\n- [From sequence file to OTU table with Qiime](source/Qiime/3_evelyn_tutorial_notes.qmd)\n- [Analysing an OTU table with R](source/Qiime/OTU_table_analysis.qmd)\n- [Assembling a metagenome](https://ndombrowski.github.io/Assembly_tutorial/)\n- [Metagenomic binning](https://ndombrowski.github.io/Binning_tutorial//)\n- [Annotating microbial genomes](https://github.com/ndombrowski/Annotation_workflow)\n- [How to do generate a species tree](https://ndombrowski.github.io/Phylogeny_tutorial/)\n- [Accessing data from NCBI](source/core_tools/ncbi.qmd)\n\n## Bioinformatic tools A-Z\n\n- [Bowtie 2](source/core_tools/bowtie.qmd): A tool for aligning sequencing reads to genomes and other reference sequences\n- [Chopper](source/nanopore/chopper.qmd): A tool for quality filtering of long read data\n- [FAMA](source/metagenomics/fama_readme.qmd): A fast pipeline for functional and taxonomic analysis of metagenomic sequences\n- [FastP](source/metatranscriptomics/fastp.qmd): A tool for fast all-in-one preprocessing of FastQ files\n- [FastQC](source/metagenomics/fastqc_readme.qmd): A quality control tool for read sequencing data\n- [Interproscan](source/metagenomics/interproscan_readme.qmd): A tool to scan protein and nucleic sequences against InterPro signatures\n- [ITSx](source/ITSx/itsx_readme.qmd): A tool to extract ITS1 and ITS2 subregions from ITS sequences\n- [Kraken2](source/classification/kraken2.qmd): 
A taxonomic sequence classifier using kmers\n- [METABOLIC](source/metagenomics/metabolic.qmd): A tool to predict functional trait profiles in genome datasets\n- [Minimap2](source/classification/minimap2.qmd): A program to align DNA or mRNA sequences against a reference database\n- [NanoClass2](source/nanopore/nanoclass.qmd): A taxonomic meta-classifier for long-read 16S/18S rRNA gene sequencing data\n- [NanoITS](source/nanopore/nanoITS.qmd): A taxonomic meta-classifier for long-read ITS operon sequencing data\n- [NanoPlot](source/nanopore/nanoplot_readme.qmd): Plotting tool for long read sequencing data\n- [NanoQC](source/nanopore/nanoqc_readme.qmd): A quality control tool for long read sequencing data\n- [Porechop](source/nanopore/porechop_readme.qmd): A tool for finding and removing adapters from Nanopore reads\n- [Prokka](source/core_tools/prokka.qmd): A tool to annotate bacterial, archaeal and viral genomes\n- [Samtools](source/core_tools/samtools.qmd): A tool to manipulating alignments in SAM/BAM format\n- [SeqKit](source/core_tools/seqkit.qmd): A tool for FASTA/Q file manipulation\n- [SortMerNa](source/metatranscriptomics/sortmerna.qmd): A tool to filter ribosomal RNAs in metatranscriptomic data\n- [Trinity](source/metatranscriptomics/trinity.qmd): A tool to assemble transcript sequences from Illumina RNA-Seq 
data"},"formats":{"html":{"identifier":{"display-name":"HTML","target-format":"html","base-format":"html"},"execute":{"fig-width":7,"fig-height":5,"fig-format":"retina","fig-dpi":96,"df-print":"default","error":false,"eval":true,"cache":null,"freeze":"auto","echo":true,"output":true,"warning":true,"include":true,"keep-md":false,"keep-ipynb":false,"ipynb":null,"enabled":null,"daemon":null,"daemon-restart":false,"debug":false,"ipynb-filters":[],"ipynb-shell-interactivity":null,"plotly-connected":true,"engine":"markdown"},"render":{"keep-tex":false,"keep-typ":false,"keep-source":false,"keep-hidden":false,"prefer-html":false,"output-divs":true,"output-ext":"html","fig-align":"default","fig-pos":null,"fig-env":null,"code-fold":"none","code-overflow":"scroll","code-link":false,"code-line-numbers":false,"code-tools":false,"tbl-colwidths":"auto","merge-includes":true,"inline-includes":false,"preserve-yaml":false,"latex-auto-mk":true,"latex-auto-install":true,"latex-clean":true,"latex-min-runs":1,"latex-max-runs":10,"latex-makeindex":"makeindex","latex-makeindex-opts":[],"latex-tlmgr-opts":[],"latex-input-paths":[],"latex-output-dir":null,"link-external-icon":false,"link-external-newwindow":true,"self-contained-math":false,"format-resources":[],"notebook-links":true},"pandoc":{"standalone":true,"wrap":"none","default-image-extension":"png","to":"html","css":["styles.css"],"toc":true,"toc-depth":2,"output-file":"index.html"},"language":{"toc-title-document":"Table of contents","toc-title-website":"On this page","related-formats-title":"Other Formats","related-notebooks-title":"Notebooks","source-notebooks-prefix":"Source","other-links-title":"Other Links","code-links-title":"Code Links","launch-dev-container-title":"Launch Dev Container","launch-binder-title":"Launch Binder","article-notebook-label":"Article Notebook","notebook-preview-download":"Download Notebook","notebook-preview-download-src":"Download Source","notebook-preview-back":"Back to 
Article","manuscript-meca-bundle":"MECA Bundle","section-title-abstract":"Abstract","section-title-appendices":"Appendices","section-title-footnotes":"Footnotes","section-title-references":"References","section-title-reuse":"Reuse","section-title-copyright":"Copyright","section-title-citation":"Citation","appendix-attribution-cite-as":"For attribution, please cite this work as:","appendix-attribution-bibtex":"BibTeX citation:","title-block-author-single":"Author","title-block-author-plural":"Authors","title-block-affiliation-single":"Affiliation","title-block-affiliation-plural":"Affiliations","title-block-published":"Published","title-block-modified":"Modified","title-block-keywords":"Keywords","callout-tip-title":"Tip","callout-note-title":"Note","callout-warning-title":"Warning","callout-important-title":"Important","callout-caution-title":"Caution","code-summary":"Code","code-tools-menu-caption":"Code","code-tools-show-all-code":"Show All Code","code-tools-hide-all-code":"Hide All Code","code-tools-view-source":"View Source","code-tools-source-code":"Source Code","tools-share":"Share","tools-download":"Download","code-line":"Line","code-lines":"Lines","copy-button-tooltip":"Copy to Clipboard","copy-button-tooltip-success":"Copied!","repo-action-links-edit":"Edit this page","repo-action-links-source":"View source","repo-action-links-issue":"Report an issue","back-to-top":"Back to top","search-no-results-text":"No results","search-matching-documents-text":"matching documents","search-copy-link-title":"Copy link to search","search-hide-matches-text":"Hide additional matches","search-more-match-text":"more match in this document","search-more-matches-text":"more matches in this document","search-clear-button-title":"Clear","search-text-placeholder":"","search-detached-cancel-button-title":"Cancel","search-submit-button-title":"Submit","search-label":"Search","toggle-section":"Toggle section","toggle-sidebar":"Toggle sidebar navigation","toggle-dark-mode":"Toggle 
dark mode","toggle-reader-mode":"Toggle reader mode","toggle-navigation":"Toggle navigation","crossref-fig-title":"Figure","crossref-tbl-title":"Table","crossref-lst-title":"Listing","crossref-thm-title":"Theorem","crossref-lem-title":"Lemma","crossref-cor-title":"Corollary","crossref-prp-title":"Proposition","crossref-cnj-title":"Conjecture","crossref-def-title":"Definition","crossref-exm-title":"Example","crossref-exr-title":"Exercise","crossref-ch-prefix":"Chapter","crossref-apx-prefix":"Appendix","crossref-sec-prefix":"Section","crossref-eq-prefix":"Equation","crossref-lof-title":"List of Figures","crossref-lot-title":"List of Tables","crossref-lol-title":"List of Listings","environment-proof-title":"Proof","environment-remark-title":"Remark","environment-solution-title":"Solution","listing-page-order-by":"Order By","listing-page-order-by-default":"Default","listing-page-order-by-date-asc":"Oldest","listing-page-order-by-date-desc":"Newest","listing-page-order-by-number-desc":"High to Low","listing-page-order-by-number-asc":"Low to High","listing-page-field-date":"Date","listing-page-field-title":"Title","listing-page-field-description":"Description","listing-page-field-author":"Author","listing-page-field-filename":"File Name","listing-page-field-filemodified":"Modified","listing-page-field-subtitle":"Subtitle","listing-page-field-readingtime":"Reading Time","listing-page-field-wordcount":"Word Count","listing-page-field-categories":"Categories","listing-page-minutes-compact":"{0} min","listing-page-category-all":"All","listing-page-no-matches":"No matching items","listing-page-words":"{0} words"},"metadata":{"lang":"en","fig-responsive":true,"quarto-version":"1.4.549","date-modified":"last-modified","title-block-style":"none","theme":{"light":"lumen","dark":"cyborg"},"toc-expand":true},"extensions":{"book":{"multiFile":true}}}},"projectFormats":["html"]} \ No newline at end of file +{"title":"Useful 
tutorials","markdown":{"yaml":{"toc-depth":2},"headingText":"Useful tutorials","containsRefs":false,"markdown":"\n\n![](img/banner-min.png)\n\n
\n\nOn this website you can find documentation about software commonly used in bioinformatic data analyses as well as tutorials about various bioinformatic subjects. On this webpage you can find software organized by topic and for each topic you find a list of commonly used software tools.\n\nIf you are working at the University of Amsterdam (UvA) Institute for Biodiversity and Ecosystem Dynamics (IBED) and want to know more about what computational resources are available, please also visit the [computational support teams website](https://ibed.uva.nl/facilities/computational-facilities/ibed-computational-support-team/ibed-computational-support-team.html).\n\nPlease, be aware that this page is a work in progress and will be slowly updated over time. If you want to add additional information or feel that something is missing feel free to send an email to [n.dombrowski\\@uva.nl](mailto:n.dombrowski@uva.nl){.email}.\n\n\n### Getting started with bash\n\n- [A tutorial on using bash and an HPC](https://ndombrowski.github.io/cli_workshop/)\n- [Version control with git](https://github.com/fkariminejadasl/ml-notebooks/blob/main/tutorial/git.md)\n- [A tutorial on using AWK](https://ndombrowski.github.io/AWK_tutorial/), a command line tool for filtering tables, extracting patterns, etc... 
If you want to follow this tutorial then you can download the required input files from [here](https://github.com/ndombrowski/AWK_tutorial/tree/main/1_Inputfiles)\n\n### Using R\n\n- [An R cookbook](https://ndombrowski.github.io/R_cookbook/) including some [example files](https://github.com/ndombrowski/R_cookbook/tree/main/data) if you want to code along\n- [Tutorial on data manipulation with dplyr](https://ndombrowski.github.io/Tidyverse_tutorial/)\n- [Tutorial on data visualization with ggplot2](https://ndombrowski.github.io/Ggplot_tutorial/)\n\n### Bioinformatic workflows\n\n- [From sequence file to OTU table with Qiime](source/Qiime/3_evelyn_tutorial_notes.qmd)\n- [Analysing an OTU table with R](source/Qiime/OTU_table_analysis.qmd)\n- [Assembling a metagenome](https://ndombrowski.github.io/Assembly_tutorial/)\n- [Metagenomic binning](https://ndombrowski.github.io/Binning_tutorial//)\n- [Annotating microbial genomes](https://github.com/ndombrowski/Annotation_workflow)\n- [How to do generate a species tree](https://ndombrowski.github.io/Phylogeny_tutorial/)\n- [Accessing data from NCBI](source/core_tools/ncbi.qmd)\n\n## Bioinformatic tools A-Z\n\n- [Bowtie 2](source/core_tools/bowtie.qmd): A tool for aligning sequencing reads to genomes and other reference sequences\n- [Chopper](source/nanopore/chopper.qmd): A tool for quality filtering of long read data\n- [FAMA](source/metagenomics/fama_readme.qmd): A fast pipeline for functional and taxonomic analysis of metagenomic sequences\n- [FastP](source/metatranscriptomics/fastp.qmd): A tool for fast all-in-one preprocessing of FastQ files\n- [FastQC](source/metagenomics/fastqc_readme.qmd): A quality control tool for read sequencing data\n- [FeatureCounts](source/core_tools/featurecounts.qmd): A read summarization program that counts mapped reads for genomic features\n- [Interproscan](source/metagenomics/interproscan_readme.qmd): A tool to scan protein and nucleic sequences against InterPro signatures\n- 
[ITSx](source/ITSx/itsx_readme.qmd): A tool to extract ITS1 and ITS2 subregions from ITS sequences\n- [Kraken2](source/classification/kraken2.qmd): A taxonomic sequence classifier using kmers\n- [METABOLIC](source/metagenomics/metabolic.qmd): A tool to predict functional trait profiles in genome datasets\n- [Minimap2](source/classification/minimap2.qmd): A program to align DNA or mRNA sequences against a reference database\n- [NanoClass2](source/nanopore/nanoclass.qmd): A taxonomic meta-classifier for long-read 16S/18S rRNA gene sequencing data\n- [NanoITS](source/nanopore/nanoITS.qmd): A taxonomic meta-classifier for long-read ITS operon sequencing data\n- [NanoPlot](source/nanopore/nanoplot_readme.qmd): Plotting tool for long read sequencing data\n- [NanoQC](source/nanopore/nanoqc_readme.qmd): A quality control tool for long read sequencing data\n- [Porechop](source/nanopore/porechop_readme.qmd): A tool for finding and removing adapters from Nanopore reads\n- [Prokka](source/core_tools/prokka.qmd): A tool to annotate bacterial, archaeal and viral genomes\n- [Samtools](source/core_tools/samtools.qmd): A tool to manipulating alignments in SAM/BAM format\n- [SeqKit](source/core_tools/seqkit.qmd): A tool for FASTA/Q file manipulation\n- [SortMerNa](source/metatranscriptomics/sortmerna.qmd): A tool to filter ribosomal RNAs in metatranscriptomic data\n- [Trinity](source/metatranscriptomics/trinity.qmd): A tool to assemble transcript sequences from Illumina RNA-Seq data","srcMarkdownNoYaml":"\n\n![](img/banner-min.png)\n\n
\n\nOn this website you can find documentation about software commonly used in bioinformatic data analyses as well as tutorials about various bioinformatic subjects. On this webpage you can find software organized by topic and for each topic you find a list of commonly used software tools.\n\nIf you are working at the University of Amsterdam (UvA) Institute for Biodiversity and Ecosystem Dynamics (IBED) and want to know more about what computational resources are available, please also visit the [computational support teams website](https://ibed.uva.nl/facilities/computational-facilities/ibed-computational-support-team/ibed-computational-support-team.html).\n\nPlease, be aware that this page is a work in progress and will be slowly updated over time. If you want to add additional information or feel that something is missing feel free to send an email to [n.dombrowski\\@uva.nl](mailto:n.dombrowski@uva.nl){.email}.\n\n## Useful tutorials\n\n### Getting started with bash\n\n- [A tutorial on using bash and an HPC](https://ndombrowski.github.io/cli_workshop/)\n- [Version control with git](https://github.com/fkariminejadasl/ml-notebooks/blob/main/tutorial/git.md)\n- [A tutorial on using AWK](https://ndombrowski.github.io/AWK_tutorial/), a command line tool for filtering tables, extracting patterns, etc... 
If you want to follow this tutorial then you can download the required input files from [here](https://github.com/ndombrowski/AWK_tutorial/tree/main/1_Inputfiles)\n\n### Using R\n\n- [An R cookbook](https://ndombrowski.github.io/R_cookbook/) including some [example files](https://github.com/ndombrowski/R_cookbook/tree/main/data) if you want to code along\n- [Tutorial on data manipulation with dplyr](https://ndombrowski.github.io/Tidyverse_tutorial/)\n- [Tutorial on data visualization with ggplot2](https://ndombrowski.github.io/Ggplot_tutorial/)\n\n### Bioinformatic workflows\n\n- [From sequence file to OTU table with Qiime](source/Qiime/3_evelyn_tutorial_notes.qmd)\n- [Analysing an OTU table with R](source/Qiime/OTU_table_analysis.qmd)\n- [Assembling a metagenome](https://ndombrowski.github.io/Assembly_tutorial/)\n- [Metagenomic binning](https://ndombrowski.github.io/Binning_tutorial//)\n- [Annotating microbial genomes](https://github.com/ndombrowski/Annotation_workflow)\n- [How to do generate a species tree](https://ndombrowski.github.io/Phylogeny_tutorial/)\n- [Accessing data from NCBI](source/core_tools/ncbi.qmd)\n\n## Bioinformatic tools A-Z\n\n- [Bowtie 2](source/core_tools/bowtie.qmd): A tool for aligning sequencing reads to genomes and other reference sequences\n- [Chopper](source/nanopore/chopper.qmd): A tool for quality filtering of long read data\n- [FAMA](source/metagenomics/fama_readme.qmd): A fast pipeline for functional and taxonomic analysis of metagenomic sequences\n- [FastP](source/metatranscriptomics/fastp.qmd): A tool for fast all-in-one preprocessing of FastQ files\n- [FastQC](source/metagenomics/fastqc_readme.qmd): A quality control tool for read sequencing data\n- [FeatureCounts](source/core_tools/featurecounts.qmd): A read summarization program that counts mapped reads for genomic features\n- [Interproscan](source/metagenomics/interproscan_readme.qmd): A tool to scan protein and nucleic sequences against InterPro signatures\n- 
[ITSx](source/ITSx/itsx_readme.qmd): A tool to extract ITS1 and ITS2 subregions from ITS sequences\n- [Kraken2](source/classification/kraken2.qmd): A taxonomic sequence classifier using kmers\n- [METABOLIC](source/metagenomics/metabolic.qmd): A tool to predict functional trait profiles in genome datasets\n- [Minimap2](source/classification/minimap2.qmd): A program to align DNA or mRNA sequences against a reference database\n- [NanoClass2](source/nanopore/nanoclass.qmd): A taxonomic meta-classifier for long-read 16S/18S rRNA gene sequencing data\n- [NanoITS](source/nanopore/nanoITS.qmd): A taxonomic meta-classifier for long-read ITS operon sequencing data\n- [NanoPlot](source/nanopore/nanoplot_readme.qmd): Plotting tool for long read sequencing data\n- [NanoQC](source/nanopore/nanoqc_readme.qmd): A quality control tool for long read sequencing data\n- [Porechop](source/nanopore/porechop_readme.qmd): A tool for finding and removing adapters from Nanopore reads\n- [Prokka](source/core_tools/prokka.qmd): A tool to annotate bacterial, archaeal and viral genomes\n- [Samtools](source/core_tools/samtools.qmd): A tool to manipulating alignments in SAM/BAM format\n- [SeqKit](source/core_tools/seqkit.qmd): A tool for FASTA/Q file manipulation\n- [SortMerNa](source/metatranscriptomics/sortmerna.qmd): A tool to filter ribosomal RNAs in metatranscriptomic data\n- [Trinity](source/metatranscriptomics/trinity.qmd): A tool to assemble transcript sequences from Illumina RNA-Seq 
data"},"formats":{"html":{"identifier":{"display-name":"HTML","target-format":"html","base-format":"html"},"execute":{"fig-width":7,"fig-height":5,"fig-format":"retina","fig-dpi":96,"df-print":"default","error":false,"eval":true,"cache":null,"freeze":"auto","echo":true,"output":true,"warning":true,"include":true,"keep-md":false,"keep-ipynb":false,"ipynb":null,"enabled":null,"daemon":null,"daemon-restart":false,"debug":false,"ipynb-filters":[],"ipynb-shell-interactivity":null,"plotly-connected":true,"engine":"markdown"},"render":{"keep-tex":false,"keep-typ":false,"keep-source":false,"keep-hidden":false,"prefer-html":false,"output-divs":true,"output-ext":"html","fig-align":"default","fig-pos":null,"fig-env":null,"code-fold":"none","code-overflow":"scroll","code-link":false,"code-line-numbers":false,"code-tools":false,"tbl-colwidths":"auto","merge-includes":true,"inline-includes":false,"preserve-yaml":false,"latex-auto-mk":true,"latex-auto-install":true,"latex-clean":true,"latex-min-runs":1,"latex-max-runs":10,"latex-makeindex":"makeindex","latex-makeindex-opts":[],"latex-tlmgr-opts":[],"latex-input-paths":[],"latex-output-dir":null,"link-external-icon":false,"link-external-newwindow":true,"self-contained-math":false,"format-resources":[],"notebook-links":true},"pandoc":{"standalone":true,"wrap":"none","default-image-extension":"png","to":"html","css":["styles.css"],"toc":true,"toc-depth":2,"output-file":"index.html"},"language":{"toc-title-document":"Table of contents","toc-title-website":"On this page","related-formats-title":"Other Formats","related-notebooks-title":"Notebooks","source-notebooks-prefix":"Source","other-links-title":"Other Links","code-links-title":"Code Links","launch-dev-container-title":"Launch Dev Container","launch-binder-title":"Launch Binder","article-notebook-label":"Article Notebook","notebook-preview-download":"Download Notebook","notebook-preview-download-src":"Download Source","notebook-preview-back":"Back to 
Article","manuscript-meca-bundle":"MECA Bundle","section-title-abstract":"Abstract","section-title-appendices":"Appendices","section-title-footnotes":"Footnotes","section-title-references":"References","section-title-reuse":"Reuse","section-title-copyright":"Copyright","section-title-citation":"Citation","appendix-attribution-cite-as":"For attribution, please cite this work as:","appendix-attribution-bibtex":"BibTeX citation:","title-block-author-single":"Author","title-block-author-plural":"Authors","title-block-affiliation-single":"Affiliation","title-block-affiliation-plural":"Affiliations","title-block-published":"Published","title-block-modified":"Modified","title-block-keywords":"Keywords","callout-tip-title":"Tip","callout-note-title":"Note","callout-warning-title":"Warning","callout-important-title":"Important","callout-caution-title":"Caution","code-summary":"Code","code-tools-menu-caption":"Code","code-tools-show-all-code":"Show All Code","code-tools-hide-all-code":"Hide All Code","code-tools-view-source":"View Source","code-tools-source-code":"Source Code","tools-share":"Share","tools-download":"Download","code-line":"Line","code-lines":"Lines","copy-button-tooltip":"Copy to Clipboard","copy-button-tooltip-success":"Copied!","repo-action-links-edit":"Edit this page","repo-action-links-source":"View source","repo-action-links-issue":"Report an issue","back-to-top":"Back to top","search-no-results-text":"No results","search-matching-documents-text":"matching documents","search-copy-link-title":"Copy link to search","search-hide-matches-text":"Hide additional matches","search-more-match-text":"more match in this document","search-more-matches-text":"more matches in this document","search-clear-button-title":"Clear","search-text-placeholder":"","search-detached-cancel-button-title":"Cancel","search-submit-button-title":"Submit","search-label":"Search","toggle-section":"Toggle section","toggle-sidebar":"Toggle sidebar navigation","toggle-dark-mode":"Toggle 
dark mode","toggle-reader-mode":"Toggle reader mode","toggle-navigation":"Toggle navigation","crossref-fig-title":"Figure","crossref-tbl-title":"Table","crossref-lst-title":"Listing","crossref-thm-title":"Theorem","crossref-lem-title":"Lemma","crossref-cor-title":"Corollary","crossref-prp-title":"Proposition","crossref-cnj-title":"Conjecture","crossref-def-title":"Definition","crossref-exm-title":"Example","crossref-exr-title":"Exercise","crossref-ch-prefix":"Chapter","crossref-apx-prefix":"Appendix","crossref-sec-prefix":"Section","crossref-eq-prefix":"Equation","crossref-lof-title":"List of Figures","crossref-lot-title":"List of Tables","crossref-lol-title":"List of Listings","environment-proof-title":"Proof","environment-remark-title":"Remark","environment-solution-title":"Solution","listing-page-order-by":"Order By","listing-page-order-by-default":"Default","listing-page-order-by-date-asc":"Oldest","listing-page-order-by-date-desc":"Newest","listing-page-order-by-number-desc":"High to Low","listing-page-order-by-number-asc":"Low to High","listing-page-field-date":"Date","listing-page-field-title":"Title","listing-page-field-description":"Description","listing-page-field-author":"Author","listing-page-field-filename":"File Name","listing-page-field-filemodified":"Modified","listing-page-field-subtitle":"Subtitle","listing-page-field-readingtime":"Reading Time","listing-page-field-wordcount":"Word Count","listing-page-field-categories":"Categories","listing-page-minutes-compact":"{0} min","listing-page-category-all":"All","listing-page-no-matches":"No matching items","listing-page-words":"{0} words"},"metadata":{"lang":"en","fig-responsive":true,"quarto-version":"1.4.549","date-modified":"last-modified","title-block-style":"none","theme":{"light":"lumen","dark":"cyborg"},"toc-expand":true},"extensions":{"book":{"multiFile":true}}}},"projectFormats":["html"]} \ No newline at end of file diff --git a/.quarto/xref/7fa5e22e b/.quarto/xref/7fa5e22e index 
c84b977..2ab65e6 100644 --- a/.quarto/xref/7fa5e22e +++ b/.quarto/xref/7fa5e22e @@ -1 +1 @@ -{"entries":[],"headings":["useful-tutorials","getting-started-with-bash","using-r","bioinformatic-workflows","bioinformatic-tools-a-z"]} \ No newline at end of file +{"headings":["useful-tutorials","getting-started-with-bash","using-r","bioinformatic-workflows","bioinformatic-tools-a-z"],"entries":[]} \ No newline at end of file diff --git a/.quarto/xref/INDEX b/.quarto/xref/INDEX index 2487b3b..b966805 100644 --- a/.quarto/xref/INDEX +++ b/.quarto/xref/INDEX @@ -146,5 +146,8 @@ }, "source/core_tools/samtools.qmd": { "samtools.html": "44dcd5a6" + }, + "source/core_tools/featurecounts.qmd": { + "featurecounts.html": "5e0b8c7f" } } \ No newline at end of file diff --git a/_quarto.yml b/_quarto.yml index 3be23b2..227c874 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -90,6 +90,7 @@ website: contents: - source/core_tools/bowtie.qmd - source/core_tools/samtools.qmd + - source/core_tools/featurecounts.qmd - section: "Functional annotation" contents: diff --git a/docs/index.html b/docs/index.html index f846254..9781ea1 100644 --- a/docs/index.html +++ b/docs/index.html @@ -285,6 +285,12 @@ Samtools
+ + @@ -504,6 +510,7 @@

Bioinformatic tool
  • FAMA: A fast pipeline for functional and taxonomic analysis of metagenomic sequences
  • FastP: A tool for fast all-in-one preprocessing of FastQ files
  • FastQC: A quality control tool for read sequencing data
  • +
  • FeatureCounts: A read summarization program that counts mapped reads for genomic features
  • Interproscan: A tool to scan protein and nucleic sequences against InterPro signatures
  • ITSx: A tool to extract ITS1 and ITS2 subregions from ITS sequences
  • Kraken2: A taxonomic sequence classifier using kmers
  • diff --git a/docs/search.json b/docs/search.json index 936a624..8d34634 100644 --- a/docs/search.json +++ b/docs/search.json @@ -866,7 +866,7 @@ "href": "index.html#bioinformatic-tools-a-z", "title": "Bioinformatics guidance page", "section": "Bioinformatic tools A-Z", - "text": "Bioinformatic tools A-Z\n\nBowtie 2: A tool for aligning sequencing reads to genomes and other reference sequences\nChopper: A tool for quality filtering of long read data\nFAMA: A fast pipeline for functional and taxonomic analysis of metagenomic sequences\nFastP: A tool for fast all-in-one preprocessing of FastQ files\nFastQC: A quality control tool for read sequencing data\nInterproscan: A tool to scan protein and nucleic sequences against InterPro signatures\nITSx: A tool to extract ITS1 and ITS2 subregions from ITS sequences\nKraken2: A taxonomic sequence classifier using kmers\nMETABOLIC: A tool to predict functional trait profiles in genome datasets\nMinimap2: A program to align DNA or mRNA sequences against a reference database\nNanoClass2: A taxonomic meta-classifier for long-read 16S/18S rRNA gene sequencing data\nNanoITS: A taxonomic meta-classifier for long-read ITS operon sequencing data\nNanoPlot: Plotting tool for long read sequencing data\nNanoQC: A quality control tool for long read sequencing data\nPorechop: A tool for finding and removing adapters from Nanopore reads\nProkka: A tool to annotate bacterial, archaeal and viral genomes\nSamtools: A tool to manipulating alignments in SAM/BAM format\nSeqKit: A tool for FASTA/Q file manipulation\nSortMerNa: A tool to filter ribosomal RNAs in metatranscriptomic data\nTrinity: A tool to assemble transcript sequences from Illumina RNA-Seq data", + "text": "Bioinformatic tools A-Z\n\nBowtie 2: A tool for aligning sequencing reads to genomes and other reference sequences\nChopper: A tool for quality filtering of long read data\nFAMA: A fast pipeline for functional and taxonomic analysis of metagenomic sequences\nFastP: A tool 
for fast all-in-one preprocessing of FastQ files\nFastQC: A quality control tool for read sequencing data\nFeatureCounts: A read summarization program that counts mapped reads for genomic features\nInterproscan: A tool to scan protein and nucleic sequences against InterPro signatures\nITSx: A tool to extract ITS1 and ITS2 subregions from ITS sequences\nKraken2: A taxonomic sequence classifier using kmers\nMETABOLIC: A tool to predict functional trait profiles in genome datasets\nMinimap2: A program to align DNA or mRNA sequences against a reference database\nNanoClass2: A taxonomic meta-classifier for long-read 16S/18S rRNA gene sequencing data\nNanoITS: A taxonomic meta-classifier for long-read ITS operon sequencing data\nNanoPlot: Plotting tool for long read sequencing data\nNanoQC: A quality control tool for long read sequencing data\nPorechop: A tool for finding and removing adapters from Nanopore reads\nProkka: A tool to annotate bacterial, archaeal and viral genomes\nSamtools: A tool for manipulating alignments in SAM/BAM format\nSeqKit: A tool for FASTA/Q file manipulation\nSortMerNa: A tool to filter ribosomal RNAs in metatranscriptomic data\nTrinity: A tool to assemble transcript sequences from Illumina RNA-Seq data", "crumbs": [ "Welcome page" ] @@ -1729,7 +1729,7 @@ "href": "source/core_tools/samtools.html", "title": "Bioinformatics guidance page", "section": "", - "text": "SAMtools is a toolkit for manipulating alignments in SAM/BAM format, including sorting, merging, indexing and generating alignments in a per-position format (Danecek et al. 2021). The SAM format is a standard format for storing large nucleotide sequence alignments and is generated by many sequence alignment tools such as Bowtie or BWA. 
The BAM format is the binary form from SAM.\nFor more detailed information, please visit the manual.\n\n\n\nInstalled on crunchomics: Yes, samtools v1.9 is installed.\nIf you want to install it yourself, you can run:\n\nmamba create -n samtools_1.19.2\nmamba install -n samtools_1.19.2 -c bioconda samtools=1.19.2\nmamba activate samtools_1.19.2\n\n\n\n\nThe basic usage of SAMtools is:\nsamtools COMMAND [options]\nThe following commands are available:\n\nview: SAM/BAM and BAM/SAM conversion\nsort: sort alignment file\nmpileup: multi-way pileup\ndepth: compute the depth\nfaidx: index/extract FASTA\ntview: text alignment viewer\nindex: index the alignment\nidxstats: generate BAM index stats\nfixmate: fix mate information\nflagstat: simple stats\ncalmd: recalculate MD/NM tags and = bases\nmerge: merge sorted alignments\nrmdup: remove PCR duplicates\nreheader: replace BAM header\ncat: concatenate BAMs\nbedcov: read depth per BED region\ntargetcut: cut fosmid regions\nphase: phase heterozygotes\nbamshuf: shuffle and group alignments by name\n\nFor detailed description and more information on a specific command, just type:\nsamtools COMMAND\nBelow, you find examples on how to run some of the most common samtools commands. For this, we start with the example of a researcher, who aligned Illumina reads from a sample to a reference genomes and generated a SAM file using Bowtie 2.\n\n\nThe SAM file is a tab-delimited text file that contains information for each individual read and its alignment to the reference.\nThe file begins with an option header. The header describes the source of data, reference sequence, method of alignment,and might look slightly different depending on the aligner being used. Each section begins with character @ followed by a two-letter record type code. These are followed by two-letter tags and values.\nAfterwards, you see the alignment section. Each line corresponds to the alignment information for a single read. 
Each alignment line has 11 mandatory fields for essential mapping information and a variable number of other fields for aligner-specific information. This looks something like this:\n\nThe individual fields are:\n\nQNAME: Query name or read name, the same read name present in the header of the FASTQ file\nFLAG: numerical value providing information about read mapping and whether the read is part of a pair. To translate these flags into a more meaningful description, go here. In our example:\n\nThe 16 flag means that the short sequence maps on the reverse strand of the reference genom.\nThe 0 flag means that none of the bit-wise flags you see in the link are set. This means that your reads with flag 0 are unpaired (because the first flag, 0x1, is not set), successfully mapped to the reference (because 0x4 is not set) and mapped to the forward strand (because 0x10 is not set)\n\nRNAME: the reference sequence name\nPOS: refers to the 1-based leftmost position of the alignment\nMAPQ: the alignment quality, the scale of which will depend on the aligner being used. The maximum MAPQ value that Bowtie 2 generates is 42. In contrast, the maximum MAPQ value that BWA will generate is 37. The better the score the better the alignment quality\nCIGAR: a sequence of letters and numbers that represent the operations that are required to match the read to the reference. The letters are operations that are used to indicate which bases align to the reference (i.e. match, mismatch, deletion, insertion), and the numbers indicate the associated base lengths. For example 129M1I31M means that the first 129 bases match, then we have 1 insertion followed by 31 matches\nMRNM: the mate reference name\nMPOS: the mate position (1-based, leftmost)\nISIZE: the inferred insert size\nSEQ: the raw sequence\nQUAL: the associated quality values for each position in the read\n\n\n\n\nWhile the SAM alignment file from Bowtie 2 is human readable, we need a BAM alignment file for downstream analysis. 
A BAM file is a binary equivalent version of the SAM file, i.e. the same file in a compressed format.\nWe can use the samtools view command to convert our SAM file into its binary compressed version (BAM) and save it to file. Once we generate the BAM file, we don’t need to retain the SAM file anymore, we can delete it to save space.\n\nsamtools view -h -S -b \\\n -o SRR6344904_mapped.bam \\\n SRR6344904_mapped.sam\n\nWe can adjust this to ask very specific questions about our data as well. For example, we might ask how many insertion and deletions do we have in our mapped reads. For this, we can use samtools view followed by some extra commands that we add with a pipe to:\n\nExtract only the mapped reads by removing the unmapped reads (-F 4)\nExtract the 6th field with the CIGAR string that contains information about insertions, deletions, etc (cut -f 6)\nExtract only alignments that have insertion or deletions (grep -P '[ID]')\nCount how many alignments have insertion or deletions (wc -l)\n\n\n#count total reads\nSRR6344904_mapped_sorted.bam | wc -l \n\n#count mapped reads with ID events\nsamtools view -F 4 SRR6344904_mapped_sorted.bam | \\\n cut -f 6 | \\\n grep -P '[ID]' | \\\n wc -l}\n\nUsed options:\n\n-h: include header in output\n-S: input is in SAM format\n-b: output BAM format\n-o: /path/to/output/file\n-F 4: exclude reads with the flag 4 (i.e. unmapped reads)\n-f 4: only keep reads with the flag 4 (i.e. 
unmapped reads)\n\n\n\n\nSorting BAM files is recommended for most down-stream analyses and is done as follows:\n\nsamtools sort \\\n SRR6344904_mapped.bam \\\n -o SRR6344904_mapped_sorted.bam\n\n\n\n\nThe samtools index command creates a new index file that allows fast look-up of the data in a sorted SAM or BAM file.\n\nsamtools index SRR6344904_mapped_sorted.bam\n\n\n\n\nThe samtools idxstats command prints stats for the BAM index file but it requires an index to run.\nThe output is TAB delimited with each line consisting of reference sequence name, sequence length, number of mapped reads and number of unmapped reads.\n\nsamtools idxstats SRR6344904_mapped_sorted.bam\n\n\n\n\nSamtools depth computes the read depth at each position or region\n\n#calculate the depth per position\n#header: the name of the contig or chromosome, the position, the number of reads aligned at that position\nsamtools depth SRR6344904_mapped_sorted.bam | head\n\n#calculate the average coverage for all covered regions\nsamtools depth SRR6344904_mapped_sorted.bam | awk '{sum+=$3} END { print \"Average = \",sum/NR}'\n\n#calculate the average coverage for all regions\nsamtools depth -a SRR6344904_mapped_sorted.bam | awk '{sum+=$3} END { print \"Average = \",sum/NR}'\n\nNotice:\n\nIf you want to calculate the average X coverage for your genome, then you would need to divide by the total size of your genome, instead of dividing by NR in the command above.", + "text": "SAMtools is a toolkit for manipulating alignments in SAM/BAM format, including sorting, merging, indexing and generating alignments in a per-position format (Danecek et al. 2021). The SAM format is a standard format for storing large nucleotide sequence alignments and is generated by many sequence alignment tools such as Bowtie or BWA. 
The BAM format is the binary form of SAM.\nFor more detailed information, please visit the manual.\n\n\n\nInstalled on crunchomics: Yes, samtools v1.9 is installed.\nIf you want to install it yourself, you can run:\n\nmamba create -n samtools_1.19.2\nmamba install -n samtools_1.19.2 -c bioconda samtools=1.19.2\nmamba activate samtools_1.19.2\n\n\n\n\nThe basic usage of SAMtools is:\nsamtools COMMAND [options]\nThe following commands are available:\n\nview: SAM/BAM and BAM/SAM conversion\nsort: sort alignment file\nmpileup: multi-way pileup\ndepth: compute the depth\nfaidx: index/extract FASTA\ntview: text alignment viewer\nindex: index the alignment\nidxstats: generate BAM index stats\nfixmate: fix mate information\nflagstat: simple stats\ncalmd: recalculate MD/NM tags and = bases\nmerge: merge sorted alignments\nrmdup: remove PCR duplicates\nreheader: replace BAM header\ncat: concatenate BAMs\nbedcov: read depth per BED region\ntargetcut: cut fosmid regions\nphase: phase heterozygotes\nbamshuf: shuffle and group alignments by name\n\nFor a detailed description and more information on a specific command, just type:\nsamtools COMMAND\nBelow, you find examples of how to run some of the most common samtools commands. For this, we start with the example of a researcher who aligned Illumina reads from a sample to a reference genome and generated a SAM file using Bowtie 2.\n\n\nThe SAM file is a tab-delimited text file that contains information for each individual read and its alignment to the reference.\nThe file begins with an optional header. The header describes the source of data, reference sequence, method of alignment, and might look slightly different depending on the aligner being used. Each header line begins with the character @ followed by a two-letter record type code. These are followed by two-letter tags and values.\nAfterwards, you see the alignment section. Each line corresponds to the alignment information for a single read. 
Each alignment line has 11 mandatory fields for essential mapping information and a variable number of other fields for aligner-specific information. This looks something like this:\n\nThe individual fields are:\n\nQNAME: Query name or read name, the same read name present in the header of the FASTQ file\nFLAG: numerical value providing information about read mapping and whether the read is part of a pair. To translate these flags into a more meaningful description, go here. In our example:\n\nThe 16 flag means that the short sequence maps on the reverse strand of the reference genome.\nThe 0 flag means that none of the bit-wise flags you see in the link are set. This means that your reads with flag 0 are unpaired (because the first flag, 0x1, is not set), successfully mapped to the reference (because 0x4 is not set) and mapped to the forward strand (because 0x10 is not set).\n\nRNAME: the reference sequence name\nPOS: refers to the 1-based leftmost position of the alignment\nMAPQ: the alignment quality, the scale of which will depend on the aligner being used. The maximum MAPQ value that Bowtie 2 generates is 42. In contrast, the maximum MAPQ value that BWA will generate is 37. The higher the score, the better the alignment quality\nCIGAR: a sequence of letters and numbers that represent the operations that are required to match the read to the reference. The letters are operations that are used to indicate which bases align to the reference (i.e. match, mismatch, deletion, insertion), and the numbers indicate the associated base lengths. For example, 129M1I31M means that the first 129 bases match, followed by 1 inserted base and then 31 matching bases\nMRNM: the mate reference name\nMPOS: the mate position (1-based, leftmost)\nISIZE: the inferred insert size\nSEQ: the raw sequence\nQUAL: the associated quality values for each position in the read\n\n\n\n\nWhile the SAM alignment file from Bowtie 2 is human-readable, we need a BAM alignment file for downstream analysis. 
A BAM file is the binary equivalent of the SAM file, i.e. the same content in a compressed format.\nWe can use the samtools view command to convert our SAM file into its binary compressed version (BAM) and save it to file. Once we have generated the BAM file, we no longer need the SAM file and can delete it to save space.\n\nsamtools view -h -S -b \\\n -o SRR6344904_mapped.bam \\\n SRR6344904_mapped.sam\n\nWe can adjust this to ask very specific questions about our data as well. For example, we might ask how many insertions and deletions we have in our mapped reads. For this, we can use samtools view followed by some extra commands that we add with a pipe to:\n\nExtract only the mapped reads by removing the unmapped reads (-F 4)\nExtract the 6th field with the CIGAR string that contains information about insertions, deletions, etc. (cut -f 6)\nExtract only alignments that have insertions or deletions (grep -P '[ID]')\nCount how many alignments have insertions or deletions (wc -l)\n\n\n#count total alignments\nsamtools view SRR6344904_mapped_sorted.bam | wc -l\n\n#count mapped alignments with ID events\nsamtools view -F 4 SRR6344904_mapped_sorted.bam | \\\n cut -f 6 | \\\n grep -P '[ID]' | \\\n wc -l\n\nUsed options:\n\n-h: include header in output\n-S: input is in SAM format\n-b: output BAM format\n-o: /path/to/output/file\n-F 4: exclude reads with the flag 4 (i.e. unmapped reads)\n-f 4: only keep reads with the flag 4 (i.e. 
unmapped reads)\n\n\n\n\nSorting BAM files is recommended for most down-stream analyses and is done as follows:\n\nsamtools sort \\\n SRR6344904_mapped.bam \\\n -o SRR6344904_mapped_sorted.bam\n\n\n\n\nThe samtools index command creates a new index file that allows fast look-up of the data in a sorted SAM or BAM file.\n\nsamtools index SRR6344904_mapped_sorted.bam\n\n\n\n\nThe samtools idxstats command prints stats for the BAM index file but it requires an index to run.\nThe output is TAB delimited with each line consisting of reference sequence name, sequence length, number of mapped reads and number of unmapped reads.\n\nsamtools idxstats SRR6344904_mapped_sorted.bam\n\n\n\n\nSamtools depth computes the read depth at each position or region\n\n#calculate the depth per position\n#header: the name of the contig or chromosome, the position, the number of reads aligned at that position\nsamtools depth SRR6344904_mapped_sorted.bam | head\n\n#calculate the average coverage for all covered regions\nsamtools depth SRR6344904_mapped_sorted.bam | awk '{sum+=$3} END { print \"Average = \",sum/NR}'\n\n#calculate the average coverage for all regions\nsamtools depth -a SRR6344904_mapped_sorted.bam | awk '{sum+=$3} END { print \"Average = \",sum/NR}'\n\nNotice:\n\nIf you want to calculate the average X coverage for your genome, then you would need to divide by the total size of your genome, instead of dividing by NR in the command above.", "crumbs": [ "Sequence data analyses", "Sequence alignment", @@ -1741,11 +1741,35 @@ "href": "source/core_tools/samtools.html#samtools", "title": "Bioinformatics guidance page", "section": "", - "text": "SAMtools is a toolkit for manipulating alignments in SAM/BAM format, including sorting, merging, indexing and generating alignments in a per-position format (Danecek et al. 2021). The SAM format is a standard format for storing large nucleotide sequence alignments and is generated by many sequence alignment tools such as Bowtie or BWA. 
The BAM format is the binary form from SAM.\nFor more detailed information, please visit the manual.\n\n\n\nInstalled on crunchomics: Yes, samtools v1.9 is installed.\nIf you want to install it yourself, you can run:\n\nmamba create -n samtools_1.19.2\nmamba install -n samtools_1.19.2 -c bioconda samtools=1.19.2\nmamba activate samtools_1.19.2\n\n\n\n\nThe basic usage of SAMtools is:\nsamtools COMMAND [options]\nThe following commands are available:\n\nview: SAM/BAM and BAM/SAM conversion\nsort: sort alignment file\nmpileup: multi-way pileup\ndepth: compute the depth\nfaidx: index/extract FASTA\ntview: text alignment viewer\nindex: index the alignment\nidxstats: generate BAM index stats\nfixmate: fix mate information\nflagstat: simple stats\ncalmd: recalculate MD/NM tags and = bases\nmerge: merge sorted alignments\nrmdup: remove PCR duplicates\nreheader: replace BAM header\ncat: concatenate BAMs\nbedcov: read depth per BED region\ntargetcut: cut fosmid regions\nphase: phase heterozygotes\nbamshuf: shuffle and group alignments by name\n\nFor detailed description and more information on a specific command, just type:\nsamtools COMMAND\nBelow, you find examples on how to run some of the most common samtools commands. For this, we start with the example of a researcher, who aligned Illumina reads from a sample to a reference genomes and generated a SAM file using Bowtie 2.\n\n\nThe SAM file is a tab-delimited text file that contains information for each individual read and its alignment to the reference.\nThe file begins with an option header. The header describes the source of data, reference sequence, method of alignment,and might look slightly different depending on the aligner being used. Each section begins with character @ followed by a two-letter record type code. These are followed by two-letter tags and values.\nAfterwards, you see the alignment section. Each line corresponds to the alignment information for a single read. 
Each alignment line has 11 mandatory fields for essential mapping information and a variable number of other fields for aligner-specific information. This looks something like this:\n\nThe individual fields are:\n\nQNAME: Query name or read name, the same read name present in the header of the FASTQ file\nFLAG: numerical value providing information about read mapping and whether the read is part of a pair. To translate these flags into a more meaningful description, go here. In our example:\n\nThe 16 flag means that the short sequence maps on the reverse strand of the reference genom.\nThe 0 flag means that none of the bit-wise flags you see in the link are set. This means that your reads with flag 0 are unpaired (because the first flag, 0x1, is not set), successfully mapped to the reference (because 0x4 is not set) and mapped to the forward strand (because 0x10 is not set)\n\nRNAME: the reference sequence name\nPOS: refers to the 1-based leftmost position of the alignment\nMAPQ: the alignment quality, the scale of which will depend on the aligner being used. The maximum MAPQ value that Bowtie 2 generates is 42. In contrast, the maximum MAPQ value that BWA will generate is 37. The better the score the better the alignment quality\nCIGAR: a sequence of letters and numbers that represent the operations that are required to match the read to the reference. The letters are operations that are used to indicate which bases align to the reference (i.e. match, mismatch, deletion, insertion), and the numbers indicate the associated base lengths. For example 129M1I31M means that the first 129 bases match, then we have 1 insertion followed by 31 matches\nMRNM: the mate reference name\nMPOS: the mate position (1-based, leftmost)\nISIZE: the inferred insert size\nSEQ: the raw sequence\nQUAL: the associated quality values for each position in the read\n\n\n\n\nWhile the SAM alignment file from Bowtie 2 is human readable, we need a BAM alignment file for downstream analysis. 
A BAM file is a binary equivalent version of the SAM file, i.e. the same file in a compressed format.\nWe can use the samtools view command to convert our SAM file into its binary compressed version (BAM) and save it to file. Once we generate the BAM file, we don’t need to retain the SAM file anymore, we can delete it to save space.\n\nsamtools view -h -S -b \\\n -o SRR6344904_mapped.bam \\\n SRR6344904_mapped.sam\n\nWe can adjust this to ask very specific questions about our data as well. For example, we might ask how many insertion and deletions do we have in our mapped reads. For this, we can use samtools view followed by some extra commands that we add with a pipe to:\n\nExtract only the mapped reads by removing the unmapped reads (-F 4)\nExtract the 6th field with the CIGAR string that contains information about insertions, deletions, etc (cut -f 6)\nExtract only alignments that have insertion or deletions (grep -P '[ID]')\nCount how many alignments have insertion or deletions (wc -l)\n\n\n#count total reads\nSRR6344904_mapped_sorted.bam | wc -l \n\n#count mapped reads with ID events\nsamtools view -F 4 SRR6344904_mapped_sorted.bam | \\\n cut -f 6 | \\\n grep -P '[ID]' | \\\n wc -l}\n\nUsed options:\n\n-h: include header in output\n-S: input is in SAM format\n-b: output BAM format\n-o: /path/to/output/file\n-F 4: exclude reads with the flag 4 (i.e. unmapped reads)\n-f 4: only keep reads with the flag 4 (i.e. 
unmapped reads)\n\n\n\n\nSorting BAM files is recommended for most down-stream analyses and is done as follows:\n\nsamtools sort \\\n SRR6344904_mapped.bam \\\n -o SRR6344904_mapped_sorted.bam\n\n\n\n\nThe samtools index command creates a new index file that allows fast look-up of the data in a sorted SAM or BAM file.\n\nsamtools index SRR6344904_mapped_sorted.bam\n\n\n\n\nThe samtools idxstats command prints stats for the BAM index file but it requires an index to run.\nThe output is TAB delimited with each line consisting of reference sequence name, sequence length, number of mapped reads and number of unmapped reads.\n\nsamtools idxstats SRR6344904_mapped_sorted.bam\n\n\n\n\nSamtools depth computes the read depth at each position or region\n\n#calculate the depth per position\n#header: the name of the contig or chromosome, the position, the number of reads aligned at that position\nsamtools depth SRR6344904_mapped_sorted.bam | head\n\n#calculate the average coverage for all covered regions\nsamtools depth SRR6344904_mapped_sorted.bam | awk '{sum+=$3} END { print \"Average = \",sum/NR}'\n\n#calculate the average coverage for all regions\nsamtools depth -a SRR6344904_mapped_sorted.bam | awk '{sum+=$3} END { print \"Average = \",sum/NR}'\n\nNotice:\n\nIf you want to calculate the average X coverage for your genome, then you would need to divide by the total size of your genome, instead of dividing by NR in the command above.", + "text": "SAMtools is a toolkit for manipulating alignments in SAM/BAM format, including sorting, merging, indexing and generating alignments in a per-position format (Danecek et al. 2021). The SAM format is a standard format for storing large nucleotide sequence alignments and is generated by many sequence alignment tools such as Bowtie or BWA. 
The BAM format is the binary form of SAM.\nFor more detailed information, please visit the manual.\n\n\n\nInstalled on crunchomics: Yes, samtools v1.9 is installed.\nIf you want to install it yourself, you can run:\n\nmamba create -n samtools_1.19.2\nmamba install -n samtools_1.19.2 -c bioconda samtools=1.19.2\nmamba activate samtools_1.19.2\n\n\n\n\nThe basic usage of SAMtools is:\nsamtools COMMAND [options]\nThe following commands are available:\n\nview: SAM/BAM and BAM/SAM conversion\nsort: sort alignment file\nmpileup: multi-way pileup\ndepth: compute the depth\nfaidx: index/extract FASTA\ntview: text alignment viewer\nindex: index the alignment\nidxstats: generate BAM index stats\nfixmate: fix mate information\nflagstat: simple stats\ncalmd: recalculate MD/NM tags and = bases\nmerge: merge sorted alignments\nrmdup: remove PCR duplicates\nreheader: replace BAM header\ncat: concatenate BAMs\nbedcov: read depth per BED region\ntargetcut: cut fosmid regions\nphase: phase heterozygotes\nbamshuf: shuffle and group alignments by name\n\nFor a detailed description and more information on a specific command, just type:\nsamtools COMMAND\nBelow, you find examples of how to run some of the most common samtools commands. For this, we start with the example of a researcher who aligned Illumina reads from a sample to a reference genome and generated a SAM file using Bowtie 2.\n\n\nThe SAM file is a tab-delimited text file that contains information for each individual read and its alignment to the reference.\nThe file begins with an optional header. The header describes the source of data, reference sequence, method of alignment, and might look slightly different depending on the aligner being used. Each header line begins with the character @ followed by a two-letter record type code. These are followed by two-letter tags and values.\nAfterwards, you see the alignment section. Each line corresponds to the alignment information for a single read. 
Each alignment line has 11 mandatory fields for essential mapping information and a variable number of other fields for aligner-specific information. This looks something like this:\n\nThe individual fields are:\n\nQNAME: Query name or read name, the same read name present in the header of the FASTQ file\nFLAG: numerical value providing information about read mapping and whether the read is part of a pair. To translate these flags into a more meaningful description, go here. In our example:\n\nThe 16 flag means that the short sequence maps on the reverse strand of the reference genome.\nThe 0 flag means that none of the bit-wise flags you see in the link are set. This means that your reads with flag 0 are unpaired (because the first flag, 0x1, is not set), successfully mapped to the reference (because 0x4 is not set) and mapped to the forward strand (because 0x10 is not set).\n\nRNAME: the reference sequence name\nPOS: refers to the 1-based leftmost position of the alignment\nMAPQ: the alignment quality, the scale of which will depend on the aligner being used. The maximum MAPQ value that Bowtie 2 generates is 42. In contrast, the maximum MAPQ value that BWA will generate is 37. The higher the score, the better the alignment quality\nCIGAR: a sequence of letters and numbers that represent the operations that are required to match the read to the reference. The letters are operations that are used to indicate which bases align to the reference (i.e. match, mismatch, deletion, insertion), and the numbers indicate the associated base lengths. For example, 129M1I31M means that the first 129 bases match, followed by 1 inserted base and then 31 matching bases\nMRNM: the mate reference name\nMPOS: the mate position (1-based, leftmost)\nISIZE: the inferred insert size\nSEQ: the raw sequence\nQUAL: the associated quality values for each position in the read\n\n\n\n\nWhile the SAM alignment file from Bowtie 2 is human-readable, we need a BAM alignment file for downstream analysis. 
A BAM file is the binary equivalent of the SAM file, i.e. the same content in a compressed format.\nWe can use the samtools view command to convert our SAM file into its binary compressed version (BAM) and save it to file. Once we have generated the BAM file, we no longer need the SAM file and can delete it to save space.\n\nsamtools view -h -S -b \\\n -o SRR6344904_mapped.bam \\\n SRR6344904_mapped.sam\n\nWe can adjust this to ask very specific questions about our data as well. For example, we might ask how many insertions and deletions we have in our mapped reads. For this, we can use samtools view followed by some extra commands that we add with a pipe to:\n\nExtract only the mapped reads by removing the unmapped reads (-F 4)\nExtract the 6th field with the CIGAR string that contains information about insertions, deletions, etc. (cut -f 6)\nExtract only alignments that have insertions or deletions (grep -P '[ID]')\nCount how many alignments have insertions or deletions (wc -l)\n\n\n#count total alignments\nsamtools view SRR6344904_mapped_sorted.bam | wc -l\n\n#count mapped alignments with ID events\nsamtools view -F 4 SRR6344904_mapped_sorted.bam | \\\n cut -f 6 | \\\n grep -P '[ID]' | \\\n wc -l\n\nUsed options:\n\n-h: include header in output\n-S: input is in SAM format\n-b: output BAM format\n-o: /path/to/output/file\n-F 4: exclude reads with the flag 4 (i.e. unmapped reads)\n-f 4: only keep reads with the flag 4 (i.e. 
unmapped reads)\n\n\n\n\nSorting BAM files is recommended for most downstream analyses and is done as follows:\n\nsamtools sort \\\n SRR6344904_mapped.bam \\\n -o SRR6344904_mapped_sorted.bam\n\n\n\n\nThe samtools index command creates a new index file that allows fast look-up of the data in a sorted SAM or BAM file.\n\nsamtools index SRR6344904_mapped_sorted.bam\n\n\n\n\nThe samtools idxstats command reports statistics from the BAM index, so the BAM file must be indexed first.\nThe output is tab-delimited, with each line consisting of the reference sequence name, sequence length, number of mapped reads and number of unmapped reads.\n\nsamtools idxstats SRR6344904_mapped_sorted.bam\n\n\n\n\nSamtools depth computes the read depth at each position or region.\n\n#calculate the depth per position\n#header: the name of the contig or chromosome, the position, the number of reads aligned at that position\nsamtools depth SRR6344904_mapped_sorted.bam | head\n\n#calculate the average coverage for all covered positions\nsamtools depth SRR6344904_mapped_sorted.bam | awk '{sum+=$3} END { print \"Average = \",sum/NR}'\n\n#calculate the average coverage for all positions\nsamtools depth -a SRR6344904_mapped_sorted.bam | awk '{sum+=$3} END { print \"Average = \",sum/NR}'\n\nNotice:\n\nIf you want to calculate the average X coverage for your genome, then you would need to divide by the total size of your genome, instead of dividing by NR in the command above.", "crumbs": [ "Sequence data analyses", "Sequence alignment", "Samtools" ] + }, + { + "objectID": "source/core_tools/featurecounts.html", + "href": "source/core_tools/featurecounts.html", + "title": "Bioinformatics guidance page", + "section": "", + "text": "FeatureCounts is part of the Subread software package, a toolkit for processing next-generation sequencing data (Liao, Smyth, and Shi 2013). 
It includes the Subread aligner, the Subjunc exon-exon junction detector and the featureCounts read summarization program.\nFeatureCounts counts how many reads map to genomic features such as genes, exons, promoters and genomic bins. It is therefore useful after you have, for example, aligned reads (from a genome, metagenome or transcriptome) to reference sequences and want to generate a count table.\nDetailed documentation can be downloaded from here.\n\n\n\nInstalled on crunchomics: No\nIf you want to install it yourself, you can run:\n\nmamba create -n subread_2.0.6\nmamba install -n subread_2.0.6 -c bioconda subread=2.0.6\nmamba activate subread_2.0.6\n\n\n\n\nFeatureCounts takes as input an annotation file in GTF or GFF format and a sorted BAM file.\nIt outputs a tab-delimited text file with the counts for each feature (in our example, CDS) per sample. Notice how you can use a wildcard to generate a count table for multiple BAM files at the same time.\n\nfeatureCounts -T 5 -t CDS -g gene_id -M \\\n -a data/genome/genomic.gtf \\\n -o results/featurecounts/ncbi_gtf/counts.txt \\\n results/bowtie/*_mapped_sorted.bam\n\nUseful options:\n\n-a Name of an annotation file. GTF/GFF format by default. See the -F option for more format information. Inbuilt annotations (SAF format) are available in the ‘annotation’ directory of the package. Gzipped files are also accepted.\n-o Name of the output file with the read counts. A separate file with summary statistics of the counting results is also written (the output file name plus ‘.summary’). Both files are tab-delimited.\n-t Specify feature type(s) in a GTF annotation. If multiple types are provided, they should be separated by ‘,’ with no space in between. ‘exon’ by default. Rows in the annotation with a matching feature type will be extracted and used for read counting.\n-g Specify the attribute type in the GTF annotation. ‘gene_id’ by default.
Meta-features used for read counting will be extracted from the annotation using the provided value.\n-M Multi-mapping reads will also be counted. For a multi-mapping read, all its reported alignments will be counted. The ‘NH’ tag in BAM/SAM input is used to detect multi-mapping reads.\n-L Count long reads such as Nanopore and PacBio reads. Long-read counting can only run in one thread and only reads (not read pairs) can be counted. There is no limit on the number of ‘M’ operations allowed in a CIGAR string in long-read counting.\n--maxMOp Maximum number of ‘M’ operations allowed in a CIGAR string. 10 by default. Both ‘X’ and ‘=’ are treated as ‘M’ and adjacent ‘M’ operations are merged in the CIGAR string.\n-p If specified, libraries are assumed to contain paired-end reads. For any library that contains paired-end reads, the --countReadPairs option controls whether read pairs or reads are counted.\n-s Perform strand-specific read counting. A single integer value (applied to all input files) or a string of comma-separated values (applied to each corresponding input file) should be provided. Possible values are 0 (unstranded), 1 (stranded) and 2 (reversely stranded). The default is 0 (i.e. unstranded read counting for all input files).", + "crumbs": [ + "Sequence data analyses", + "Sequence alignment", + "FeatureCounts" + ] + }, + { + "objectID": "source/core_tools/featurecounts.html#featurecounts", + "href": "source/core_tools/featurecounts.html#featurecounts", + "title": "Bioinformatics guidance page", + "section": "", + "text": "FeatureCounts is part of the Subread software package, a toolkit for processing next-generation sequencing data (Liao, Smyth, and Shi 2013). It includes the Subread aligner, the Subjunc exon-exon junction detector and the featureCounts read summarization program.\nFeatureCounts counts how many reads map to genomic features such as genes, exons, promoters and genomic bins.
It is therefore useful after you have, for example, aligned reads (from a genome, metagenome or transcriptome) to reference sequences and want to generate a count table.\nDetailed documentation can be downloaded from here.\n\n\n\nInstalled on crunchomics: No\nIf you want to install it yourself, you can run:\n\nmamba create -n subread_2.0.6\nmamba install -n subread_2.0.6 -c bioconda subread=2.0.6\nmamba activate subread_2.0.6\n\n\n\n\nFeatureCounts takes as input an annotation file in GTF or GFF format and a sorted BAM file.\nIt outputs a tab-delimited text file with the counts for each feature (in our example, CDS) per sample. Notice how you can use a wildcard to generate a count table for multiple BAM files at the same time.\n\nfeatureCounts -T 5 -t CDS -g gene_id -M \\\n -a data/genome/genomic.gtf \\\n -o results/featurecounts/ncbi_gtf/counts.txt \\\n results/bowtie/*_mapped_sorted.bam\n\nUseful options:\n\n-a Name of an annotation file. GTF/GFF format by default. See the -F option for more format information. Inbuilt annotations (SAF format) are available in the ‘annotation’ directory of the package. Gzipped files are also accepted.\n-o Name of the output file with the read counts. A separate file with summary statistics of the counting results is also written (the output file name plus ‘.summary’). Both files are tab-delimited.\n-t Specify feature type(s) in a GTF annotation. If multiple types are provided, they should be separated by ‘,’ with no space in between. ‘exon’ by default. Rows in the annotation with a matching feature type will be extracted and used for read counting.\n-g Specify the attribute type in the GTF annotation. ‘gene_id’ by default. Meta-features used for read counting will be extracted from the annotation using the provided value.\n-M Multi-mapping reads will also be counted. For a multi-mapping read, all its reported alignments will be counted. The ‘NH’ tag in BAM/SAM input is used to detect multi-mapping reads.\n-L Count long reads such as Nanopore and PacBio reads.
Long-read counting can only run in one thread and only reads (not read pairs) can be counted. There is no limit on the number of ‘M’ operations allowed in a CIGAR string in long-read counting.\n--maxMOp Maximum number of ‘M’ operations allowed in a CIGAR string. 10 by default. Both ‘X’ and ‘=’ are treated as ‘M’ and adjacent ‘M’ operations are merged in the CIGAR string.\n-p If specified, libraries are assumed to contain paired-end reads. For any library that contains paired-end reads, the --countReadPairs option controls whether read pairs or reads are counted.\n-s Perform strand-specific read counting. A single integer value (applied to all input files) or a string of comma-separated values (applied to each corresponding input file) should be provided. Possible values are 0 (unstranded), 1 (stranded) and 2 (reversely stranded). The default is 0 (i.e. unstranded read counting for all input files).", + "crumbs": [ + "Sequence data analyses", + "Sequence alignment", + "FeatureCounts" + ] } ] \ No newline at end of file diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 4d0ca67..52f0479 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -74,7 +74,7 @@ https://scienceparkstudygroup.github.io/software_information/index.html - 2024-02-07T09:33:30.693Z + 2024-02-07T09:58:04.659Z https://scienceparkstudygroup.github.io/software_information/source/ITSx/itsx_readme.html @@ -154,6 +154,10 @@ https://scienceparkstudygroup.github.io/software_information/source/core_tools/samtools.html - 2024-02-07T09:40:31.306Z + 2024-02-07T09:42:32.778Z + + + https://scienceparkstudygroup.github.io/software_information/source/core_tools/featurecounts.html + 2024-02-07T09:58:00.468Z diff --git a/docs/source/core_tools/featurecounts.html b/docs/source/core_tools/featurecounts.html new file mode 100644 index 0000000..4e1bf98 --- /dev/null +++ b/docs/source/core_tools/featurecounts.html @@ -0,0 +1,1160 @@ + + + + + + + + + +Bioinformatics guidance page + + + + 

    FeatureCounts

    Introduction

    FeatureCounts is part of the Subread software package, a toolkit for processing next-generation sequencing data (Liao, Smyth, and Shi 2013). It includes the Subread aligner, the Subjunc exon-exon junction detector and the featureCounts read summarization program.

    FeatureCounts counts how many reads map to genomic features such as genes, exons, promoters and genomic bins. It is therefore useful after you have, for example, aligned reads (from a genome, metagenome or transcriptome) to reference sequences and want to generate a count table.

    Detailed documentation can be downloaded from here.

    Installation

    Installed on crunchomics: No

    If you want to install it yourself, you can run:

    mamba create -n subread_2.0.6
    mamba install -n subread_2.0.6 -c bioconda subread=2.0.6
    mamba activate subread_2.0.6

    Usage

    FeatureCounts takes as input an annotation file in GTF or GFF format and a sorted BAM file.
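Because featureCounts only counts annotation rows whose feature type (column 3 of a GTF file) matches the type you ask for, it can help to first check which feature types your annotation actually contains. The sketch below does this with awk. `list_feature_types` is a made-up helper name and the two-gene GTF is toy data, so in practice you would point the function at your own annotation (e.g. data/genome/genomic.gtf).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print each feature type found in column 3 of a GTF
# file together with how often it occurs. Comment lines starting with '#'
# are skipped; the output is sorted by feature type.
list_feature_types() {
    awk -F '\t' '!/^#/ { n[$3]++ } END { for (t in n) print t "\t" n[t] }' "$1" |
        LC_ALL=C sort
}

# Toy two-gene GTF, for illustration only
demo_gtf=$(mktemp)
printf '%s\n' \
    '#gtf-version 2.2' \
    $'chr1\tRefSeq\tgene\t1\t900\t.\t+\t.\tgene_id "geneA";' \
    $'chr1\tRefSeq\tCDS\t1\t900\t.\t+\t0\tgene_id "geneA";' \
    $'chr1\tRefSeq\tgene\t1001\t1600\t.\t-\t.\tgene_id "geneB";' \
    $'chr1\tRefSeq\tCDS\t1001\t1600\t.\t-\t0\tgene_id "geneB";' \
    > "$demo_gtf"

list_feature_types "$demo_gtf"   # prints "CDS<TAB>2" and "gene<TAB>2"
rm -f "$demo_gtf"
```

If an annotation only lists `CDS` and `gene` rows, as in this toy file, the default feature type `exon` would find no matching features, which is why the example command sets `-t CDS` explicitly.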

    It outputs a tab-delimited text file with the counts for each feature (in our example, CDS) per sample. Notice how you can use a wildcard to generate a count table for multiple BAM files at the same time.

    featureCounts -T 5 -t CDS -g gene_id -M \
        -a data/genome/genomic.gtf \
        -o results/featurecounts/ncbi_gtf/counts.txt \
        results/bowtie/*_mapped_sorted.bam
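The counts file written by `-o` starts with a comment line (beginning with `#`) that records the program version and command, followed by a header row and one row per feature. The first six columns hold the feature ID and its annotation (Geneid, Chr, Start, End, Strand, Length). The per-sample counts start in column 7 and use the BAM paths as column names. Many downstream tools only need the IDs and the counts, so here is a minimal awk sketch that keeps just those and shortens the BAM paths to sample names. The helper name `to_count_matrix`, the `_mapped_sorted.bam` suffix (taken from the example command) and the toy input are assumptions; adapt them to your own files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: turn a featureCounts counts file into a plain count
# matrix. It skips the leading '#' comment line, keeps column 1 (Geneid)
# plus the per-sample count columns (7 onwards), and in the header row
# strips the directory and the assumed '_mapped_sorted.bam' suffix from
# each BAM path so that only the sample name remains.
to_count_matrix() {
    awk -F '\t' -v OFS='\t' '
        /^#/ { next }
        {
            row = $1
            for (i = 7; i <= NF; i++) {
                val = $i
                if (!seen_header) {
                    sub(/.*\//, "", val)
                    sub(/_mapped_sorted\.bam$/, "", val)
                }
                row = row OFS val
            }
            seen_header = 1
            print row
        }' "$1"
}

# Toy counts file that mimics the featureCounts layout (illustration only)
demo_counts=$(mktemp)
printf '%s\n' \
    '# Program:featureCounts v2.0.6; Command: (shortened for this toy example)' \
    $'Geneid\tChr\tStart\tEnd\tStrand\tLength\tresults/bowtie/S1_mapped_sorted.bam\tresults/bowtie/S2_mapped_sorted.bam' \
    $'geneA\tchr1\t1\t900\t+\t900\t10\t0' \
    $'geneB\tchr1\t1001\t1600\t-\t600\t5\t7' \
    > "$demo_counts"

to_count_matrix "$demo_counts"   # prints a matrix with the columns Geneid, S1, S2
rm -f "$demo_counts"
```

With real data you would redirect the output to a new file, for example `to_count_matrix results/featurecounts/ncbi_gtf/counts.txt > counts_matrix.tsv` (the output name is only an example).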

    Useful options:

    • -a Name of an annotation file. GTF/GFF format by default. See the -F option for more format information. Inbuilt annotations (SAF format) are available in the ‘annotation’ directory of the package. Gzipped files are also accepted.
    • -o Name of the output file with the read counts. A separate file with summary statistics of the counting results is also written (the output file name plus ‘.summary’). Both files are tab-delimited.
    • -t Specify feature type(s) in a GTF annotation. If multiple types are provided, they should be separated by ‘,’ with no space in between. ‘exon’ by default. Rows in the annotation with a matching feature type will be extracted and used for read counting.
    • -g Specify the attribute type in the GTF annotation. ‘gene_id’ by default. Meta-features used for read counting will be extracted from the annotation using the provided value.
    • -M Multi-mapping reads will also be counted. For a multi-mapping read, all its reported alignments will be counted. The ‘NH’ tag in BAM/SAM input is used to detect multi-mapping reads.
    • -L Count long reads such as Nanopore and PacBio reads. Long-read counting can only run in one thread and only reads (not read pairs) can be counted. There is no limit on the number of ‘M’ operations allowed in a CIGAR string in long-read counting.
    • --maxMOp Maximum number of ‘M’ operations allowed in a CIGAR string. 10 by default. Both ‘X’ and ‘=’ are treated as ‘M’ and adjacent ‘M’ operations are merged in the CIGAR string.
    • -p If specified, libraries are assumed to contain paired-end reads. For any library that contains paired-end reads, the --countReadPairs option controls whether read pairs or reads are counted.
    • -s Perform strand-specific read counting. A single integer value (applied to all input files) or a string of comma-separated values (applied to each corresponding input file) should be provided. Possible values are 0 (unstranded), 1 (stranded) and 2 (reversely stranded). The default is 0 (i.e. unstranded read counting for all input files).
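As described for `-o`, featureCounts also writes a `.summary` file. Its header row starts with `Status` followed by one column per BAM file, and each row reports how many reads were assigned (`Assigned`) or why they were not (for example `Unassigned_NoFeatures` or `Unassigned_MultiMapping`). The share of assigned reads is a quick sanity check, since an unexpectedly low value can point to problems such as a wrong `-s` setting or an annotation that does not match the reference. The sketch below computes this share with awk relative to the sum of all status rows. `assigned_rate` and the toy values are illustrative assumptions, so treat the result as a rough check rather than a formal QC metric.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: for each sample column in a featureCounts '.summary'
# file, print the 'Assigned' count as a percentage of the sum of all status
# rows (Assigned plus every Unassigned_* category).
assigned_rate() {
    awk -F '\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
        {
            for (i = 2; i <= NF; i++) {
                total[i] += $i
                if ($1 == "Assigned") assigned[i] = $i
            }
        }
        END {
            for (i = 2; i <= ncol; i++) {
                pct = (total[i] > 0) ? 100 * assigned[i] / total[i] : 0
                printf "%s\t%.1f%%\n", name[i], pct
            }
        }' "$1"
}

# Toy summary file (illustration only; real files list more Unassigned_* rows
# and use the BAM paths as column names)
demo_summary=$(mktemp)
printf '%s\n' \
    $'Status\tS1\tS2' \
    $'Assigned\t750\t400' \
    $'Unassigned_Unmapped\t100\t100' \
    $'Unassigned_NoFeatures\t150\t500' \
    > "$demo_summary"

assigned_rate "$demo_summary"   # prints S1 75.0% and S2 40.0% (tab-separated)
rm -f "$demo_summary"
```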

    References

    Liao, Yang, Gordon K. Smyth, and Wei Shi. 2013. “featureCounts: An Efficient General Purpose Program for Assigning Sequence Reads to Genomic Features.” Bioinformatics 30 (7): 923–30. https://doi.org/10.1093/bioinformatics/btt656.

    \ No newline at end of file
    diff --git a/docs/source/core_tools/samtools.html b/docs/source/core_tools/samtools.html
    index 09171b5..8b711f3 100644
    --- a/docs/source/core_tools/samtools.html
    +++ b/docs/source/core_tools/samtools.html
    @@ -616,10 +616,10 @@

    Samtools View

  • Count how many alignments have insertion or deletions (wc -l)
  • -
    #count total reads
    +
    #count total alignments
    -SRR6344904_mapped_sorted.bam | wc -l
    +samtools view SRR6344904_mapped_sorted.bam | wc -l
     
    -#count mapped reads with ID events
    +#count mapped alignments with ID events
     samtools view -F 4 SRR6344904_mapped_sorted.bam | \
          cut -f 6 | \
          grep -P '[ID]' | \
    diff --git a/index.qmd b/index.qmd
    index c5ff92f..ea3b035 100644
    --- a/index.qmd
    +++ b/index.qmd
    @@ -43,6 +43,7 @@ Please, be aware that this page is a work in progress and will be slowly updated
     -   [FAMA](source/metagenomics/fama_readme.qmd): A fast pipeline for functional and taxonomic analysis of metagenomic sequences
     -   [FastP](source/metatranscriptomics/fastp.qmd): A tool for fast all-in-one preprocessing of FastQ files
     -   [FastQC](source/metagenomics/fastqc_readme.qmd): A quality control tool for read sequencing data
    +-   [FeatureCounts](source/core_tools/featurecounts.qmd): A read summarization program that counts mapped reads for genomic features
     -   [Interproscan](source/metagenomics/interproscan_readme.qmd): A tool to scan protein and nucleic sequences against InterPro signatures
     -   [ITSx](source/ITSx/itsx_readme.qmd): A tool to extract ITS1 and ITS2 subregions from ITS sequences
     -   [Kraken2](source/classification/kraken2.qmd): A taxonomic sequence classifier using kmers
    diff --git a/source/core_tools/featurecounts.qmd b/source/core_tools/featurecounts.qmd
    new file mode 100644
    index 0000000..710bbda
    --- /dev/null
    +++ b/source/core_tools/featurecounts.qmd
    @@ -0,0 +1,57 @@
    +---
    +code-block-bg: true
    +code-block-border-left: "#31BAE9"
    +execute:
    +  eval: false
    +engine: knitr
    +bibliography: references.bib
    +---
    +
    +
    +
    +## FeatureCounts
    +
    +### Introduction
    +
    +FeatureCounts is part of the Subread software package, a toolkit for processing next-generation sequencing data [@Liao2014]. It includes the Subread aligner, the Subjunc exon-exon junction detector and the featureCounts read summarization program.
    +
    +FeatureCounts counts how many reads map to genomic features such as genes, exons, promoters and genomic bins. It is therefore useful after you have, for example, aligned reads (from a genome, metagenome or transcriptome) to reference sequences and want to generate a count table.
    +
    +Detailed documentation can be downloaded from [here](https://subread.sourceforge.net/featureCounts.html).
    +
    +### Installation
    +
    +Installed on crunchomics: No
    +
    +If you want to install it yourself, you can run:
    +
    +```{bash}
    +mamba create -n subread_2.0.6
    +mamba install -n subread_2.0.6 -c bioconda subread=2.0.6
    +mamba activate subread_2.0.6
    +```
    +
    +### Usage
    +
    +FeatureCounts takes as input an annotation file in GTF or GFF format and a sorted BAM file.
    +
    +It outputs a tab-delimited text file with the counts for each feature (in our example, CDS) per sample. Notice how you can use a wildcard to generate a count table for multiple BAM files at the same time.
    +
    +```{bash}
    +featureCounts -T 5 -t CDS -g gene_id -M \
    +    -a data/genome/genomic.gtf \
    +    -o results/featurecounts/ncbi_gtf/counts.txt \
    +    results/bowtie/*_mapped_sorted.bam
    +```
    +
    +Useful options:
    +
    +- `-a` Name of an annotation file. GTF/GFF format by default. See the `-F` option for more format information. Inbuilt annotations (SAF format) are available in the 'annotation' directory of the package. Gzipped files are also accepted.
    +- `-o` Name of the output file with the read counts. A separate file with summary statistics of the counting results is also written (the output file name plus '.summary'). Both files are tab-delimited.
    +- `-t` Specify feature type(s) in a GTF annotation. If multiple types are provided, they should be separated by ',' with no space in between. 'exon' by default. Rows in the annotation with a matching feature type will be extracted and used for read counting.
    +- `-g` Specify the attribute type in the GTF annotation. 'gene_id' by default. Meta-features used for read counting will be extracted from the annotation using the provided value.
    +- `-M` Multi-mapping reads will also be counted. For a multi-mapping read, all its reported alignments will be counted. The 'NH' tag in BAM/SAM input is used to detect multi-mapping reads.
    +- `-L` Count long reads such as Nanopore and PacBio reads. Long-read counting can only run in one thread and only reads (not read pairs) can be counted. There is no limit on the number of 'M' operations allowed in a CIGAR string in long-read counting.
    +- `--maxMOp` Maximum number of 'M' operations allowed in a CIGAR string. 10 by default. Both 'X' and '=' are treated as 'M' and adjacent 'M' operations are merged in the CIGAR string.
    +- `-p` If specified, libraries are assumed to contain paired-end reads. For any library that contains paired-end reads, the `--countReadPairs` option controls whether read pairs or reads are counted.
    +- `-s` Perform strand-specific read counting. A single integer value (applied to all input files) or a string of comma-separated values (applied to each corresponding input file) should be provided. Possible values are 0 (unstranded), 1 (stranded) and 2 (reversely stranded). The default is 0 (i.e. unstranded read counting for all input files).
    \ No newline at end of file
    diff --git a/source/core_tools/references.bib b/source/core_tools/references.bib
    index ccaa0b7..e172b5f 100644
    --- a/source/core_tools/references.bib
    +++ b/source/core_tools/references.bib
    @@ -58,3 +58,18 @@ @article{Danecek2021
       url = {http://dx.doi.org/10.1093/gigascience/giab008},
       langid = {en}
     }
    +
    +@article{Liao2014,
    +  title = {featureCounts: an efficient general purpose program for assigning sequence reads to genomic features},
    +  author = {Liao, Yang and Smyth, Gordon K. and Shi, Wei},
    +  year = {2013},
    +  month = {11},
    +  date = {2013-11-13},
    +  journal = {Bioinformatics},
    +  pages = {923--930},
    +  volume = {30},
    +  number = {7},
    +  doi = {10.1093/bioinformatics/btt656},
    +  url = {http://dx.doi.org/10.1093/bioinformatics/btt656},
    +  langid = {en}
    +}
    diff --git a/source/core_tools/samtools.qmd b/source/core_tools/samtools.qmd
    index 23e0526..0fe95f5 100644
    --- a/source/core_tools/samtools.qmd
    +++ b/source/core_tools/samtools.qmd
    @@ -109,10 +109,10 @@ We can adjust this to ask very specific questions about our data as well. For ex
     4. Count how many alignments have insertion or deletions (`wc -l`)
     
     ```{bash}
    -#count total reads
    -SRR6344904_mapped_sorted.bam | wc -l
    +#count total alignments
    +samtools view SRR6344904_mapped_sorted.bam | wc -l
     
    -#count mapped reads with ID events
    +#count mapped alignments with ID events
     samtools view -F 4 SRR6344904_mapped_sorted.bam | \
          cut -f 6 | \
          grep -P '[ID]' | \