diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md new file mode 100644 index 0000000..1bfa800 --- /dev/null +++ b/.github/pull_request_template.md @@ -0,0 +1,13 @@ +## Pull Request for PGEforge + +Thank you for your contribution to PGEforge!✨ + +### Description of changes +Please include a summary of the changes and the related issue, if applicable. Please also include relevant motivation and context. List any dependencies that are required for this change. + +### Pull Request Checklist + +- [ ] I have provided a description and detailed information about the changes made +- [ ] I have followed the [PGEforge GitHub contribution guidelines](http://mrc-ide.github.io/PGEforge/website_docs/how_to_contribute.html#contributing-guidelines-for-github) and I am merging into the `develop` branch +- [ ] Documentation is updated, if necessary +- [ ] Relevant issues are linked, if applicable \ No newline at end of file diff --git a/.github/workflows/check-quarto-render.yml b/.github/workflows/check-quarto-render.yml new file mode 100644 index 0000000..f52f427 --- /dev/null +++ b/.github/workflows/check-quarto-render.yml @@ -0,0 +1,28 @@ +on: + push: + branches: + - develop + - main + - feature/website + pull_request: + branches: + - develop + - main + +name: check-quarto-render + +jobs: + check-quarto-render: + runs-on: ubuntu-latest + + steps: + - name: Check out repository + uses: actions/checkout@v4 + + - name: Set up quarto + uses: quarto-dev/quarto-actions/setup@v2 + + - name: Render quarto website + uses: quarto-dev/quarto-actions/render@v2 + with: + path: _site \ No newline at end of file diff --git a/.github/workflows/publish-website.yml b/.github/workflows/publish-website.yml new file mode 100644 index 0000000..6b93702 --- /dev/null +++ b/.github/workflows/publish-website.yml @@ -0,0 +1,33 @@ +on: + push: + branches: + - main + pull_request: + branches: + - main + +name: publish-quarto-website + +permissions: + contents: write + pages: 
write + +jobs: + build-deploy: + runs-on: ubuntu-latest + + steps: + - name: Check out repository + uses: actions/checkout@v4 + + - name: Set up quarto + uses: quarto-dev/quarto-actions/setup@v2 + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + + - name: Publish to GitHub Pages (and render) + uses: quarto-dev/quarto-actions/publish@v2 + with: + target: gh-pages + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} \ No newline at end of file diff --git a/.gitignore b/.gitignore index 114759b..9515857 100644 --- a/.gitignore +++ b/.gitignore @@ -7,5 +7,5 @@ ignore/ .DS_Store /_site/ -_freeze/ *.html +README_files diff --git a/README.md b/README.md index 62db4c4..5e6716d 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,10 @@ -# PGEforge -A place for holding tutorials related to common software tools in Plasmodium genomic epidemiology. +# PGEforge +PGEforge is a community-driven platform designed to simplify *Plasmodium* genomic data analysis. The process of analyzing genomic data often involves various software tools with different formats, user interfaces, and levels of accessibility. These differences can create barriers, especially for those without strong computational skills. + +PGEforge aims to overcome these challenges by providing clear tutorials, streamlined workflows, and comprehensive resources to help researchers at all levels understand and analyze genomic data. By prioritizing the end-user, we strive to encourage better software development and foster an inclusive research community. Our goal is to make advanced genomic analysis accessible to everyone, promoting collaboration and accelerating progress in malaria research. + +## The PGEforge platform +All of the resources for enhancing **P**lasmodium **G**enomic **E**pidemiology analysis are found on our open-source [website](https://mrc-ide.github.io/PGEforge/). + +## How to contribute +We welcome input from all areas of the research community to continuously improve and expand our platform. 
Please read details on how to contribute to this community-driven resource on our [website](https://mrc-ide.github.io/PGEforge/website_docs/how_to_contribute.html). You can also [open a new issue](https://github.com/mrc-ide/PGEforge/issues). \ No newline at end of file diff --git a/_freeze/site_libs/dt-ext-fixedcolumns-1.13.6/css/fixedColumns.dataTables.min.css b/_freeze/site_libs/dt-ext-fixedcolumns-1.13.6/css/fixedColumns.dataTables.min.css new file mode 100644 index 0000000..ce934fd --- /dev/null +++ b/_freeze/site_libs/dt-ext-fixedcolumns-1.13.6/css/fixedColumns.dataTables.min.css @@ -0,0 +1 @@ +table.dataTable thead tr>.dtfc-fixed-left,table.dataTable thead tr>.dtfc-fixed-right,table.dataTable tfoot tr>.dtfc-fixed-left,table.dataTable tfoot tr>.dtfc-fixed-right{top:0;bottom:0;z-index:3;background-color:white}table.dataTable tbody tr>.dtfc-fixed-left,table.dataTable tbody tr>.dtfc-fixed-right{z-index:1;background-color:white}div.dtfc-left-top-blocker,div.dtfc-right-top-blocker{background-color:white}html.dark table.dataTable thead tr>.dtfc-fixed-left,html.dark table.dataTable thead tr>.dtfc-fixed-right,html.dark table.dataTable tfoot tr>.dtfc-fixed-left,html.dark table.dataTable tfoot tr>.dtfc-fixed-right{background-color:var(--dt-html-background)}html.dark table.dataTable tbody tr>.dtfc-fixed-left,html.dark table.dataTable tbody tr>.dtfc-fixed-right{background-color:var(--dt-html-background)}html.dark div.dtfc-left-top-blocker,html.dark div.dtfc-right-top-blocker{background-color:var(--dt-html-background)} diff --git a/_freeze/site_libs/dt-ext-fixedcolumns-1.13.6/js/dataTables.fixedColumns.min.js b/_freeze/site_libs/dt-ext-fixedcolumns-1.13.6/js/dataTables.fixedColumns.min.js new file mode 100644 index 0000000..f8202b8 --- /dev/null +++ b/_freeze/site_libs/dt-ext-fixedcolumns-1.13.6/js/dataTables.fixedColumns.min.js @@ -0,0 +1,4 @@ +/*! 
FixedColumns 4.3.0 + * © SpryMedia Ltd - datatables.net/license + */ +!function(e){var i,o;"function"==typeof define&&define.amd?define(["jquery","datatables.net"],function(t){return e(t,window,document)}):"object"==typeof exports?(i=require("jquery"),o=function(t,s){s.fn.dataTable||require("datatables.net")(t,s)},"undefined"==typeof window?module.exports=function(t,s){return t=t||window,s=s||i(t),o(t,s),e(s,0,t.document)}:(o(window,i),module.exports=e(i,window,window.document))):e(jQuery,window,document)}(function(o,t,s,F){"use strict";var A,i,e,l,d=o.fn.dataTable;function r(t,s){var e=this;if(i&&i.versionCheck&&i.versionCheck("1.10.0"))return t=new i.Api(t),this.classes=A.extend(!0,{},r.classes),this.c=A.extend(!0,{},r.defaults,s),s&&s.left!==F||this.c.leftColumns===F||(this.c.left=this.c.leftColumns),s&&s.right!==F||this.c.rightColumns===F||(this.c.right=this.c.rightColumns),this.s={barWidth:0,dt:t,rtl:"rtl"===A("body").css("direction")},s={bottom:"0px",display:"block",position:"absolute",width:this.s.barWidth+1+"px"},this.dom={leftBottomBlocker:A("
").css(s).css("left",0).addClass(this.classes.leftBottomBlocker),leftTopBlocker:A("
").css(s).css({left:0,top:0}).addClass(this.classes.leftTopBlocker),rightBottomBlocker:A("
").css(s).css("right",0).addClass(this.classes.rightBottomBlocker),rightTopBlocker:A("
").css(s).css({right:0,top:0}).addClass(this.classes.rightTopBlocker)},this.s.dt.settings()[0]._bInitComplete?(this._addStyles(),this._setKeyTableListener()):t.one("init.dt.dtfc",function(){e._addStyles(),e._setKeyTableListener()}),t.on("column-sizing.dt.dtfc",function(){return e._addStyles()}),t.settings()[0]._fixedColumns=this,t.on("destroy",function(){return e._destroy()}),this;throw new Error("FixedColumns requires DataTables 1.10 or newer")}function h(t,s){void 0===s&&(s=null);t=new d.Api(t),s=s||t.init().fixedColumns||d.defaults.fixedColumns;new e(t,s)}return r.prototype.left=function(t){return t!==F?(0<=t&&t<=this.s.dt.columns().count()&&(this.c.left=t,this._addStyles()),this):this.c.left},r.prototype.right=function(t){return t!==F?(0<=t&&t<=this.s.dt.columns().count()&&(this.c.right=t,this._addStyles()),this):this.c.right},r.prototype._addStyles=function(){this.s.dt.settings()[0].oScroll.sY&&(s=A(this.s.dt.table().node()).closest("div.dataTables_scrollBody")[0],e=this.s.dt.settings()[0].oBrowser.barWidth,s.offsetWidth-s.clientWidth>=e?this.s.barWidth=e:this.s.barWidth=0,this.dom.rightTopBlocker.css("width",this.s.barWidth+1),this.dom.leftTopBlocker.css("width",this.s.barWidth+1),this.dom.rightBottomBlocker.css("width",this.s.barWidth+1),this.dom.leftBottomBlocker.css("width",this.s.barWidth+1));for(var t=null,s=this.s.dt.column(0).header(),e=null,i=(null!==s&&(e=(s=A(s)).outerHeight()+1,t=A(s.closest("div.dataTables_scroll")).css("position","relative")),this.s.dt.column(0).footer()),o=null,l=(null!==i&&(o=(i=A(i)).outerHeight(),null===t)&&(t=A(i.closest("div.dataTables_scroll")).css("position","relative")),this.s.dt.columns().data().toArray().length),d=0,r=0,h=A(this.s.dt.table().node()).children("tbody").children("tr"),n=0,a=new 
Map,c=0;c=l-this.c.right){if(A(this.s.dt.table().node()).addClass(this.classes.tableFixedRight),t.addClass(this.classes.tableFixedRight),c+1+ye.left)&&(l=r.scrollLeft(),r.scrollLeft(l-(e.left-(d.left+o))))}),this.s.dt.on("draw.dt.dtfc",function(){h._addStyles()}),this.s.dt.on("column-reorder.dt.dtfc",function(){h._addStyles()}),this.s.dt.on("column-visibility.dt.dtfc",function(t,s,e,i,o){o&&!s.bDestroying&&setTimeout(function(){h._addStyles()},50)})},r.version="4.3.0",r.classes={fixedLeft:"dtfc-fixed-left",fixedRight:"dtfc-fixed-right",leftBottomBlocker:"dtfc-left-bottom-blocker",leftTopBlocker:"dtfc-left-top-blocker",rightBottomBlocker:"dtfc-right-bottom-blocker",rightTopBlocker:"dtfc-right-top-blocker",tableFixedLeft:"dtfc-has-left",tableFixedRight:"dtfc-has-right"},r.defaults={i18n:{button:"FixedColumns"},left:1,right:0},e=r,i=(A=o).fn.dataTable,o.fn.dataTable.FixedColumns=e,o.fn.DataTable.FixedColumns=e,(l=d.Api.register)("fixedColumns()",function(){return this}),l("fixedColumns().left()",function(t){var s=this.context[0];return t!==F?(s._fixedColumns.left(t),this):s._fixedColumns.left()}),l("fixedColumns().right()",function(t){var s=this.context[0];return t!==F?(s._fixedColumns.right(t),this):s._fixedColumns.right()}),d.ext.buttons.fixedColumns={action:function(t,s,e,i){o(e).attr("active")?(o(e).removeAttr("active").removeClass("active"),s.fixedColumns().left(0),s.fixedColumns().right(0)):(o(e).attr("active","true").addClass("active"),s.fixedColumns().left(i.config.left),s.fixedColumns().right(i.config.right))},config:{left:1,right:0},init:function(t,s,e){t.settings()[0]._fixedColumns===F&&h(t.settings(),e),o(s).attr("active","true").addClass("active"),t.button(s).text(e.text||t.i18n("buttons.fixedColumns",t.settings()[0]._fixedColumns.c.i18n.button))},text:null},o(s).on("plugin-init.dt",function(t,s){"dt"!==t.namespace||!s.oInit.fixedColumns&&!d.defaults.fixedColumns||s._fixedColumns||h(s,null)}),d}); \ No newline at end of file diff --git 
a/_freeze/site_libs/dt-ext-fixedheader-1.13.6/css/fixedHeader.dataTables.min.css b/_freeze/site_libs/dt-ext-fixedheader-1.13.6/css/fixedHeader.dataTables.min.css new file mode 100644 index 0000000..988ad2b --- /dev/null +++ b/_freeze/site_libs/dt-ext-fixedheader-1.13.6/css/fixedHeader.dataTables.min.css @@ -0,0 +1 @@ +table.fixedHeader-floating{background-color:white}table.fixedHeader-floating.no-footer{border-bottom-width:0}table.fixedHeader-locked{position:absolute !important;background-color:white}@media print{table.fixedHeader-floating{display:none}}html.dark table.fixedHeader-floating{background-color:var(--dt-html-background)}html.dark table.fixedHeader-locked{background-color:var(--dt-html-background)} diff --git a/_freeze/site_libs/dt-ext-fixedheader-1.13.6/js/dataTables.fixedHeader.min.js b/_freeze/site_libs/dt-ext-fixedheader-1.13.6/js/dataTables.fixedHeader.min.js new file mode 100644 index 0000000..d9b3e03 --- /dev/null +++ b/_freeze/site_libs/dt-ext-fixedheader-1.13.6/js/dataTables.fixedHeader.min.js @@ -0,0 +1,4 @@ +/*! 
FixedHeader 3.4.0 + * © SpryMedia Ltd - datatables.net/license + */ +!function(o){var i,s;"function"==typeof define&&define.amd?define(["jquery","datatables.net"],function(t){return o(t,window,document)}):"object"==typeof exports?(i=require("jquery"),s=function(t,e){e.fn.dataTable||require("datatables.net")(t,e)},"undefined"==typeof window?module.exports=function(t,e){return t=t||window,e=e||i(t),s(t,e),o(e,t,t.document)}:(s(window,i),module.exports=o(i,window,window.document))):o(jQuery,window,document)}(function(m,H,x,v){"use strict";function s(t,e){if(!(this instanceof s))throw"FixedHeader must be initialised with the 'new' keyword.";if(!0===e&&(e={}),t=new n.Api(t),this.c=m.extend(!0,{},s.defaults,e),this.s={dt:t,position:{theadTop:0,tbodyTop:0,tfootTop:0,tfootBottom:0,width:0,left:0,tfootHeight:0,theadHeight:0,windowHeight:m(H).height(),visible:!0},headerMode:null,footerMode:null,autoWidth:t.settings()[0].oFeatures.bAutoWidth,namespace:".dtfc"+o++,scrollLeft:{header:-1,footer:-1},enable:!0,autoDisable:!1},this.dom={floatingHeader:null,thead:m(t.table().header()),tbody:m(t.table().body()),tfoot:m(t.table().footer()),header:{host:null,floating:null,floatingParent:m('
'),placeholder:null},footer:{host:null,floating:null,floatingParent:m('
'),placeholder:null}},this.dom.header.host=this.dom.thead.parent(),this.dom.footer.host=this.dom.tfoot.parent(),(e=t.settings()[0])._fixedHeader)throw"FixedHeader already initialised on table "+e.nTable.id;(e._fixedHeader=this)._constructor()}var n=m.fn.dataTable,o=0;return m.extend(s.prototype,{destroy:function(){var t=this.dom;this.s.dt.off(".dtfc"),m(H).off(this.s.namespace),t.header.rightBlocker&&t.header.rightBlocker.remove(),t.header.leftBlocker&&t.header.leftBlocker.remove(),t.footer.rightBlocker&&t.footer.rightBlocker.remove(),t.footer.leftBlocker&&t.footer.leftBlocker.remove(),this.c.header&&this._modeChange("in-place","header",!0),this.c.footer&&t.tfoot.length&&this._modeChange("in-place","footer",!0)},enable:function(t,e,o){this.s.enable=t,this.s.enableType=o,!e&&e!==v||(this._positions(),this._scroll(!0))},enabled:function(){return this.s.enable},headerOffset:function(t){return t!==v&&(this.c.headerOffset=t,this.update()),this.c.headerOffset},footerOffset:function(t){return t!==v&&(this.c.footerOffset=t,this.update()),this.c.footerOffset},update:function(t){var e=this.s.dt.table().node();(this.s.enable||this.s.autoDisable)&&(m(e).is(":visible")?(this.s.autoDisable=!1,this.enable(!0,!1)):(this.s.autoDisable=!0,this.enable(!1,!1)),0!==m(e).children("thead").length)&&(this._positions(),this._scroll(t===v||t))},_constructor:function(){var o=this,i=this.s.dt,t=(m(H).on("scroll"+this.s.namespace,function(){o._scroll()}).on("resize"+this.s.namespace,n.util.throttle(function(){o.s.position.windowHeight=m(H).height(),o.update()},50)),m(".fh-fixedHeader")),t=(!this.c.headerOffset&&t.length&&(this.c.headerOffset=t.outerHeight()),m(".fh-fixedFooter"));!this.c.footerOffset&&t.length&&(this.c.footerOffset=t.outerHeight()),i.on("column-reorder.dt.dtfc column-visibility.dt.dtfc column-sizing.dt.dtfc 
responsive-display.dt.dtfc",function(t,e){o.update()}).on("draw.dt.dtfc",function(t,e){o.update(e!==i.settings()[0])}),i.on("destroy.dtfc",function(){o.destroy()}),this._positions(),this._scroll()},_clone:function(t,e){var o,i,s=this,n=this.s.dt,r=this.dom[t],d="header"===t?this.dom.thead:this.dom.tfoot;"footer"===t&&this._scrollEnabled()||(!e&&r.floating?r.floating.removeClass("fixedHeader-floating fixedHeader-locked"):(r.floating&&(null!==r.placeholder&&r.placeholder.remove(),this._unsize(t),r.floating.children().detach(),r.floating.remove()),e=m(n.table().node()),o=m(e.parent()),i=this._scrollEnabled(),r.floating=m(n.table().node().cloneNode(!1)).attr("aria-hidden","true").css({"table-layout":"fixed",top:0,left:0}).removeAttr("id").append(d),r.floatingParent.css({width:o.width(),overflow:"hidden",height:"fit-content",position:"fixed",left:i?e.offset().left+o.scrollLeft():0}).css("header"===t?{top:this.c.headerOffset,bottom:""}:{top:"",bottom:this.c.footerOffset}).addClass("footer"===t?"dtfh-floatingparentfoot":"dtfh-floatingparenthead").append(r.floating).appendTo("body"),this._stickyPosition(r.floating,"-"),(n=function(){var t=o.scrollLeft();s.s.scrollLeft={footer:t,header:t},r.floatingParent.scrollLeft(s.s.scrollLeft.header)})(),o.off("scroll.dtfh").on("scroll.dtfh",n),r.placeholder=d.clone(!1),r.placeholder.find("*[id]").removeAttr("id"),r.host.prepend(r.placeholder),this._matchWidths(r.placeholder,r.floating)))},_stickyPosition:function(t,i){var s,n;this._scrollEnabled()&&(n="rtl"===m((s=this).s.dt.table().node()).css("direction"),t.find("th").each(function(){var 
t,e,o;"sticky"===m(this).css("position")&&(t=m(this).css("right"),e=m(this).css("left"),"auto"===t||n?"auto"!==e&&n&&(o=+e.replace(/px/g,"")+("-"===i?-1:1)*s.s.dt.settings()[0].oBrowser.barWidth,m(this).css("left",0b&&s+this.c.headerOffset+u.theadHeightc||this.dom.header.floatingParent===v?t=!0:this.dom.header.floatingParent.css({top:this.c.headerOffset,position:"fixed"}).append(this.dom.header.floating)):h="below",!t&&h===this.s.headerMode||this._modeChange(h,"header",t),this._horizontal("header",i)),p={offset:{top:0,left:0},height:0},g={offset:{top:0,left:0},height:0},this.c.footer&&this.dom.tfoot.length&&(!this.s.enable||!u.visible||u.tfootBottom+this.c.footerOffset<=a?n="in-place":c+u.tfootHeight+this.c.footerOffset>a&&b+this.c.footerOffsets&&(u=a+((c=s-o.top)>-p.height?c:0)-(p.offset.top+(c<-p.height?p.height:0)+g.height),f.outerHeight(u=u<0?0:u),Math.round(f.outerHeight())>=Math.round(u)?m(this.dom.tfoot.parent()).addClass("fixedHeader-floating"):m(this.dom.tfoot.parent()).removeClass("fixedHeader-floating")),this.dom.header.floating&&this.dom.header.floatingParent.css("left",r-i),this.dom.footer.floating&&this.dom.footer.floatingParent.css("left",r-i),this.s.dt.settings()[0]._fixedColumns!==v&&(this.dom.header.rightBlocker=(b=function(t,e,o){var i;return null!==(o=o===v?0===(i=m("div.dtfc-"+t+"-"+e+"-blocker")).length?null:i.clone().css("z-index",1):o)&&("in"===h||"below"===h?o.appendTo("body").css({top:("top"===e?p:g).offset.top,left:"right"===t?r+d-o.width():r}):o.detach()),o})("right","top",this.dom.header.rightBlocker),this.dom.header.leftBlocker=b("left","top",this.dom.header.leftBlocker),this.dom.footer.rightBlocker=b("right","bottom",this.dom.footer.rightBlocker),this.dom.footer.leftBlocker=b("left","bottom",this.dom.footer.leftBlocker)))},_scrollEnabled:function(){var 
t=this.s.dt.settings()[0].oScroll;return""!==t.sY||""!==t.sX}}),s.version="3.4.0",s.defaults={header:!0,footer:!1,headerOffset:0,footerOffset:0},m.fn.dataTable.FixedHeader=s,m.fn.DataTable.FixedHeader=s,m(x).on("init.dt.dtfh",function(t,e,o){var i;"dt"===t.namespace&&(t=e.oInit.fixedHeader,i=n.defaults.fixedHeader,t||i)&&!e._fixedHeader&&(i=m.extend({},i,t),!1!==t)&&new s(e,i)}),n.Api.register("fixedHeader()",function(){}),n.Api.register("fixedHeader.adjust()",function(){return this.iterator("table",function(t){t=t._fixedHeader;t&&t.update()})}),n.Api.register("fixedHeader.enable()",function(e){return this.iterator("table",function(t){t=t._fixedHeader;e=e===v||e,t&&e!==t.enabled()&&t.enable(e)})}),n.Api.register("fixedHeader.enabled()",function(){if(this.context.length){var t=this.context[0]._fixedHeader;if(t)return t.enabled()}return!1}),n.Api.register("fixedHeader.disable()",function(){return this.iterator("table",function(t){t=t._fixedHeader;t&&t.enabled()&&t.enable(!1)})}),m.each(["header","footer"],function(t,o){n.Api.register("fixedHeader."+o+"Offset()",function(e){var t=this.context;return e===v?t.length&&t[0]._fixedHeader?t[0]._fixedHeader[o+"Offset"]():v:this.iterator("table",function(t){t=t._fixedHeader;t&&t[o+"Offset"](e)})})}),n}); \ No newline at end of file diff --git a/_freeze/site_libs/htmltools-fill-0.5.8.1/fill.css b/_freeze/site_libs/htmltools-fill-0.5.8.1/fill.css new file mode 100644 index 0000000..841ea9d --- /dev/null +++ b/_freeze/site_libs/htmltools-fill-0.5.8.1/fill.css @@ -0,0 +1,21 @@ +@layer htmltools { + .html-fill-container { + display: flex; + flex-direction: column; + /* Prevent the container from expanding vertically or horizontally beyond its + parent's constraints. 
*/ + min-height: 0; + min-width: 0; + } + .html-fill-container > .html-fill-item { + /* Fill items can grow and shrink freely within + available vertical space in fillable container */ + flex: 1 1 auto; + min-height: 0; + min-width: 0; + } + .html-fill-container > :not(.html-fill-item) { + /* Prevent shrinking or growing of non-fill items */ + flex: 0 0 auto; + } +} diff --git a/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/execute-results/html.json b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/execute-results/html.json new file mode 100644 index 0000000..ab9aa01 --- /dev/null +++ b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/execute-results/html.json @@ -0,0 +1,16 @@ +{ + "hash": "8fa906b10ebf476f65c7121c11cee372", + "result": { + "markdown": "---\ntitle: \"Analysis with MIPanalyzer\"\noutput: html_document\neditor_options: \n chunk_output_type: console\n---\n\n\n\n\n## Overview \n`MIPanalyzer` is a tool suite for analyzing Molecular Inversion Probe (MIP) data, with functionality that could be extended to other platforms/outputs such as small-variant call files (VCFs) or multi-locus amplicon data. For the purpose of this tutorial, however, we will focus on MIP data using the file structure and outputs from the Bailey Lab and its [`MIPTools`](https://github.com/bailey-lab/MIPTools) pipeline.\n\n
\n\nThe `MIPanalyzer` suite covers four main areas of analysis:\n\n1. Munging/wrangling MIP data\n2. Calculating genetic distance\n3. Analyzing genetic structure\n4. Summarizing drug resistance mutations (work in progress)\n\n\nThe tool is intended to make analysis of MIP data straightforward and standardized. Notably, many of the methods within the package rely on within-sample allele frequencies to account for complexity of infection (polyclonality).\n\n\n## Data Input\nWe assume that the user has two files: (1) a variant call file following the [VCFv4 specification](https://samtools.github.io/hts-specs/VCFv4.3.pdf) and (2) an amino acid table (TBD).\n\n\n\n\n## Munging MIP Data\nAs input for `MIPanalyzer`, we will start with a variant call file ([VCF](https://samtools.github.io/hts-specs/VCFv4.3.pdf)) of the Sanger Barcode from Vietnam. The VCF is converted into a `mipanalyzer` object that is either biallelic or multiallelic, depending on the user's specification. 
\n\n::: {.cell result='hide'}\n\n```{.r .cell-code}\n#......................\n# read in the VCF from the main data page \n# and convert to mipanalyzer_biallelic object\n#......................\ndat_biallelic <- MIPanalyzer::vcf_to_mipanalyzer_biallelic(\"../../data/snp_barcode/sangerBarcode_SNP_INDEL_Pf3D7_ALL_v3.combined.filtered.vqslod6.biallelic_snp.Vietnam.vcf.gz\") \n```\n\n::: {.cell-output .cell-output-stdout}\n```\nScanning file to determine attributes.\nFile attributes:\n meta lines: 136\n header_line: 137\n variant count: 91\n column count: 106\n\nMeta line 136 read in.\nAll meta lines processed.\ngt matrix initialized.\nCharacter matrix gt created.\n Character matrix gt rows: 91\n Character matrix gt cols: 106\n skip: 0\n nrows: 91\n row_num: 0\n\nProcessed variant: 91\nAll variants processed\n```\n:::\n:::\n\nThe `dat_biallelic` object is a protected `MIPanalyzer` class (`mipanalyzer_biallelic`) that serves as a tidy and fast container for genomic data. It has slots for sample names, loci (CHROM/POS/ID/REF/ALT/QUAL/FILTER/INFO), coverage (read depth, `DP`), counts (allelic depth, `AD`), filter history (upstream manipulations performed on the object), and vcfmeta (the header of the original VCF file).\n\n### Filter \nData can be quickly filtered using a series of commands. `MIPanalyzer` also includes several visualization functions that make the filtering process more interactive. Below, I explore how much data would be lost if I retained only loci with at least 10 reads in every sample (a minimum coverage of 10 reads, with 0% tolerance for samples falling below it). 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndat_biallelic %>% \n explore_filter_coverage_loci(min_coverage = 10, max_low_coverage = 0)\n```\n\n::: {.cell-output-display}\n![](MIPanalyzer_analysis_files/figure-html/unnamed-chunk-2-1.png){width=672}\n:::\n:::\n\n\nI then apply this filter as follows:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndat_biallelic <- dat_biallelic %>%\n filter_coverage_loci(min_coverage = 10, max_low_coverage = 0)\n```\n:::\n\nThere are similar capabilities to filter by sample (`filter_samples`) and by locus (`filter_loci`). Within the `filter_samples` framework, users can exclude samples that have poor coverage across the genome. Below, I show how these functions can be used to filter samples based on coverage: the exploratory plot requires a depth of 25 at every site, and the filter that is then applied uses a less stringent threshold of 10.\n\n::: {.cell}\n\n```{.r .cell-code}\ndat_biallelic %>% \n explore_filter_coverage_samples(min_coverage = 25, max_low_coverage = 0)\n```\n\n::: {.cell-output-display}\n![](MIPanalyzer_analysis_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n\n```{.r .cell-code}\ndat_biallelic <- dat_biallelic %>%\n filter_coverage_samples(min_coverage = 10, max_low_coverage = 0)\n```\n:::\n\n\nSimilarly, loci with an unexpectedly high number of reads (\"jackpotting\"), which may be prone to sequencing error, can be excluded with `filter_overcounts`. Finally, users can remove uninformative sites (where every sample carries the same allele) with `filter_loci_invariant`, as demonstrated below:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndat_biallelic <- dat_biallelic %>%\n filter_loci_invariant()\n```\n:::\n\n\n## Calculating Genetic Distances\nFor this section, we will explore pairwise relatedness through a variety of genetic distance measures. 
In the first approach, we compare how similar allele frequencies are between samples using the $d_{ab}$ metric (`get_genomic_distance`), proposed by the MalariaGEN Plasmodium falciparum Community Project in the article \"Genomic epidemiology of artemisinin resistant malaria\", eLife (2016).\n\nIn a second approach, we calculate the similarity between two samples based on whether their genetic sequences are identical at given genomic positions (loci). We can either simply count the number of sites with identical alleles between two individuals, termed identity by state (IBS), or use statistical models to determine whether identical alleles and \"blocks\" of the genome were likely inherited from a common ancestor, termed identity by descent (IBD). Separately, we can assess whether within-sample allele frequencies are identical (within some tolerance) between two individuals, which is essentially a continuous extension of IBS. IBS calculations are straightforward and are simply comparisons between samples. IBD calculations, by contrast, require parametric assumptions to account for inheritance, which is part of the definition of IBD. The IBD estimator in MIPanalyzer is based on the classic Malécot definition of IBD and is fitted in a maximum-likelihood framework.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# AF based methods \ndab_est <- get_genomic_distance(dat_biallelic, report_progress = FALSE)\n# identity based methods\nmix_est <- get_IB_mixture(dat_biallelic, report_progress = FALSE)\nibd_est <- inbreeding_mle(dat_biallelic, \n f = seq(0, 1, length.out = 5),\n report_progress = FALSE)\nibs_est <- get_IBS_distance(dat_biallelic, report_progress = FALSE)\n```\n:::\n\nEach distance calculator outputs a distance matrix with the upper triangle filled in. The exception is the IBD estimator, which also returns the log-likelihood of each inbreeding level $f$ considered by the model (specified by the user). 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# plot all pairwise distances (make a distance instead of similarity score with 1-x)\np1 <- plot_distance(1 - ibd_est$mle, col_pal = \"magma\") + ggtitle(\"IBD Dist\")\np2 <- plot_distance(1 - mix_est, col_pal = \"inferno\") + ggtitle(\"Mixture Dist\")\np3 <- plot_distance(1 - ibs_est, col_pal = \"plasma\") + ggtitle(\"IBS Dist\")\np4 <- plot_distance(dab_est, col_pal = \"viridis\") + ggtitle(\"Dab Dist\")\n\ncowplot::plot_grid(p1,p2,p3,p4, nrow = 2, ncol = 2)\n```\n\n::: {.cell-output-display}\n![](MIPanalyzer_analysis_files/figure-html/unnamed-chunk-7-1.png){width=672}\n:::\n:::\n\nAs expected, our samples have much less IBD than IBS. Similarly, the $d_{ab}$ metric spans a wide range of values. However, across all four genomic distances, sample pairs that are highly related are \n\n\n## Analyzing Genetic Structure \nIn this section, we will perform a principal _component_ analysis and plot the results to assess the data for structure (as part of an exploratory data analysis exercise).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# calculate within-sample allele frequencies\nwsaf <- get_wsaf(dat_biallelic)\n# produce pca \npca <- pca_wsaf(wsaf)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nplot_pca(pca, num_components = 3)\n```\n:::\n\n\n\nWe can also look at the amount of variance captured in our principal components, as well as which loci contribute most to the variation. 
The amount of variance captured by the principal components indicates how much structure is in the data, while the loci contributing most to the variation may point to signals of selection, drift, or other evolutionary processes.\n\n::: {.cell}\n\n```{.r .cell-code}\n# plot percentage variance explained\nplot_pca_variance(pca)\n```\n:::\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# get CHROM in numeric format for each locus (e.g. \"Pf3D7_01_v3\" -> 1)\nchrom_numeric <- sapply(dat_biallelic$loci$CHROM, function(x) as.numeric(strsplit(x, \"_\")[[1]][2]))\n\n# plot loading values\nplot_pca_contribution(pca, component = 1, chrom = chrom_numeric, pos = dat_biallelic$loci$POS)\n```\n\n::: {.cell-output-display}\n![](MIPanalyzer_analysis_files/figure-html/unnamed-chunk-11-1.png){width=672}\n:::\n:::\n\n\nAnother approach to analyzing structure is principal _coordinate_ analysis (PCoA). Principal component analysis (PCA) finds linear combinations of the input variables (here, within-sample allele frequencies) that capture the most variance. PCoA, also called classical (multidimensional) scaling and computed here with `ape::pcoa`, instead finds a low-dimensional configuration of samples whose pairwise distances best approximate a given distance matrix. The advantage of PCoA over PCA is that PCoA can take _any_ distance matrix the user specifies, and it can therefore capture genetic structure that may be nonlinear.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# genomic distance from malariagen manuscript as above\ngdist <- get_genomic_distance(dat_biallelic, report_progress = FALSE)\n# perform PCoA\npcoa <- pcoa_genomic_distance(gdist)\n# scatterplot\nplot_pcoa(pcoa, num_components = 3)\n```\n:::\n\n\n\n\n\n\n\n\n## Summarizing Drug Resistance Mutations (work in progress)\nTo Do (pending file format from Bailey group)\n\n\n## Summary\nIn this tutorial, we explored how to use `MIPanalyzer` to munge/wrangle MIP data, filter samples and loci, and prepare our MIP data for analysis. 
We then performed genetic analysis by calculating genetic distances and by analyzing our data for genetic structure. 
\n", + "supporting": [ + "MIPanalyzer_analysis_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/figure-html/unnamed-chunk-11-1.png b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/figure-html/unnamed-chunk-11-1.png new file mode 100644 index 0000000..baafc08 Binary files /dev/null and b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/figure-html/unnamed-chunk-11-1.png differ diff --git a/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/figure-html/unnamed-chunk-2-1.png b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/figure-html/unnamed-chunk-2-1.png new file mode 100644 index 0000000..fd10463 Binary files /dev/null and b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/figure-html/unnamed-chunk-2-1.png differ diff --git a/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/figure-html/unnamed-chunk-4-1.png b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/figure-html/unnamed-chunk-4-1.png new file mode 100644 index 0000000..8ab804a Binary files /dev/null and b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/figure-html/unnamed-chunk-4-1.png differ diff --git a/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/figure-html/unnamed-chunk-7-1.png b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/figure-html/unnamed-chunk-7-1.png new file mode 100644 index 0000000..3c863a5 Binary files /dev/null and b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_analysis/figure-html/unnamed-chunk-7-1.png differ diff --git a/_freeze/tutorials/MIPanalyzer/MIPanalyzer_background/execute-results/html.json b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_background/execute-results/html.json new file mode 100644 index 0000000..7f2b1b9 --- /dev/null +++ b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_background/execute-results/html.json @@ -0,0 +1,18 @@ +{ + "hash": 
"66b96621cfa3803ba8d9e268da3f9a92", + "result": { + "markdown": "---\ntitle: \"Insert tool title\"\noutput: html_document\nauthor: \"Insert name\"\ndate: \"Insert todays date\"\n---\n\n\n\n\n
\n
\n\n## Summary sheet\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Main use-cases MIP data analysis
Authors Robert Verity; OJ Watson; Nick Brazeau
Latest version v1.1.0
License MIT
Website https://mrc-ide.github.io/MIPanalyzer/
Code repository https://github.com/mrc-ide/MIPanalyzer
Publication https://pubmed.ncbi.nlm.nih.gov/32355199/
Tutorial authors Nick Brazeau
Tutorial date Dec 13 2023
\n\n`````\n:::\n:::\n\n\n## Purpose\n\nA quick one paragraph description of what the tool does. For an example, see the [DRpower page](https://mrc-ide.github.io/PGEforge/tutorials/DRpower/DRpower_background.html).\n\n## Existing resources\n\n- Any existing online tutorials?\n- Any important papers?\n\n## Citation\n\nBibTeX style citation. For an R package, you can get this using `citation(package = \"name\")`:\n\nHere is an example for DRpower, using `citation(package = \"DRpower\")`:\n\n```\n@Manual{,\n title = {DRpower: Study design and analysis for pfhrp2/3 deletion prevalence studies},\n author = {Bob Verity and Shazia Ruybal},\n note = {R package version 1.0.2},\n }\n```", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/MIPanalyzer/MIPanalyzer_installation/execute-results/html.json b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_installation/execute-results/html.json new file mode 100644 index 0000000..77fa390 --- /dev/null +++ b/_freeze/tutorials/MIPanalyzer/MIPanalyzer_installation/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "89605557f17bc7c6bac578b2e39d8830", + "result": { + "markdown": "---\ntitle: \"Installing MIPanalyzer\"\noutput: html_document\n---\n\n\n\n\n## Installing from R-Universe\nYou can directly install `MIPanalyzer` from [plasmogenepi.r-universe](https://plasmogenepi.r-universe.dev/builds), which greatly simplifies installation. 
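The R-universe route can be sketched as follows, mirroring the install commands this site gives for `dcifer` and `isoRelate`. This minimal sketch assumes `MIPanalyzer` is published in the plasmogenepi universe, as stated above. CRAN is listed second so that any dependencies not hosted on the universe can still be resolved.

```r
# Look in the plasmogenepi r-universe first, then fall back to CRAN for dependencies
repos <- c('https://plasmogenepi.r-universe.dev', 'https://cloud.r-project.org')

# Only install if MIPanalyzer is not already available in this R library
if (!requireNamespace('MIPanalyzer', quietly = TRUE)) {
  install.packages('MIPanalyzer', repos = repos)
}
```

The check uses `requireNamespace()` rather than `library()`, so it tests whether the package is installed without attaching it.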
\n\n\n## Installing From GitHub \n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(\"remotes\")\nremotes::install_github(\"mrc-ide/MIPanalyzer\")\n```\n:::\n\n\n### Dependencies\n`MIPanalyzer` relies on the [Rcpp](https://cran.r-project.org/web/packages/Rcpp/index.html) package, which requires certain OS-specific dependencies: \n\n* Windows\n - Download and install the appropriate version of [Rtools](https://cran.rstudio.com/bin/windows/Rtools/) for your version of R. On installation, ensure you check the box to arrange your system PATH as recommended by Rtools.\n* Mac OS X\n - Download and install [Xcode](http://itunes.apple.com/us/app/xcode/id497799835?mt=12)\n - Within Xcode, go to Preferences > Downloads and install the Command Line Tools\n* Linux (Debian/Ubuntu)\n - Install the core software development utilities required for R package development, as well as LaTeX, by executing:\n ```\n sudo apt-get install r-base-dev texlive-full\n ```\n\nAssuming everything installed correctly, you can now load the package:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(MIPanalyzer)\n```\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/MalHaploFreq/MalHaploFreq_background/execute-results/html.json b/_freeze/tutorials/MalHaploFreq/MalHaploFreq_background/execute-results/html.json new file mode 100644 index 0000000..ab5be5d --- /dev/null +++ b/_freeze/tutorials/MalHaploFreq/MalHaploFreq_background/execute-results/html.json @@ -0,0 +1,18 @@ +{ + "hash": "8d76875f24d9d2c0b3d758c2e23aac47", + "result": { + "markdown": "---\ntitle: \"MalHaploFreq About\"\noutput: html_document\nauthor: \"Nicholas Hathaway\"\ndate: \"2023-12-15\"\nbibliography: reference.bib\n---\n\n\n\n\n
\n
\n\n## Summary sheet\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Main use-cases prevalence
Authors Ian Hastings
Latest version NA
Code repository http://pcwww.liv.ac.uk/hastings/MalHaploFreq/
Publication https://doi.org/10.1186/1475-2875-7-130
\n\n`````\n:::\n:::\n\n\n## Purpose\n\nThis tool was created to estimate the prevalence of haplotypes (linking up to 3 SNPs together) from SNP loci. It requires the MOI of each sample to be supplied as input and handles at most 3 loci. Since its creation, other tools have been developed that perform similar analyses with additional features; for example, [MultiLociBiallelicModel](../MultiLociBiallelicModel/MultiLociBiallelicModel_background.qmd) also estimates prevalence but can handle more loci and does not require MOI as input, because it estimates MOI from the data. \n\n## Citation\n\n[@Hastings2008-mb]\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/dcifer/dcifer_analysis/figure-html/unnamed-chunk-7-1.png b/_freeze/tutorials/dcifer/dcifer_analysis/figure-html/unnamed-chunk-7-1.png new file mode 100644 index 0000000..2e8e46f Binary files /dev/null and b/_freeze/tutorials/dcifer/dcifer_analysis/figure-html/unnamed-chunk-7-1.png differ diff --git a/_freeze/tutorials/dcifer/dcifer_background/execute-results/html.json b/_freeze/tutorials/dcifer/dcifer_background/execute-results/html.json new file mode 100644 index 0000000..0ce3b13 --- /dev/null +++ b/_freeze/tutorials/dcifer/dcifer_background/execute-results/html.json @@ -0,0 +1,18 @@ +{ + "hash": "0ab11c9cf5df52d4e9d1d913f3ce7cfa", + "result": { + "markdown": "---\ntitle: \"dcifer\"\noutput: html_document\nauthor: \"Shazia Ruybal-Pesántez\"\ndate: \"11 December 2023\"\n---\n\n\n\n\n\n\n
\n
\n\n## Summary sheet\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Main use-cases Genetic relatedness between polyclonal infections
Authors Inna Gerlovina
Latest version 1.2.1
License MIT
Website https://eppicenter.github.io/dcifer/
Code repository https://github.com/EPPIcenter/dcifer
Publication https://doi.org/10.1093/genetics/iyac126
Tutorial authors Shazia Ruybal-Pesántez
Tutorial date 11-Dec-23
\n\n`````\n:::\n:::\n\n\n## Purpose\n\nThe `dcifer` R package is primarily designed to estimate relatedness between polyclonal infections. Input data can be biallelic or multiallelic. \n\nThe approach is likelihood-based, so alongside the relatedness estimates it provides statistical inference such as confidence intervals and hypothesis tests.\n\n## Existing resources\n\nThe `dcifer` R package includes built-in functions for reading and reformatting data, performing preparatory steps, and visualizing the results. These are documented on the [`dcifer` R package website](https://eppicenter.github.io/dcifer/index.html). There is also a [tutorial](https://eppicenter.github.io/dcifer/index.html) outlining the analysis process using the `dcifer` R package with microhaplotype data from Mozambique.\n\n## Citation\nThe publication associated with the `dcifer` R package can be found [here (Gerlovina 2022 Genetics)](https://doi.org/10.1093/genetics/iyac126).\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncitation(package = \"dcifer\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nTo cite package 'dcifer' in publications use:\n\n Gerlovina I (2023). _dcifer: Genetic Relatedness Between Polyclonal\n Infections_. 
R package version 1.2.1,\n .\n\nA BibTeX entry for LaTeX users is\n\n @Manual{,\n title = {dcifer: Genetic Relatedness Between Polyclonal Infections},\n author = {Inna Gerlovina},\n year = {2023},\n note = {R package version 1.2.1},\n url = {https://CRAN.R-project.org/package=dcifer},\n }\n```\n:::\n:::", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/dcifer/dcifer_installation/execute-results/html.json b/_freeze/tutorials/dcifer/dcifer_installation/execute-results/html.json new file mode 100644 index 0000000..b956835 --- /dev/null +++ b/_freeze/tutorials/dcifer/dcifer_installation/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "9aed6c0b1c779798fda52cc19e596124", + "result": { + "markdown": "---\ntitle: \"Installing dcifer\"\noutput: html_document\n---\n\n\n\n\nThe `dcifer` R package is both on CRAN and part of the [plasmogenepi.r-universe](https://plasmogenepi.r-universe.dev/dcifer). 
\n\nYou can install it by running the following code: \n\n::: {.cell}\n\n```{.r .cell-code}\n# Install dcifer in R:\ninstall.packages('dcifer', repos = c('https://plasmogenepi.r-universe.dev', 'https://cloud.r-project.org'))\n```\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/hierfstat/analysis/execute-results/html.json b/_freeze/tutorials/hierfstat/analysis/execute-results/html.json new file mode 100644 index 0000000..0aea399 --- /dev/null +++ b/_freeze/tutorials/hierfstat/analysis/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "fb22d84461eec5e2a13bbf1da5cc5347", + "result": { + "markdown": "---\ntitle: \"Analysis tutorial\"\noutput: html_document\n---\n\n\n\n\nFirst we will load the libraries we will need:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(hierfstat)\nlibrary(vcfR)\n```\n:::\n\n\nWe're going to be using WGS data from Vietnam which was generated for the \n[Pf3k Project](https://www.malariagen.net/parasite/pf3k). 
First load in the metadata containing \nthe sample names and information on where the samples were collected.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmeta<-read.csv(\"../../data/wgs/pf3k/Vietnam/pf3k.metadata.Vietnam.csv\")\n```\n:::\n\n\nThen we can load in the genetic data from VCF format.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvcf<-read.vcfR(\"../../data/wgs/pf3k/Vietnam/SNP_INDEL_Pf3D7_ALL_v3.combined.filtered.vqslod6.biallelic_snp.Vietnam.vcf.gz\")\n```\n:::\n\n\nNext we extract the genotypes and convert them into a dosage format.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvcf_gts<-extract.gt(vcf)\n\nrecode_gt<-function(x){\n # treat phased (|) and unphased (/) calls the same way\n x<-gsub(\"[|]\",\"/\",x)\n x<-gsub(\"0/0\",0,x)\n x<-gsub(\"0/1\",1,x)\n x<-gsub(\"1/0\",1,x)\n x<-gsub(\"1/1\",2,x)\n # anything left unrecognised becomes NA\n as.numeric(x)\n}\n\nraw_gts<-apply(vcf_gts,MARGIN = 2,function(x){recode_gt(x)})\n```\n:::\n\n\n\nNow we can remove loci which are monomorphic.\n\n::: {.cell}\n\n```{.r .cell-code}\n# alternate-allele frequency: total dosage divided by 2 x (number of non-missing calls)\ncalculate_af<-function(x){\n sum(x,na.rm = TRUE)/(2*sum(!is.na(x)))\n}\naf <- apply(raw_gts,MARGIN = 1,calculate_af)\ntransposed_gts<-as.data.frame(t(raw_gts[which(af>0 & af<1),]))\n```\n:::\n\n\nWe now have a data frame in which samples are rows and loci are columns. Hierfstat also requires\npopulation assignments for each sample, supplied as the first column of the data.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndat<-cbind(meta$site,transposed_gts)\n```\n:::\n\n\nNow we are ready to calculate some statistics!\n\n\n::: {.cell}\n\n```{.r .cell-code}\nresults<-basic.stats(dat)\nresults$overall\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n Ho Hs Ht Dst Htp Dstp Fst Fstp Fis Dest \n 0.0674 0.0635 0.0639 0.0003 0.0640 0.0005 0.0052 0.0078 -0.0601 0.0005 \n```\n:::\n:::\n\n\nThe `results$overall` table contains basic statistics averaged over loci. 
The \nstatistics presented are defined in equations 7.38–7.43 (pp. 164–165) of Nei (1987).\n\n\n## Summary\n\nWe loaded data in VCF format, converted the genotypes to dosages, removed monomorphic loci and finally calculated basic statistics.", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/hierfstat/background/execute-results/html.json b/_freeze/tutorials/hierfstat/background/execute-results/html.json new file mode 100644 index 0000000..4afd6a8 --- /dev/null +++ b/_freeze/tutorials/hierfstat/background/execute-results/html.json @@ -0,0 +1,18 @@ +{ + "hash": "48c9ea104b86eb97287507096ee471c1", + "result": { + "markdown": "---\ntitle: \"Hierfstat\"\noutput: html_document\nauthor: \"Jody Phelan\"\ndate: \"13-12-2023\"\n---\n\n\n\n\n
\n
\n\n## Summary sheet\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Main use-cases Calculate F statistics
Authors Jerome Goudet; Thibaut Jombart; Zhian N. Kamvar; Eric Archer; Olivier Hardy
Latest version v0.5-11
License GPL (>=2)
Website https://cran.r-project.org/web/packages/hierfstat/index.html
Code repository https://github.com/jgx65/hierfstat
Publication https://www.sciencedirect.com/science/article/pii/S1567134807001037
Tutorial authors Jody Phelan
Tutorial date 13-12-2023
\n\n`````\n:::\n:::\n\n\n## Purpose\n\nThis tool allows you to calculate F-statistics on genetic data. You can read in data from a \nvariety of formats including VCF.\n\n## Existing resources\n\n- The original paper, which includes a small tutorial, is available [here](https://www.sciencedirect.com/science/article/pii/S1567134807001037)\n\n## Citation\n\n```\n@Manual{,\n title = {hierfstat: Estimation and Tests of Hierarchical F-Statistics},\n author = {Jerome Goudet and Thibaut Jombart},\n year = {2022},\n note = {https://www.r-project.org, https://github.com/jgx65/hierfstat},\n}\n```\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/hierfstat/installation/execute-results/html.json b/_freeze/tutorials/hierfstat/installation/execute-results/html.json new file mode 100644 index 0000000..cff866f --- /dev/null +++ b/_freeze/tutorials/hierfstat/installation/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "374d068796f5f044e1abee39dfeb5cff", + "result": { + "markdown": "---\ntitle: \"Installing hierfstat\"\noutput: html_document\n---\n\n\n\n\nWe can easily install `hierfstat` from the [plasmogenepi](https://plasmogenepi.r-universe.dev) repository:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif(!require(\"hierfstat\")) install.packages('hierfstat', repos = c('https://plasmogenepi.r-universe.dev', 'https://cloud.r-project.org'))\n```\n:::\n\n\nWe will also need `vcfR` to load some data. 
Install this with:\n\n::: {.cell}\n\n```{.r .cell-code}\nif(!require(\"vcfR\")) install.packages(\"vcfR\")\n```\n:::\n\nOnce installation finishes, check that the library loads:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(hierfstat)\n```\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/hmmibdr/hmmibdr_background/execute-results/html.json b/_freeze/tutorials/hmmibdr/hmmibdr_background/execute-results/html.json new file mode 100644 index 0000000..b074bd4 --- /dev/null +++ b/_freeze/tutorials/hmmibdr/hmmibdr_background/execute-results/html.json @@ -0,0 +1,18 @@ +{ + "hash": "2c8d089f6ddcd0a20f880f78ce8cac12", + "result": { + "markdown": "---\ntitle: \"hmmibdr\"\noutput: html_document\nauthor: \"Sophie Berube\"\ndate: \"13-12-2023\"\n---\n\n\n\n\n
\n
\n\n## Summary sheet\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Main use-cases Rcpp wrapper for hmmIBD
Authors OJ Watson
Latest version 0.2.0
License OJ Watson 2019
Website NA
Code repository https://github.com/OJWatson/hmmibdr/tree/main
Publication NA
Tutorial authors Sophie Berube
Tutorial date 13-12-2023
\n\n`````\n:::\n:::\n\n\n## Purpose\nThis package is an Rcpp wrapper for the hmmIBD software. Currently, several C and Python scripts (distributed with hmmIBD) are required to convert files from VCF format to a text format that either hmmIBD or hmmibdr can then analyze. Because these scripts require the user to run C and Python code, which are the same requirements as running the whole analysis with hmmIBD alone, we recommend using the hmmIBD software directly for the entire analysis. \n\nPlease refer to the hmmIBD tutorial for a complete demonstration of the tool. \n\n## Existing resources\n\n- A short tutorial for hmmibdr is available on its [GitHub page](https://github.com/OJWatson/hmmibdr/tree/main)\n- See the hmmIBD tutorial for further information.\n\n## Citation\n\n```\n@Manual{,\n title = {hmmibdr: HMM Identity by Descent},\n author = {OJ Watson},\n year = {2023},\n note = {R package version 0.2.0},\n }\n\n```", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/isoRelate/isoRelate_background/execute-results/html.json b/_freeze/tutorials/isoRelate/isoRelate_background/execute-results/html.json new file mode 100644 index 0000000..145ee0d --- /dev/null +++ b/_freeze/tutorials/isoRelate/isoRelate_background/execute-results/html.json @@ -0,0 +1,18 @@ +{ + "hash": "1145b354edfc591a49855909d11ff6ac", + "result": { + "markdown": "---\ntitle: \"isoRelate\"\noutput: html_document\nauthor: \"Kirsty McCann\"\ndate: \"12 Dec 2023\"\n---\n\n\n\n\n
\n
\n\n## Summary sheet\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Main use-cases Identity-by-descent (IBD), IBD in multiple infections, IBD networks and significance plots
Authors Lyndal Henden
Latest version v0.1.0
License YEAR:2018 COPYRIGHT HOLDER: Lyndal Henden
Website https://github.com/bahlolab/isoRelate/
Code repository https://github.com/bahlolab/isoRelate/
Publication https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007279
Tutorial authors Kirsty McCann
Tutorial date 12-Dec-23
\n\n`````\n:::\n:::\n\n\n## Purpose\n\nThe *isoRelate* R package was developed to perform pairwise identity-by-descent (IBD) analysis on haploid recombining organisms in the presence of multiclonal infections using SNP genotype data, and is also applicable to whole genome sequencing data.\n\n*isoRelate* uses a novel statistic based on the inferred IBD status at genomic locations. This statistic is used to identify loci under positive selection and to visualize relatedness networks as a means of exploring shared haplotypes within populations.\n\nAppropriate uses for *isoRelate*:\n\n1. Estimate the proportion of the genome shared IBD between isolate pairs,\n2. Detect genomic regions that are identical by descent between isolate pairs,\n3. Infer IBD within multiple infections,\n4. Identify loci under positive selection,\n5. Construct and visualize relatedness networks.\n\n*isoRelate* performs pairwise relatedness mapping on haploid isolates using a first-order continuous-time hidden Markov model (HMM).\nIsolates from multiple infections, where the infected individual carries several genetically distinct strains of the species, have a multiplicity of infection (MOI) greater than 1; *isoRelate* analyzes these using a diploid model rather than a haploid model.\n\nThe IBD segments are inferred using the Viterbi algorithm, which finds the single most likely sequence of IBD states that could have generated the observed genotypic data. An alternative is to calculate the posterior probability of IBD sharing, that is, the probability of sharing 0, 1 or 2 alleles IBD at each SNP given the genotypic data. 
In addition to the Viterbi algorithm, *isoRelate* provides a function to compute the average posterior probability of IBD sharing for each isolate pair, which is calculated as: \n\n$$\\text{avePostPr} = \\frac{\\text{PostPr}(\\text{IBD} = 1)}{2} + \\text{PostPr}(\\text{IBD} = 2).$$\n\n## Data formats\n\n*isoRelate* requires input in PED and MAP format containing unphased SNP genotype data. The generation of PED and MAP files has not previously been well documented, so this tutorial adds steps to convert VCF data into PED/MAP format for use with *isoRelate*.\n\nPlease note: indels will cause problems when running *isoRelate*, so only SNPs should be included.\n\nThe PED file contains the following columns:\n\n1. Family ID\n2. Isolate ID\n3. Paternal ID\n4. Maternal ID\n5. Multiplicity of infection (MOI) (1 = single infection or haploid, 2 = multiple infections or diploid)\n6. Phenotype (1 = unaffected, 2 = affected, 0 = unknown)\n\nThe IDs are alphanumeric: the combination of family and isolate ID should uniquely identify a sample.\nColumns 7 onwards (whitespace-delimited) are the isolate genotypes for biallelic SNPs, where the A and B alleles are coded as 1 and 2 respectively and missing genotypes are coded as 0. \nAll SNPs must have two alleles specified, and each allele should be in a separate column.\nFor single infections, genotypes should be specified as homozygous. \nEither both alleles should be missing (i.e. 0) or neither. Column labels are not required.\n\nNote that the paternal ID, maternal ID and phenotype columns are not used by *isoRelate* but are required for completeness of the pedigree. For this tutorial, these columns have not been included.\n\nInformative family IDs include the sample collection site or country; however, family IDs can also be the same as the isolate IDs.\n\nThe MAP file contains exactly 4 columns of information:\n\n1. Chromosome\n2. SNP identifier\n3. Genetic map distance (centimorgans or morgans)\n4. Base-pair position\n\nwhere each row describes a single marker. 
\nGenetic map distances and base-pair positions are expected to be positive values. \nThe MAP file must be sorted by chromosome and by increasing genetic map distance within each chromosome. \nSNP identifiers can contain any characters except spaces or tabs; also, you should avoid `*` symbols in names. \nThe MAP file must contain as many markers as are in the PED file. Column labels are not required.\n\n## Existing resources\n\n- isoRelate vignettes - https://github.com/bahlolab/isoRelate/blob/master/vignettes/introduction.Rmd\n- Identity-by-descent analyses for measuring population dynamics and selection in recombining pathogens - https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007279\n\n## Citation\n\nBibTeX style citation. For an R package, you can get this using `citation(package = \"isoRelate\")`:\n\n```\n@Manual{,\n title = {Identity-by-descent analyses for measuring population dynamics and selection in recombining pathogens},\n author = {Lyndal Henden and Stuart Lee and Alyssa Barry and Melanie Bahlo},\n note = {R package version 0.1.0},\n }\n```\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/isoRelate/isoRelate_installation/execute-results/html.json b/_freeze/tutorials/isoRelate/isoRelate_installation/execute-results/html.json new file mode 100644 index 0000000..0a8d022 --- /dev/null +++ b/_freeze/tutorials/isoRelate/isoRelate_installation/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "83572eeaa7b7d9472bdf43e2f57200f4", + "result": { + "markdown": "---\ntitle: \"Installing isoRelate\"\noutput: html_document\n---\n\n\n\n\n## Step 1: Install from plasmogenepi.r-universe\n\nInstalling from [plasmogenepi.r-universe](https://plasmogenepi.r-universe.dev/builds) greatly simplifies package installation. 
\n\nTo install isoRelate, use the following:\n\n::: {.cell}\n\n```{.r .cell-code}\n# Install isoRelate in R:\ninstall.packages('isoRelate', repos = c('https://plasmogenepi.r-universe.dev', 'https://cloud.r-project.org'))\n```\n:::\n\n\nAssuming all has been installed correctly with no errors, now load the package in RStudio:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(isoRelate)\n```\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/malariaem/malariaem_background/execute-results/html.json b/_freeze/tutorials/malariaem/malariaem_background/execute-results/html.json new file mode 100644 index 0000000..e8168bd --- /dev/null +++ b/_freeze/tutorials/malariaem/malariaem_background/execute-results/html.json @@ -0,0 +1,18 @@ +{ + "hash": "e8c017b027ee069a618e24a6a3320491", + "result": { + "markdown": "---\ntitle: \"malaria.em\"\noutput: html_document\nauthor: \"Nicholas Hathaway\"\ndate: \"2023-12-15\"\nbibliography: reference.bib\n---\n\n\n\n\n
\n
\n\n## Summary sheet\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Main use-cases estimate haplotype frequencies
Authors Xiaohong Li
Latest version NA
Code repository https://cran.r-project.org/src/contrib/Archive/malaria.em/
Publication https://doi.org/10.2202/1544-6115.1321
\n\n`````\n:::\n:::\n\n\n## Purpose\nThis program is designed to resolve the combinations of mutations present at the population level in order to estimate the frequencies of these clonal sequences (haplotypes). It uses an expectation-maximization (EM) algorithm to obtain maximum likelihood estimates of haplotype frequencies from per-sample SNP data. \n\nThe package was removed from CRAN in 2014 and does not appear to be actively maintained. A program that performs a similar analysis is [MultiLociBiallelicModel](../MultiLociBiallelicModel/MultiLociBiallelicModel_background.qmd). \n\n## Citation\n\n[@Li2007-ph]\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/moire/moire_analysis/execute-results/html.json b/_freeze/tutorials/moire/moire_analysis/execute-results/html.json new file mode 100644 index 0000000..431c6cb --- /dev/null +++ b/_freeze/tutorials/moire/moire_analysis/execute-results/html.json @@ -0,0 +1,18 @@ +{ + "hash": "a9d3b86249c5b71b9c7d743fd2c75df2", + "result": { + "markdown": "---\ntitle: \"Running Moire Example\"\noutput: html_document\n---\n\n\n\n\n## The data\n\nIn this analysis we will be using SNP barcode data from the Sanger 100-SNP *Plasmodium falciparum* barcode [(Chang et al. 2019)](https://doi.org/10.7554/eLife.43481). \nThese data were generated by subsetting the WGS data (also described in the [Data section](../../website_docs/data_description.qmd) of this website), and we will use the data from the Democratic Republic of the Congo (DRC). \nSee the description in the [Data section](../../website_docs/data_description.qmd) for more details on this dataset.\n\nIn this tutorial we will use `PGEhammer` to convert data from the VCF format to the format required by moire. 
To install the package, run the following command:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Install PGEhammer in R:\ninstall.packages('PGEhammer', repos = c('https://plasmogenepi.r-universe.dev', 'https://cloud.r-project.org'))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\nThe downloaded binary packages are in\n\t/var/folders/wx/rr171mzs0lj0mtflng6dwl7h0000gp/T//RtmpawfTNW/downloaded_packages\n```\n:::\n:::\n\n\nNow we can load the libraries that we will need. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(here)\nlibrary(vcfR)\nlibrary(PGEhammer)\n# knitr and kableExtra provide kable() and kable_styling(), used for the tables below\nlibrary(knitr)\nlibrary(kableExtra)\n```\n:::\n\n\n`moire` accepts input data in either long or wide format, which can be loaded using the `load_long_form_data()` or `load_delimited_data()` functions, respectively. The long format represents data with each observation on a separate row, while the wide format uses a separate column for each variable. In the following steps, we will convert the Variant Call Format (VCF) data to the required long format using the `vcf2long()` function from `PGEhammer`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load the vcf \nvcf <- read.vcfR('../../data/snp_barcode/sangerBarcode_SNP_INDEL_Pf3D7_ALL_v3.combined.filtered.vqslod6.biallelic_snp.DRCongo.vcf.gz')\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nScanning file to determine attributes.\nFile attributes:\n meta lines: 136\n header_line: 137\n variant count: 91\n column count: 122\n\nMeta line 136 read in.\nAll meta lines processed.\ngt matrix initialized.\nCharacter matrix gt created.\n Character matrix gt rows: 91\n Character matrix gt cols: 122\n skip: 0\n nrows: 91\n row_num: 0\n\nProcessed variant: 91\nAll variants processed\n```\n:::\n\n```{.r .cell-code}\n# Convert to long format \ndf <- vcf2long(vcf)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nConverting from vcf to long format...\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nReformatting complete.\n```\n:::\n\n```{.r .cell-code}\nhead(df) |>\n kable() |>\n 
kable_styling(bootstrap_options = c(\"striped\", \"hover\", \"condensed\"))\n```\n\n::: {.cell-output-display}\n`````{=html}\n\n \n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
sample_id locus allele read_count
QG0182-C Pf3D7_01_v3_145515 allele-1 36
QG0182-C Pf3D7_01_v3_145515 allele-2 0
QG0182-C Pf3D7_01_v3_179347 allele-1 0
QG0182-C Pf3D7_01_v3_179347 allele-2 188
QG0182-C Pf3D7_01_v3_180554 allele-1 0
QG0182-C Pf3D7_01_v3_180554 allele-2 150
\n\n`````\n:::\n:::\n\n\nBefore running moire, it is advisable to add a column flagging alleles with an insufficient read count. In the following step, we flag alleles with a read count below 10 as missing.\n\n::: {.cell}\n\n```{.r .cell-code}\ndf$is_missing <- df$read_count < 10\n```\n:::\n\n\nFor the purposes of this tutorial, we will subset the data to include only chromosome 1.\n\n::: {.cell}\n\n```{.r .cell-code}\nsubset_df <- df[grepl('^Pf3D7_01', df$locus), ]\n```\n:::\n\n\n## Running the MCMC \n\nNow that the data are in the correct format, we will run the Markov chain Monte Carlo (MCMC) sampler with the default parameters.\n\n\n::: {.cell hash='moire_analysis_cache/html/mcmc_471c8685dd27a2656c6522403dd7a491'}\n\n```{.r .cell-code}\ndata <- moire::load_long_form_data(subset_df)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nAdding missing grouping variables: `locus`\n```\n:::\n\n```{.r .cell-code}\nmcmc_results <- moire::run_mcmc(data, is_missing=data$is_missing, verbose=FALSE)\n```\n:::\n\n\n## Summarising the results\n\nFrom the MCMC results we can produce estimates for each sample and summarize them. \n\nFirst we will look at COI using `summarize_coi`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Estimate the COI for each sample\ncoi_summary <- moire::summarize_coi(mcmc_results)\n\nhead(coi_summary) |>\n kable() |>\n kable_styling(bootstrap_options = c(\"striped\", \"hover\", \"condensed\"))\n```\n\n::: {.cell-output-display}\n`````{=html}\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
sample_id post_coi_lower post_coi_med post_coi_upper post_coi_mean naive_coi offset_naive_coi prob_polyclonal
QG0182-C 18 30 40.000 29.771 2 2 1
QG0182-C 19 30 39.000 29.390 2 2 1
QG0182-C 20 29 39.000 29.374 2 2 1
QG0182-C 18 28 37.000 27.604 2 2 1
QG0182-C 21 29 38.025 29.508 2 2 1
QG0183-C 20 30 40.000 29.987 2 2 1
\n\n`````\n:::\n:::\n\n\nThis dataframe includes summaries of both the posterior distribution of COI for each biological sample and the naive estimates.\n\nBelow we use the `summarize_he` function to summarise locus heterozygosity from the posterior distribution of sampled allele frequencies.\n\n::: {.cell}\n\n```{.r .cell-code}\nhe_summary <- moire::summarize_he(mcmc_results)\n\nhead(he_summary) |>\n kable() |>\n kable_styling(bootstrap_options = c(\"striped\", \"hover\", \"condensed\"))\n```\n\n::: {.cell-output-display}\n`````{=html}\n\n
locus post_stat_lower post_stat_med post_stat_upper post_stat_mean
Pf3D7_01_v3_145515 0.4109930 0.4874592 0.4999798 0.4792203
Pf3D7_01_v3_179347 0.4080085 0.4880050 0.4999290 0.4797910
Pf3D7_01_v3_180554 0.4193941 0.4903099 0.4999454 0.4807489
Pf3D7_01_v3_283144 0.4123817 0.4918062 0.4999624 0.4819307
Pf3D7_01_v3_535211 0.4262921 0.4895070 0.4999511 0.4819215
\n\n`````\n:::\n:::\n\n\nUsing `summarize_allele_freqs`, we can look at the posterior frequency of each allele at each locus.\n\n::: {.cell}\n\n```{.r .cell-code}\nallele_freq_summary <- moire::summarize_allele_freqs(mcmc_results)\n\nhead(allele_freq_summary) |>\n kable() |>\n kable_styling(bootstrap_options = c(\"striped\", \"hover\", \"condensed\"))\n```\n\n::: {.cell-output-display}\n`````{=html}\n\n
post_allele_freqs_lower post_allele_freqs_med post_allele_freqs_upper post_allele_freqs_mean locus allele
0.3041579 0.5005537 0.6774932 0.4964165 Pf3D7_01_v3_145515 allele-1
0.3225068 0.4994463 0.6958421 0.5035835 Pf3D7_01_v3_145515 allele-2
0.3087308 0.4980584 0.6845234 0.4976799 Pf3D7_01_v3_179347 allele-1
0.3154766 0.5019416 0.6912692 0.5023201 Pf3D7_01_v3_179347 allele-2
0.3112285 0.5079918 0.6829471 0.5037133 Pf3D7_01_v3_180554 allele-1
0.3170529 0.4920082 0.6887715 0.4962867 Pf3D7_01_v3_180554 allele-2
\n\n`````\n:::\n:::\n\n\nThe `summarize_relatedness` function returns a dataframe summarising the posterior distribution of within-host relatedness for each sample.\n\n::: {.cell}\n\n```{.r .cell-code}\nrelatedness_summary <- moire::summarize_relatedness(mcmc_results)\n\nhead(relatedness_summary) |>\n kable() |>\n kable_styling(bootstrap_options = c(\"striped\", \"hover\", \"condensed\"))\n```\n\n::: {.cell-output-display}\n`````{=html}\n\n
sample_id post_relatedness_lower post_relatedness_med post_relatedness_upper post_relatedness_mean
QG0182-C 0.0197030 0.4604298 0.8541539 0.4249814
QG0182-C 0.0337680 0.3783443 0.8409948 0.4144203
QG0182-C 0.0191771 0.3811882 0.8407041 0.4019389
QG0182-C 0.0190355 0.3933736 0.8386370 0.4018551
QG0182-C 0.0135405 0.3737309 0.8285997 0.3866478
QG0183-C 0.0256464 0.4261874 0.8394141 0.4265783
\n\n`````\n:::\n:::\n\n\nThe *moire* tool introduces a new metric called effective MOI, which adjusts for within-host relatedness. A detailed description of this metric and how to interpret it can be found [here](https://eppicenter.github.io/moire/articles/mcmc_demo.html#what-is-effective-coi).\n\n::: {.cell}\n\n```{.r .cell-code}\neffective_coi_summary <- moire::summarize_effective_coi(mcmc_results)\n\nhead(effective_coi_summary) |>\n kable() |>\n kable_styling(bootstrap_options = c(\"striped\", \"hover\", \"condensed\"))\n```\n\n::: {.cell-output-display}\n`````{=html}\n\n
sample_id post_effective_coi_lower post_effective_coi_med post_effective_coi_upper post_effective_coi_mean
QG0182-C 5.135987 16.71772 32.38607 17.47504
QG0182-C 5.313454 17.67340 32.47499 17.59741
QG0182-C 5.074139 17.62940 31.90710 17.89686
QG0182-C 5.008999 16.24006 31.78271 16.96374
QG0182-C 5.553618 18.14092 34.68141 18.56370
QG0183-C 5.266727 17.06538 33.06818 17.52451
\n\n`````\n:::\n:::\n\n\n## Summary\n\nIn summary, *moire* estimates allele frequencies, MOI, and within-host relatedness. We have shown how to generate basic results from a VCF. Moire has extensive [documentation](https://eppicenter.github.io/moire/index.html), including more details on [other functionality](https://eppicenter.github.io/moire/articles/mcmc_demo.html#estimating-allele-frequencies) available within the tool and a [tutorial](https://eppicenter.github.io/moire/articles/mcmc_demo.html#estimating-allele-frequencies) validating the outputs using simulated data.", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/moire/moire_background/execute-results/html.json b/_freeze/tutorials/moire/moire_background/execute-results/html.json new file mode 100644 index 0000000..f78d37d --- /dev/null +++ b/_freeze/tutorials/moire/moire_background/execute-results/html.json @@ -0,0 +1,18 @@ +{ + "hash": "fb80cf8484a4c0228cc6152ecee792d2", + "result": { + "markdown": "---\ntitle: \"moire\"\noutput: html_document\nauthor: \"Kathryn Murie\"\ndate: \"11-Dec-23\"\n---\n\n\n\n\n
\n
\n\n## Summary sheet\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n
Main use-cases Estimating multiplicity of infection (MOI), population allele frequencies, and within-host relatedness from polyallelic genomics data.
Authors Maxwell Murphy, Bryan Greenhouse
Latest version v3.1.0
License GNU General Public License v3.0
Website https://eppicenter.github.io/moire/
Code repository https://github.com/EPPIcenter/moire
Publication https://doi.org/10.1101/2023.10.03.560769
Tutorial authors Kathryn Murie
Tutorial date 11-Dec-23
\n\n`````\n:::\n:::\n\n\n## Purpose\n\nThe *moire* (Multiplicity Of Infection and allele frequency REcovery) tool can be used to estimate allele frequencies, MOI, and within-host relatedness from genetic data subject to experimental error. It uses a Markov Chain Monte Carlo (MCMC)-based approach to Bayesian estimation and accepts both polyallelic and SNP data as input. The tool also introduces a new metric called effective MOI (eMOI), which combines MOI and within-host relatedness into a unified and comparable measure of genetic diversity. \n\n## Existing resources\n\n- The [*moire* website](https://eppicenter.github.io/moire/index.htm) provides basic usage instructions\n- The [*moire* website](https://eppicenter.github.io/moire/articles/mcmc_demo.html) also hosts a more in-depth tutorial using simulated genotyping data. \n\n## Citation\n\nPlease use the following citation:\n\n```\n@Article{,\n title = {MOIRE: A software package for the estimation of allele frequencies and effective multiplicity of infection from polyallelic data},\n author = {Maxwell Murphy and Bryan Greenhouse},\n journal = {bioRxiv},\n year = {2023},\n doi = {10.1101/2023.10.03.560769},\n abstract = {Malaria parasite genetic data can provide insight into parasite phenotypes, evolution, and transmission. However, estimating key parameters such as allele frequencies, multiplicity of infection (MOI), and within-host relatedness from genetic data has been challenging, particularly in the presence of multiple related coinfecting strains. Existing methods often rely on single nucleotide polymorphism (SNP) data and do not account for within-host relatedness. In this study, we introduce a Bayesian approach called MOIRE (Multiplicity Of Infection and allele frequency REcovery), designed to estimate allele frequencies, MOI, and within-host relatedness from genetic data subject to experimental error. 
Importantly, MOIRE is flexible in accommodating both polyallelic and SNP data, making it adaptable to diverse genotyping panels. We also introduce a novel metric, the effective MOI (eMOI), which integrates MOI and within-host relatedness, providing a robust and interpretable measure of genetic diversity. Using extensive simulations and real-world data from a malaria study in Namibia, we demonstrate the superior performance of MOIRE over naive estimation methods, accurately estimating MOI up to 7 with moderate sized panels of diverse loci (e.g. microhaplotypes). MOIRE also revealed substantial heterogeneity in population mean MOI and mean relatedness across health districts in Namibia, suggesting detectable differences in transmission dynamics. Notably, eMOI emerges as a portable metric of within-host diversity, facilitating meaningful comparisons across settings, even when allele frequencies or genotyping panels are different. MOIRE represents an important addition to the analysis toolkit for malaria population dynamics. Compared to existing software, MOIRE enhances the accuracy of parameter estimation and enables more comprehensive insights into within-host diversity and population structure. Additionally, MOIRE{\textquoteright}s adaptability to diverse data sources and potential for future improvements make it a valuable asset for research on malaria and other organisms, such as other eukaryotic pathogens. 
MOIRE is available as an R package at https://eppicenter.github.io/moire/.Competing Interest StatementThe authors have declared no competing interest.},\n publisher = {Cold Spring Harbor Laboratory},\n}\n```", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/tutorials/moire/moire_installation/execute-results/html.json b/_freeze/tutorials/moire/moire_installation/execute-results/html.json new file mode 100644 index 0000000..a964a6a --- /dev/null +++ b/_freeze/tutorials/moire/moire_installation/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "7df195f0977590bf6c1444db0b0a5945", + "result": { + "markdown": "---\ntitle: \"Installing moire\"\noutput: html_document\n---\n\n\n\n\n# Install moire in R:\n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages('moire', repos = c('https://plasmogenepi.r-universe.dev', 'https://cloud.r-project.org'))\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nPackage which is only available in source form, and may need\n compilation of C/C++/Fortran: 'moire'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\ninstalling the source package 'moire'\n```\n:::\n:::", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/website_docs/R_universe/execute-results/html.json b/_freeze/website_docs/R_universe/execute-results/html.json new file mode 100644 index 0000000..6039164 --- /dev/null +++ b/_freeze/website_docs/R_universe/execute-results/html.json @@ -0,0 +1,16 @@ +{ + "hash": "1a8840e88e06cfb80ecf6f40c1240f06", + "result": { + "markdown": "---\ntitle: \"The PlasmoGenEpi R-universe\"\nsubtitle: \"The package ecosystem simplifying tool installation\"\nformat: 
 html\n---\n\n\nThe rOpenSci [R-universe system](https://ropensci.org/r-universe/) provides a framework for users and developers of R-packages to create a package 'ecosystem' in a **simple** and **accessible** way. The R-universe build infrastructure offers a high-performance package server, integrated monitoring tools, REST APIs, and a front-end dashboard, which together provide a dynamic and automated R package repository. \n\n::: {.column-margin}\n![rOpenSci R-universe logo](img/runiverse_logo.png)\n:::\n\nThis infrastructure provides pre-built packages, which greatly simplifies installation of non-CRAN packages and those with complex dependencies such as C++ compilers. The R-universe platform is used by professionals from different fields, including brain research, climate change, infectious disease research and government data analysis. If you are interested in learning more about the R-universe infrastructure, you can check out this [blog post](https://doi.org/10.59350/cq5fj-wj639), and we encourage you to explore the resources at the [rOpenSci R-universe website](https://ropensci.org/r-universe/) and the [R-universe dashboard](https://r-universe.dev/search/). \n\n\n### The [plasmogenepi.r-universe](https://plasmogenepi.r-universe.dev/builds)\nThe [PlasmoGenEpi](https://www.plasmogenepi.org) community has created a dedicated [plasmogenepi.r-universe](https://plasmogenepi.r-universe.dev/builds) with R packages relevant to *Plasmodium* genomic analyses. \n\nThe dashboard build page is shown below: a bar chart displays a timeline of package updates, and below it is general information for each package, including a link to its GitHub repository, its main maintainer, and when it was last built. 
\n\n![The PlasmoGenEpi R-universe build dashboard](img/pge_runiv_screenshot.png)\n\nNavigate to the 'Packages' tab for more detailed descriptions of each package, as seen below.\n\n![The PlasmoGenEpi R-universe package dashboard](img/pge_pkgs_screenshot.png)\n\nNavigate to the 'API' tab for details on how to access the r-universe via its API, as seen below.\n\n![The PlasmoGenEpi R-universe API dashboard](img/pge_api_screenshot.png)\n\n### Installation instructions\n\nHaving R packages within the R-universe ecosystem makes the installation process **much** simpler! Ensuring that R packages for *Plasmodium* genomic analysis are easily accessible to end-users is an important aim of our community. \n\n\n\nBelow, we show an example of how you could install the DRpower package via the [plasmogenepi.r-universe](https://plasmogenepi.r-universe.dev/DRpower). \n\n::: {.column-margin}\n![You can see how easy this will be with one line of code! 
🙌\n](https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExcDRwN2wxMjN0MnRxbnY0bDB3NDFueThhOXdsZzdiOGNvcXU0bzNtYyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/3o8dFn5CXJlCV9ZEsg/giphy.gif){ width=300 height=200 }\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Install 'DRpower' in R:\ninstall.packages('DRpower', \n repos = c('https://plasmogenepi.r-universe.dev', \n 'https://cloud.r-project.org'))\n```\n:::\n\n\nAll the [tool tutorials](tutorials_overview.qmd) in PGEforge have installation instructions, and where applicable, provide instructions for installation via the [plasmogenepi.r-universe](https://plasmogenepi.r-universe.dev/builds).", + "supporting": [ + "R_universe_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/website_docs/software_standards/execute-results/html.json b/_freeze/website_docs/software_standards/execute-results/html.json new file mode 100644 index 0000000..550cb67 --- /dev/null +++ b/_freeze/website_docs/software_standards/execute-results/html.json @@ -0,0 +1,20 @@ +{ + "hash": "db135b5feb6049541afb42c37d885c0b", + "result": { + "markdown": "---\ntitle: \"Software standards\"\nformat: html\n---\n\n\n\n\n\n\n## Framework for evaluating software standards in *Plasmodium* genomics\n\nPGEforge aims to foster an ecosystem of high-quality, user-friendly tools that can be seamlessly integrated into genomic analysis workflows. 
One of the biggest challenges is the variability and lack of systematic assessment of existing tools, which often do not adhere to best practices in software development, including [FAIR standards](https://doi.org/10.1038/sdata.2016.18), maintenance, and usability.\n\nWorking towards this goal, a robust software standards evaluation framework was formulated to guide the development and assessment of tools used in *Plasmodium* genomic data analysis from both the end-user and developer perspectives. This framework is crucial not only for addressing the variability and challenges associated with existing software tools, but also for guiding the development of new tools, ensuring that they meet high standards of usability, accessibility, and reliability. \n\n### 'Ideal' software practices\nOne of the primary objectives of this framework is to define ‘ideal’ software practices that are not tool-specific but applicable across a range of genomic analysis tools. These practices encompass:\n\n- Comprehensive documentation\n- Ease of installation\n- Reliable and maintainable software\n\nAdditionally, during the [2023 RADISH23 hackathon](radish23.qmd), focused discussions highlighted the following software practices that are not necessarily essential, but are \"nice-to-haves\": \n\n- Uses standard data input formats \n- Computationally efficient\n- Informative error handling\n- Multiple languages for tutorials\n- Minimal dependencies\n- Modular code (e.g. split into functions)\n- Well-annotated code\n\nThese practices can guide the development of new tools and/or the improvement of existing tools. \n\n### Evaluation criteria \nTo implement these standards and facilitate tool evaluation and development, PGEforge has developed a set of measurable criteria that can be applied to evaluate the performance and usability of various tools. 
There are two categories:\n\n- **User-facing:** criteria to evaluate the tool from an end-user perspective, for example whether installation instructions are available and easy to follow\n- **Developer-facing:** criteria to evaluate the tool from a developer perspective, for example whether unit tests are implemented\n\n::: {.column-margin}\nThe evaluation criteria encompass the following key themes, in line with the 'ideal' software practices for both end-users and developers:\n\n- Quality and comprehensiveness of documentation\n- Simplicity of installation processes\n- Quality assurance and maintenance\n:::\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n
Criteria Type Notes
User-facing
Installation instructions exist binary
Usage instructions exist binary
Tutorials exist binary Must explain the inputs, outputs, and a test data set in a worked example
Test data sets and results available binary
Developer-facing
Open source binary
Has software tests binary E.g., unit tests
More than 90% code coverage reported binary
Clear channels for software maintenance and issues binary E.g., GitHub issues, author contact information
\n\n`````\n:::\n:::\n\n\n::: {.column-margin}\n
\n
\nEvery criterion is scored on the following scale: \n\n- 0: Criterion not fulfilled\n- 1: Criterion partially fulfilled\n- 2: Criterion fully fulfilled \n\n
\nThis is then translated to an **end-user score** and **development score** for the tool. \n:::\n\n### Tool evaluation\nEvery *Plasmodium* genomic analysis tool can be evaluated against these objective software standards to provide these scores. The resulting evaluation matrix and overview of each tool can be found [here](tools_to_standards.qmd).\n\n", + "supporting": [ + "software_standards_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/website_docs/tool_landscaping/execute-results/html.json b/_freeze/website_docs/tool_landscaping/execute-results/html.json new file mode 100644 index 0000000..edbf80c --- /dev/null +++ b/_freeze/website_docs/tool_landscaping/execute-results/html.json @@ -0,0 +1,20 @@ +{ + "hash": "ab73f3dc63bbdda4d7ac5d2413505da2", + "result": { + "markdown": "---\ntitle: \"Tool landscaping\"\nformat: html\n---\n\n\n## Overview\nIn line with the scope of PGEforge, we focus our efforts on landscaping available tools that are commonly applied to *Plasmodium* genetic data and that focus on downstream analysis. Tools are considered within this scope if they:\n\n- Focus on downstream analysis tools. This includes tools whose primary goal is to extract signal from pre-processed data, but does not include tools that are primarily used within upstream bioinformatic steps, such as variant callers and quality filters.\n- Focus on *Plasmodium* genetics, including both *P. falciparum* and *P. vivax*. \n\nIn our initial landscaping, we did not consider applications to mosquito genetics or many broader population genetics tools, despite some tools and techniques being applicable for these purposes. However, we encourage contributions to this and anything else within the scope of PGEforge (i.e. 
*Plasmodium* genomic epidemiology tools). Please see our [contributor guidelines](how_to_contribute.qmd) and some of our planned areas of [future work](future_work.qmd).\n\n## Landscaping matrix\n\n\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n:::\n", + "supporting": [ + "tool_landscaping_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n\n\n\n\n\n\n\n\n\n\n\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/website_docs/tools_to_functions/execute-results/html.json b/_freeze/website_docs/tools_to_functions/execute-results/html.json new file mode 100644 index 0000000..02b6d10 --- /dev/null +++ b/_freeze/website_docs/tools_to_functions/execute-results/html.json @@ -0,0 +1,16 @@ +{ + "hash": "e25debd0d9041725cd8b9059b05ffbc4", + "result": { + "markdown": "---\ntitle: \"Tools to functions\"\nformat: html\n---\n\n\n\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n", + "supporting": [ + "tools_to_functions_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/website_docs/tools_to_functions/figure-html/unnamed-chunk-3-1.png b/_freeze/website_docs/tools_to_functions/figure-html/unnamed-chunk-3-1.png new file mode 100644 index 0000000..86bb707 Binary files /dev/null and b/_freeze/website_docs/tools_to_functions/figure-html/unnamed-chunk-3-1.png differ diff --git a/_freeze/website_docs/tools_to_standards/execute-results/html.json b/_freeze/website_docs/tools_to_standards/execute-results/html.json new file mode 100644 index 0000000..849753a --- /dev/null +++ b/_freeze/website_docs/tools_to_standards/execute-results/html.json @@ -0,0 +1,20 @@ +{ + "hash": "f9f99446c068a5ecb2c2989043c7930a", + "result": { + "markdown": "---\ntitle: \"Overview of tools based on software standards\"\nformat: html\n---\n\n\n\n\n\n\n\n::: {.cell}\n\n:::\n\n\n----------\n\nThe following *Plasmodium* genomic analysis tools were identified during [tool landscaping](tool_landscaping.qmd) and were evaluated using the [software standards 
criteria](software_standards.qmd) to determine an end-user and development score for each tool. \n\nIn line with the scope of PGEforge, we focus our efforts on evaluating available tools that are commonly applied to *Plasmodium* genetic data and that focus on downstream analysis. In other words, we focus on tools whose primary goal is extracting signal from pre-processed data, not on those focused on upstream bioinformatic data processing.\n\nIf you would like to contribute to this effort, please take a look at our [contributor guidelines](how_to_contribute.qmd)!\n\n\n*Note: tools in grey have not yet been evaluated* \n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n:::\n", + "supporting": [ + "tools_to_standards_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n\n\n\n\n\n\n\n\n\n\n\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/website_docs/tutorials_overview/execute-results/html.json b/_freeze/website_docs/tutorials_overview/execute-results/html.json new file mode 100644 index 0000000..d27ca5b --- /dev/null +++ b/_freeze/website_docs/tutorials_overview/execute-results/html.json @@ -0,0 +1,16 @@ +{ + "hash": "956c7fa36d556886a4be4028fbd319db", + "result": { + "markdown": "---\ntitle: \"Tutorials\"\nsubtitle: \"Comprehensive guides covering the entire process of using *Plasmodium* genomic analysis tools\"\nformat: html\n---\n\n\n## Overview\nThe aim of the PGEforge tutorials is to provide worked examples that show *how* to use a tool by *working through* code. This involves code showing you how to install the tool, what input data formats you need to use, how to wrangle the data (if applicable), and how to use the tool functionalities to analyse data. 
Summary documents detailing the purpose of each tool can also help you decide which tool you may want to use for a certain application.\n\n:::{.column-margin}\nHere is a simplified example!\n\n::: {.cell}\n\n```{.r .cell-code}\n# my favorite tool\ninstall(\"myfavoritetool\")\n\nthe_best_data <- read.csv(\"simulated_data_from_PGEforge.csv\")\n\nthe_best_analysis <- myfavoritetool::analysis_function(the_best_data)\nplot(the_best_analysis)\n```\n:::\n\n\nAnd of course, your output will be: `the_best_results!` :) \n:::\n\nThe following resources are available for each tool:\n\n- A summary document detailing the main purpose and use cases, license, code repository, relevant publication(s), citation information and links to any additional resources\n- Complete installation instructions\n- A fully reproducible and worked-through tutorial showing example usage of the tool and its functionalities. This often uses the [canonical simulated or empirical datasets](data_description.qmd) as input data\n\n## How to contribute\n\nThis is a live resource and we plan to continue adding to this as new tools become available! We hope this will grow into a common resource for analysis of malaria genetic data. \n\nIf you are interested in contributing, there are [templates available](https://mrc-ide.github.io/PGEforge/tutorials/Template/Template_background.html) and [instructions](how_to_contribute.qmd) on how to get started. ", + "supporting": [ + "tutorials_overview_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_quarto.yml b/_quarto.yml index 134b259..1ef39fa 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -20,7 +20,7 @@ website: page-footer: background: "#f9eaf1" foreground: "#c82f61" - center: "This website is built with ❤️ and [Quarto](https://quarto.org/). 
© 2024" + center: "This website is built with 💖️, and [quarto](https://quarto.org/). © 2024" sidebar: background: "#f9eaf1" @@ -36,20 +36,28 @@ website: contents: - href: index.qmd text: Welcome - - section: "Data" + - section: website_docs/data_description.qmd contents: - - website_docs/data_description.qmd - #- text: "The data-DAG (TODO)" + - text: "WGS" + file: website_docs/data_wgs.qmd + - text: "Microhaplotypes" + file: website_docs/data_mhaps.qmd + - text: "SNPs" + file: website_docs/data_snps.qmd + - text: "pfhrp2/3 deletion counts" + file: website_docs/data_counts.qmd - section: "Tool landscaping" contents: - text: "The landscaping matrix" file: website_docs/tool_landscaping.qmd - text: "Installing via the plasmogenepi.r-universe" file: website_docs/R_universe.qmd - - section: "Tutorials" + - section: website_docs/software_standards.qmd + contents: + - text: "Overview by tool" + file: website_docs/tools_to_standards.qmd + - section: website_docs/tutorials_overview.qmd contents: - - text: "**Overview of all tutorials**" - file: website_docs/tutorials_overview.qmd - section: "coiaf" contents: - text: "Summary" @@ -94,7 +102,7 @@ website: file: tutorials/estMOI/installation.Rmd - text: "Analyze data with `estMOI`" file: tutorials/estMOI/analysis.Rmd - - section: "Hierfstat" + - section: "hierfstat" contents: - text: "Summary" file: tutorials/hierfstat/background.Rmd @@ -112,30 +120,6 @@ website: file: tutorials/hmmIBD/hmmIBD_installation.md - text: "Basic IBD analysis with `hmmIBD`" file: tutorials/hmmIBD/hmmIBD_analysis.md - - section: "hmmibdr" - contents: - - text: "Summary" - file: tutorials/hmmibdr/hmmibdr_background.Rmd - - section: "malaria.em" - contents: - - text: "Summary" - file: tutorials/malariaem/malariaem_background.qmd - - section: "MalHaploFreq" - contents: - - text: "Summary" - file: tutorials/MalHaploFreq/MalHaploFreq_background.qmd - - section: "MLMOI" - contents: - - text: "Summary" - file: tutorials/MLMOI/Template_background.Rmd - # - 
section: "moimix" - # contents: - # - text: "Summary" - # file: tutorials/moimix/moimix_background.Rmd - # - text: "Installation" - # file: tutorials/moimix/moimix_installation.Rmd - # - text: "Analyze data with `moimix`" - # file: tutorials/moimix/moimix_analysis.Rmd - section: "moire" contents: - text: "Summary" @@ -160,12 +144,12 @@ website: file: tutorials/paneljudge/paneljudge_installation.Rmd - text: "Tutorial for Running `paneljudge`" file: tutorials/paneljudge/paneljudge_analysis.Rmd - - section: "rehh" - contents: - - text: "Summary" - file: tutorials/rehh/Template_background.Rmd - - text: "Installation" - file: tutorials/rehh/Template_installation.Rmd + # - section: "rehh" + # contents: + # - text: "Summary" + # file: tutorials/rehh/Template_background.Rmd + # - text: "Installation" + # file: tutorials/rehh/Template_installation.Rmd # - text: "Analyze data with `rehh`" # file: tutorials/rehh/Template_analysis.Rmd - section: "THEREALMcCOIL" @@ -176,14 +160,11 @@ website: file: tutorials/THEREALMcCOIL/RMCL_installation.Rmd - text: "Analyze data with `THEREALMcCOIL`" file: tutorials/THEREALMcCOIL/RMCL_analysis.Rmd - - section: "Software standards" - contents: - - website_docs/software_standards.qmd - - text: "Overview by tool" - file: website_docs/tools_to_standards.qmd - - section: "Analysis workflows" + - section: website_docs/workflows.qmd contents: - - website_docs/workflows.qmd + - text: "Use cases" + file: website_docs/use_cases.qmd + - text: "Coming soon!" 
- section: "How to contribute" contents: - website_docs/how_to_contribute.qmd @@ -193,6 +174,8 @@ website: file: website_docs/radish23.qmd - href: website_docs/contributors.qmd text: "Contributors" + - href: website_docs/future_work.qmd + text: "Future work" format: html: @@ -200,6 +183,8 @@ format: toc: true # fontcolor: "#c82f61" linkcolor: "#c82f61" + header-includes: | + execute: freeze: auto diff --git a/index.qmd b/index.qmd index 29c00c4..c456d92 100644 --- a/index.qmd +++ b/index.qmd @@ -3,14 +3,24 @@ title: "PGEforge" subtitle: "Resources for enhancing **P**lasmodium **G**enomic **E**pidemiology analysis" --- -
-Your resource hub for
*Plasmodium* genomic analysis -
+ + + + + + +![](website_docs/img/pgeforge_header.png) ## Welcome PGEforge is a community-driven platform designed to simplify *Plasmodium* genomic data analysis. The process of analyzing genomic data often involves various software tools with different formats, user interfaces, and levels of accessibility. These differences can create barriers, especially for those without strong computational skills. +Although there are numerous existing software tools available to analyze data for malaria genomic surveillance, there is little guidance outlining how to choose, use, or assemble these tools to translate genetic data into interpretable and actionable results. For example, estimating the extent of antimalarial resistance from sequence data requires several different steps depending on the number and type of relevant polymorphisms and outcomes of interest. To improve the accessibility, accuracy, and reproducibility of genomic surveillance, a roadmap is needed that identifies the analysis functionalities required for any given end result, along with the tools available to perform them. + +An adjacent but extremely important challenge is the large variation in software standards of currently available tools for analyzing *Plasmodium* genomic data, often limiting the accessibility and utility of these tools. Many tools are difficult to use because of inadequate documentation, installation problems caused by operating system incompatibility or poorly documented dependencies, and requirements for data to be input in specific, nonstandard formats. + +## Our motivation + 
Our goal is to make advanced genomic analysis accessible to everyone, promoting collaboration and accelerating progress in malaria research. ## How to use this site @@ -18,9 +28,9 @@ The site is organized to help you efficiently access and utilize our resources. - [Data](website_docs/data_description.qmd): Access a curated selection of datasets commonly used in genomic analysis. These datasets help you familiarize yourself with standard data formats and are used consistently across our tutorials, ensuring easy comparison of tools. - [Tool Landscaping](website_docs/tool_landscaping.qmd): Discover a wide range of analysis tools, complete with basic information on their functions, where to find them, and whether we offer tutorials for them. We also introduce the “PlasmoGenEpi R-universe,” a website that simplifies tool installation. -- [Tutorials](website_docs/tutorials_overview.qmd): Explore comprehensive guides covering the entire process of installing software, formatting data, running basic functions, and interpreting outputs. These tutorials are designed to make complex tasks straightforward and accessible. - [Software Standards](website_docs/tools_to_standards.qmd): Learn about our objective software standards aimed at guiding developers towards best practices and creating user-friendly tools. -- [Analysis Workflows](website_docs/workflows.qmd): Understand how different tools can be integrated to address common questions in malaria genomic epidemiology. This section outlines typical workflows and maps tools to various analytical steps. +- [Tutorials](website_docs/tutorials_overview.qmd): Explore comprehensive guides covering the entire process of installing software, formatting data, running basic functions, and interpreting outputs. These tutorials are designed to make complex tasks straightforward and accessible. 
+- [Analysis Workflows](website_docs/workflows.qmd): Understand how different tools can be integrated to address common questions in malaria genomic epidemiology. This section outlines eight defined use cases and typical analysis workflows, and maps tools to various analytical steps. - [How to Contribute](website_docs/how_to_contribute.qmd): Find out how you can contribute to this community-driven resource. We welcome input from all areas of the research community to continuously improve and expand our platform. diff --git a/website_docs/R_universe.qmd b/website_docs/R_universe.qmd index 189f302..a71eddc 100644 --- a/website_docs/R_universe.qmd +++ b/website_docs/R_universe.qmd @@ -1,8 +1,51 @@ --- -title: "Installing via the plasmogenepi.r-universe" +title: "The PlasmoGenEpi R-universe" +subtitle: "The package ecosystem simplifying tool installation" format: html --- -# Overview -PGEforge hosts simulated and empirical datasets of: +The rOpenSci [R-universe system](https://ropensci.org/r-universe/) provides a framework for users and developers of R packages to create a package 'ecosystem' in a **simple** and **accessible** way. The R-universe build infrastructure provides a high-performance package server, integrated monitoring tools, and REST APIs, as well as a front-end dashboard, which together form a dynamic, automated R package repository. +::: {.column-margin} +![rOpenSci R-universe logo](img/runiverse_logo.png) +::: + +This infrastructure provides pre-built packages, which greatly simplifies installation of packages that are not on CRAN and of packages with complex build requirements, such as a C++ compiler. The R-universe platform is used by professionals in fields including brain research, climate change, infectious disease research, and government data analysis.
If you are interested in learning more about the R-universe infrastructure, you can check out this [blog post](https://doi.org/10.59350/cq5fj-wj639), and we encourage you to explore the resources at the [rOpenSci R-universe website](https://ropensci.org/r-universe/) and the [R-universe dashboard](https://r-universe.dev/search/). + + +### The [plasmogenepi.r-universe](https://plasmogenepi.r-universe.dev/builds) +The [PlasmoGenEpi](https://www.plasmogenepi.org) community has created a dedicated [plasmogenepi.r-universe](https://plasmogenepi.r-universe.dev/builds) with R packages relevant to *Plasmodium* genomic analyses. + +The dashboard build page is shown below. A timeline of package updates is displayed as a bar chart, and below it is general information for each package, including a link to its GitHub repository, its main maintainer, and when it was last built. + +![The PlasmoGenEpi R-universe build dashboard](img/pge_runiv_screenshot.png) + +You can navigate to the 'Packages' tab for more detailed descriptions of each package, as shown below. + +![The PlasmoGenEpi R-universe package dashboard](img/pge_pkgs_screenshot.png) + +You can navigate to the 'API' tab for details on how to access the R-universe via its API, as shown below. + +![The PlasmoGenEpi R-universe API dashboard](img/pge_api_screenshot.png) + +### Installation instructions + +Having R packages within the R-universe ecosystem makes the installation process **much** simpler! An important aim of our community is to ensure that R packages for *Plasmodium* genomic analysis are easily accessible to end-users. + + + +Below we show an example of how you could install the DRpower package via the [plasmogenepi.r-universe](https://plasmogenepi.r-universe.dev/DRpower). + +::: {.column-margin} +![You can see how easy this will be with one line of code!
🙌 +](https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExcDRwN2wxMjN0MnRxbnY0bDB3NDFueThhOXdsZzdiOGNvcXU0bzNtYyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/3o8dFn5CXJlCV9ZEsg/giphy.gif){ width=300 height=200 } +::: + +```{r eval=F} +# Install 'DRpower' in R: +install.packages('DRpower', + repos = c('https://plasmogenepi.r-universe.dev', + 'https://cloud.r-project.org')) +``` + +All the [tool tutorials](tutorials_overview.qmd) in PGEforge have installation instructions, and where applicable, provide instructions for installation via the [plasmogenepi.r-universe](https://plasmogenepi.r-universe.dev/builds). \ No newline at end of file diff --git a/website_docs/contributors.qmd b/website_docs/contributors.qmd index 15091d5..86f5363 100644 --- a/website_docs/contributors.qmd +++ b/website_docs/contributors.qmd @@ -5,101 +5,48 @@ format: html ## Main contributors -PGEforge was supported by the following people, listed alphabetically. - -#### Jorge Amaya-Romero - -![](img/people/jorge.jpeg){fig-align="left" width="100px"} - -**Affiliations:** Immunology and Infectious Disease, Harvard T.H. 
Chan School of Public Health - -#### Sophie Bérubéi - -![](img/people/SophieB.jpeg){fig-align="left" width="100px"} - -**Affiliations:** Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health - -#### Nick Brazeau - -![](img/people/Brazeau_NF.jpg){fig-align="left" width="100px"} - -**Affiliations:** PGY-1, Internal Medicine, Duke University Medical Center - -#### Bryan Greenhouse - -![](img/people/UCSF_20170322_B_Greenhouse_426.jpeg){fig-align="left" width="100px"} - -**Affiliations:** EPPIcenter, University of California San Francisco - -#### Nicholas Hathaway - -![](img/people/hathaway_nicholas_headshot.jpg){fig-align="left" width="100px"} - -**Affiliations:** Department of Medicine, University of Massachusetts Chan Medical School - -#### Jason Hendry - -![](img/people/jason.jpeg){fig-align="left" width="100px"} - -**Affiliations:** Max Planck Institute for Infection Biology - -#### Kirsty McCann - -![](img/people/KirstyMcCann.jpg){fig-align="left" width="100px"} - -**Affiliations:** Centre for Innovation in Infectious Disease and Immunology Research (CIIDIR) -The Institute for Mental and Physical Health and Clinical Translation (IMPACT), School of Medicine, Deakin University - -#### Kathryn Murie - -![](img/people/kathryn.jpeg){fig-align="left" width="100px"} - -**Affiliations:** EPPIcenter, University of California San Francisco - -#### Max Murphy - -![](img/people/max.jpeg){fig-align="left" width="100px"} - -**Affiliations:** EPPIcenter, University of California San Francisco - -#### Karamoko Niare - -![](img/people/Karamoko_Photo.jpeg){fig-align="left" width="100px"} - -**Affiliations:** Brown University - -#### Jody Phelan - -![](img/people/jody.jpeg){fig-align="left" width="100px"} - -**Affiliations:** London School of Hygiene and Tropical Medicine - -#### Shazia Ruybal-Pesántez - -![](img/people/shazia.jpg){fig-align="left" width="100px"} - -**Affiliations:** MRC Centre for Global Infectious Disease Analysis, Department of Infectious 
Disease Epidemiology, Imperial College London; Instituto de Microbiología, Universidad San Francisco de Quito - -#### Stephen F. Schaffner - -![](img/people/steve_schaffner.jpg){fig-align="left" width="100px"} - -**Affiliations:** Broad Institute of Harvard and MIT; Department of Organismic and Evolutionary Biology, Harvard University; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University - -#### Alfred Simkin - -![](img/people/alfred.jpeg){fig-align="left" width="100px"} - -**Affiliations:** Pathology and Laboratory Medicine, Brown University - -#### Bob Verity - -![](img/people/bob.jpeg){fig-align="left" width="100px"} - -**Affiliations:** MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London - -#### Amy Wesolowski - -![](img/people/Wesolowski.jpg){fig-align="left" width="100px"} - -**Affiliations:** Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health \ No newline at end of file +PGEforge was supported by the following people, listed alphabetically. + ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Name | | Affiliations | ++=============================+=====================================================================================+============================================================================================================================================================================================================================+ +| #### Jorge Amaya-Romero | ![](img/people/jorge.jpeg){fig-align="left" width="100px"} | Immunology and Infectious Disease, Harvard T.H. 
Chan School of Public Health | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Sophie Bérubé | ![](img/people/SophieB.jpeg){fig-align="left" width="100px"} | Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Nick Brazeau | ![](img/people/Brazeau_NF.jpg){fig-align="left" width="100px"} | PGY-1, Internal Medicine, Duke University Medical Center | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Mouhamadou Fadel Diop | ![](img/people/fadel.jpeg){fig-align="left" width="126"} | Medical Research Council Unit The Gambia at the London School of Hygiene and Tropical Medicine | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Bryan Greenhouse | ![](img/people/UCSF_20170322_B_Greenhouse_426.jpeg){fig-align="left" 
width="100px"} | EPPIcenter, University of California San Francisco | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Nicholas Hathaway | ![](img/people/hathaway_nicholas_headshot.jpg){fig-align="left" width="100px"} | Department of Medicine, University of Massachusetts Chan Medical School | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Jason Hendry | ![](img/people/jason.jpeg){fig-align="left" width="100px"} | Max Planck Institute for Infection Biology | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Kirsty McCann | ![](img/people/KirstyMcCann.jpg){fig-align="left" width="100px"} | Centre for Innovation in Infectious Disease and Immunology Research (CIIDIR) The Institute for Mental and Physical Health and Clinical Translation (IMPACT), School of Medicine, Deakin University | 
++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Kathryn Murie | ![](img/people/kathryn.jpeg){fig-align="left" width="100px"} | EPPIcenter, University of California San Francisco | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Max Murphy | ![](img/people/max.jpeg){fig-align="left" width="100px"} | EPPIcenter, University of California San Francisco | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Karamoko Niare | ![](img/people/Karamoko_Photo.jpeg){fig-align="left" width="100px"} | Brown University | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Jody Phelan | ![](img/people/jody.jpeg){fig-align="left" width="100px"} | London School of Hygiene and Tropical Medicine | 
++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Shazia Ruybal-Pesántez | ![](img/people/shazia.jpg){fig-align="left" width="100px"} | MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London; Instituto de Microbiología, Universidad San Francisco de Quito | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Stephen F. Schaffner | ![](img/people/steve_schaffner.jpg){fig-align="left" width="100px"} | Broad Institute of Harvard and MIT; Department of Organismic and Evolutionary Biology, Harvard University; Department of Immunology and Infectious Diseases, Harvard T.H. 
Chan School of Public Health, Harvard University | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Alfred Simkin | ![](img/people/alfred.jpeg){fig-align="left" width="100px"} | Pathology and Laboratory Medicine, Brown University | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Aimee Taylor | ![](img/people/Taylor-A.jpg){fig-align="left" width="108"} | Institut Pasteur | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Bob Verity | ![](img/people/bob.jpeg){fig-align="left" width="100px"} | MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| #### Amy Wesolowski | ![](img/people/Wesolowski.jpg){fig-align="left" width="100px"} | Department of Epidemiology, Johns Hopkins Bloomberg 
School of Public Health | ++-----------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +## Community rules + +All contributors to PGEforge adhere to our [community rules](how_to_contribute.qmd#community-rules). diff --git a/website_docs/data_counts.qmd b/website_docs/data_counts.qmd new file mode 100644 index 0000000..92284a5 --- /dev/null +++ b/website_docs/data_counts.qmd @@ -0,0 +1,8 @@ +--- +title: "*pfhrp2/3* deletion count data" +format: html +bibliography: ../references/references.bib +--- + +## *Pfhrp2/3* gene deletion count data +The *pfhrp2/3* gene deletion count data is available within the subfolder `pfhrp2-3_counts`. The data come from a study by [Feleke et al. (2021)](https://doi.org/10.1038/s41564-021-00962-4).[@Feleke2021-bo] diff --git a/website_docs/data_description.qmd b/website_docs/data_description.qmd index f91f2a7..5778821 100644 --- a/website_docs/data_description.qmd +++ b/website_docs/data_description.qmd @@ -1,90 +1,87 @@ --- -title: "Data description" +title: "Data" format: html -bibliography: ../references/references.bib --- -# Overview -PGEforge hosts simulated and empirical datasets of: -- [Whole genome sequencing (WGS)](whole-genome-sequencing-wgs-data-for-p.-falciparum) -- [Microhaplotype data](#microhaplotype-data) -- [SNP barcoding data](#snp-barcoding-data) -- [*pfhrp2/3* deletion count data](#pfhrp23-gene-deletion-count-data) +## Overview of data formats in *Plasmodium* genomics -They can be located at the [PGEforge/data](https://github.com/mrc-ide/PGEforge/tree/main/data) folder. 
+Understanding the different types of genomic data formats is essential for anyone involved in *Plasmodium* genomic analysis, especially beginners or end-users of various analysis tools. Here we briefly cover the primary data formats used in malaria research and common input file formats such as `.vcf`, `.fasta`, and others. -## Whole genome sequencing (WGS) data for *P. falciparum* +#### Genomic data generation +Before we jump into the different data formats, it is important to understand how malaria genomic data are generated. The variety in data formats arises from different molecular marker panels and available methodologies. The figure below illustrates how different genotyping approaches capture distinct aspects of the *Plasmodium* genome and how genomic data are 'generated' using different techniques and genotyping platforms. Next-generation sequencing is now the most common approach, owing to large decreases in cost and its high throughput, both for whole genome sequencing and for sequencing specific molecular markers or genomic regions of interest, known as [targeted sequencing](https://eu.idtdna.com/pages/technology/next-generation-sequencing/dna-sequencing/targeted-sequencing). -### Pf3k -All of the data within the subfolder `wgs/pf3k` was derived from the [Pf3k Project](https://www.malariagen.net/parasite/pf3k). Currently, there are three VCF files, with corresponding CSVs containing metadata, for samples from: -- Democratic Republic of the Congo ($n=113$) -- Vietnam ($n=97$) -- *In vitro* mixtures of laboratory strains ($n=25$) +::: {.column-margin} +
+Targeted sequencing is similar to WGS, but the sample preparation workflow requires an extra step: target enrichment, either by [hybridization capture](https://eu.idtdna.com/pages/technology/next-generation-sequencing/dna-sequencing/targeted-sequencing/hybridization-capture) or [PCR amplicons](https://eu.idtdna.com/pages/technology/next-generation-sequencing/dna-sequencing/targeted-sequencing/amplicon-sequencing). These methods are often used for enriching longer and shorter genomic regions, respectively. +::: -Each VCF contains 247,496 high-quality (VQSLOD>6) biallelic SNPs across all fourteen somatic chromosomes. The VCFs are sorted and an index file is provided. The Fws statistics provided in the metadata CSVs were collected from the [Pf7 data set](https://www.malariagen.net/sites/default/files/Pf7_fws.txt), which contains the Pf3k samples. These were not calculated for the *in vitro* lab mixtures. +![Figure sourced from [Ruybal-Pesántez et al. 2024, *Molecular markers for malaria genetic epidemiology: progress and pitfalls*](https://doi.org/10.1016/j.pt.2023.11.006)](img/genetic_variation_types.png) -### Simulated -All of the data within the subfolder `wgs/simulated` was simulated. In brief, a simulated sample with a given complexity of infection (COI), $K$, is created by randomly sampling $K$ clonal haplotypes ($F_{ws} > 0.95$) from a given country within the [Pf3k Project](https://www.malariagen.net/parasite/pf3k), assigning these haplotypes to $j \leq K$ bites, simulating meiosis if $j < K$, randomly sampling proportions for each haplotype, and then simulating read count data given the proportions and final genotypes. Sequencing error is simulated at a fixed rate and present in the read counts. No variant calling error is simulated; the genotypes are perfect. 
At present, there is only one VCF file with a corresponding CSV and BED file containing metadata, with samples simulated from: -- Democratic Republic of the Congo ($n=40$) +::: {.column-margin} +
+
+Raw sequencing data is processed by bioinformatic pipelines to call alleles and haplotypes. The outputs of this process, such as `VCF`, `FASTA`, or haplotype tables, serve as inputs for downstream data analysis tools. +::: -The COI of these samples ranges from one to four, and about half of them have within-host relatedness. +#### Data processing +Raw sequence data from next-generation sequencing platforms is processed through bioinformatic pipelines into structured formats suitable for downstream genomic analysis. This process typically involves a series of steps. First, the raw data undergoes quality control to remove low-quality reads and contaminants. The cleaned reads are then aligned to a reference genome, which determines where in the genome each read comes from. From this alignment, variants can be 'called', meaning that positions where the sample differs from the reference genome are identified. These variants are compiled into Variant Call Format (`.vcf`) files. Alternatively, the cleaned and aligned sequences can be compiled into FASTA format (`.fasta`) files, which store the sequences as plain text. Haplotypes may also be called, meaning that combinations of alleles at multiple loci are identified. These haplotypes are often tabulated in `.csv` or other text-based formats. +::: {.column-margin} +
+
+Bioinformatic pipelines in malaria genomics need to account for the unique characteristics of *Plasmodium* genomes, such as high AT content and extensive polymorphism. +::: -### Lab isolates sub-setted -There are a set of bam files with vcf calls subsetting to just CSP (PF3D7_0304600), CELTOS (PF3D7_1133400), and AMA1 (PF3D7_1216600). These can be found within the [`wgs/labisolate_subset`](https://github.com/mrc-ide/PGEforge/tree/main/data/wgs/labisolate_subset) directory. With metadata describing what is in each file [`wgs/labisolate_subset/allControlMixtures.tab.txt`](https://github.com/mrc-ide/PGEforge/tree/main/data/wgs/labisolate_subset/allControlMixtures.tab.txt), [`wgs/labisolate_subset/allControlSampNameToMixName.tab.txt`](https://github.com/mrc-ide/PGEforge/tree/main/data/wgs/labisolate_subset/allControlSampNameToMixName.tab.txt) +#### Common input formats +When working with *Plasmodium* genomic data, several input file formats are commonly used across different analysis tools: -## Microhaplotype data +::: {.column-margin style="font-size: 8px"} + -### Mozambique Field Samples + + + + + + + + + + + + +::: -Targeted amplicon data from analysis for the following paper "Sensitive, Highly Multiplexed Sequencing of Microhaplotypes From the Plasmodium falciparum Heterozygome"[@Tessema2022-ba] +- [`.vcf` (Variant Call Format)](https://en.wikipedia.org/wiki/Variant_Call_Format): Widely used for storing sequence variants, such as SNPs, insertions, deletions, and other types of genetic variants. A VCF file begins with meta-information lines starting with `##`, which provide metadata such as the file format version and the reference genome, followed by a single header line starting with `#CHROM` that names the columns. This is followed by the tab-separated 'body', in which each line represents a variant and its relevant information, such as chromosome and position, reference and alternate allele(s), and sample genotype information.
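To make the VCF layout concrete, here is a minimal base-R sketch (an illustration only, not part of any PGEforge tool) that separates the `##` meta-information lines and the `#CHROM` header line from the tab-separated body, then parses one variant line. The tiny VCF, the sample names, and the helper `parse_vcf_line()` are all hypothetical. In the `GT` field, `0` refers to the reference allele and `1` to the first alternate allele. For real files, a dedicated package such as vcfR (`read.vcfR()`) is a more robust choice.

```r
# Hypothetical toy VCF: two meta-information lines, the #CHROM header, one variant
vcf_text <- c(
  "##fileformat=VCFv4.2",
  "##contig=<ID=Pf3D7_01_v3>",
  paste("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
        "FORMAT", "sample1", "sample2", sep = "\t"),
  paste("Pf3D7_01_v3", "100", ".", "A", "T", "50", "PASS", "DP=30",
        "GT:AD", "0/1:12,8", "1/1:0,10", sep = "\t")
)

# "##" lines are metadata; the single "#CHROM" line names the columns,
# and sample names follow the nine fixed columns (CHROM ... FORMAT)
header_line  <- vcf_text[startsWith(vcf_text, "#CHROM")]
sample_names <- strsplit(header_line, "\t", fixed = TRUE)[[1]][-(1:9)]
body         <- vcf_text[!startsWith(vcf_text, "#")]

# Hypothetical helper: parse one tab-separated body line
parse_vcf_line <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  # Column 9 (FORMAT) lists the keys of each per-sample field, e.g. "GT:AD"
  format_keys <- strsplit(fields[9], ":", fixed = TRUE)[[1]]
  gt_index <- match("GT", format_keys)
  # Columns 10 onwards hold one sample each; keep only the GT entry
  per_sample <- strsplit(fields[-(1:9)], ":", fixed = TRUE)
  list(
    chrom = fields[1],
    pos   = as.integer(fields[2]),
    ref   = fields[4],
    alt   = strsplit(fields[5], ",", fixed = TRUE)[[1]],  # ALT may list several alleles
    gt    = vapply(per_sample, function(x) x[gt_index], character(1))
  )
}

rec <- parse_vcf_line(body[1])
setNames(rec$gt, sample_names)  # sample1 "0/1", sample2 "1/1"
```

This sketch deliberately ignores compressed files, missing `GT` keys, and multi-line edge cases; it is only meant to show how the header and body relate.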
-This contains 82 field samples gathered from northern and southern Mozambique and had 100 targets (91 diversity targets and 9 targeted drug targets). +![](img/vcf_format.png) -The results file can be found within directory [`amplicon/moz2018_heome1_results_fieldSamples.tsv.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/moz2018_heome1_results_fieldSamples.tsv.gz) along with metadata [`amplicon/moz2018_fieldSamples_meta.tsv`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/moz2018_fieldSamples_meta.tsv). Results are in a 4 column format. +::: {.column-margin} +
+
+Figure sourced and modified from [Wikipedia](https://en.wikipedia.org/wiki/Variant_Call_Format) +::: -* **sample** - The name of the sample -* **target** - The name of the amplicon target -* **target_popUID** - A population identifier for the haplotype for this target for this sample -* **readCnt** - The read count for this haplotype for this sample for this target +- [`.fasta`](https://en.wikipedia.org/wiki/FASTA_format): Widely used for representing nucleotide and protein sequences, in which nucleotides or amino acids are represented using single-letter codes. Each entry begins with a header line starting with `>`, followed by a unique sequence ID and an optional description, and then the sequence data itself, with one letter per nucleotide or amino acid. -### Lab Control mixtures +![](img/fasta_format.png) +::: {.column-margin} +
+Figure sourced from [Wikipedia](https://en.wikipedia.org/wiki/FASTA_format) +::: -![Parasite Densities](img/moz2018_fieldSamples_parasiteDensities.png) +- `.txt` or `.csv`: Simple text and comma-separated values formats that are often used to represent haplotype or count data. -![Parasite Mixtures](img/moz2018_fieldSamples_parasiteMixtures.png) +## Available datasets in PGEforge +As part of the PGEforge community resource, we have compiled simulated and empirical datasets of these common data formats. These datasets are used in the [tool tutorials](tutorials_overview.qmd) to make them fully reproducible and are freely available for anyone to use. -Results are organized in a similar 4 column table as above. The results file can be found within directory [`amplicon/moz2018_heome1_results_controlSamples.tsv.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/moz2018_heome1_results_controlSamples.tsv.gz) along with metadata [`amplicon/moz2018_controlSamples_meta.tsv`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/moz2018_controlSamples_meta.tsv), [`amplicon/samplesToMixFnp.tab.txt`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/samplesToMixFnp.tab.txt), [`amplicon/mixSetUpFnp.tab.txt`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/mixSetUpFnp.tab.txt). +More details on the datasets hosted on PGEforge can be found in the links below: -### Simulated data +- [Whole genome sequencing (WGS)](data_wgs.qmd) +- [Microhaplotype data](data_mhaps.qmd) +- [SNP barcoding data](data_snps.qmd) +- [*pfhrp2/3* deletion count data](data_counts.qmd) -Targeted amplicon data was also simulated *in silico* to create 100 samples sampled from Mozambique and for a newer diversity panel called MAD^4HatTeR with 50 targets selected for thier diversity. - -Results are organized in a similar 4 column table as above. 
The results file can be found within directory [`amplicon/mozSim_MAD4HATTERDiversitySubPanel.tab.txt.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/mozSim_MAD4HATTERDiversitySubPanel.tab.txt.gz) - - -## SNP barcoding data - -SNP barcode data from the sanger 100 SNP Plasmodium falciparum barcode [@Chang2019-ar]. - -[sanger101_snp_barcode_withGenes.bed](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/sanger101_snp_barcode_withGenes.bed) - -### Field Samples - -The barcode was subsetted from the [above WGS data](#pf3k) to just the sanger barcode for the Vietnam and DRC data. The results file can be found within directory [`snp_barcode/sangerBarcode_SNP_INDEL_Pf3D7_ALL_v3.combined.filtered.vqslod6.biallelic_snp.Vietnam.vcf.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/sangerBarcode_SNP_INDEL_Pf3D7_ALL_v3.combined.filtered.vqslod6.biallelic_snp.Vietnam.vcf.gz), [`snp_barcode/sangerBarcode_SNP_INDEL_Pf3D7_ALL_v3.combined.filtered.vqslod6.biallelic_snp.DRCongo.vcf.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/sangerBarcode_SNP_INDEL_Pf3D7_ALL_v3.combined.filtered.vqslod6.biallelic_snp.DRCongo.vcf.gz) - -### Lab Isolates - -The barcode was also explicitly called with several monoclonal lab isolates and then lab created mixtures of these isolates. Data can be found [`snp_barcode/controls_sanger100.vcf.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/controls_sanger100.vcf.gz) with meta data with what mixtures are what found [`snp_barcode/allControlMixtures.tab.txt`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/allControlMixtures.tab.txt) and [`snp_barcode/allControlSampNameToMixName.tab.txt`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/allControlSampNameToMixName.tab.txt) - -### Simulated - -The barcode was also simulated for 100 samples (50 Bangladesh and 50 Ghana). 
Data can be found [`snp_barcode/SpotMalariapfPanel_simData_sanger100.vcf.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/SpotMalariapfPanel_simData_sanger100.vcf.gz). The simulations were created by simulating super infections by sampling the barcode from each of these countries and selecting COIs based on the COIs observed for each country. -To use data without indels, the data can be found [`snp_barcode/SpotMalariapfPanel_simData_snponly_sanger100.vcf.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/SpotMalariapfPanel_simData_snponly_sanger100.vcf.gz). - - -## *Pfhrp2/3* gene deletion count data -The *pfhrp2/3* gene deletion count data is available within the subfolder `pfhrp2-3_counts`. The data come from a study by [Feleke et al. (2021)](https://doi.org/10.1038/s41564-021-00962-4).[@Feleke2021-bo] +They can be accessed in the [PGEforge/data](https://github.com/mrc-ide/PGEforge/tree/main/data) folder on GitHub. \ No newline at end of file diff --git a/website_docs/data_mhaps.qmd b/website_docs/data_mhaps.qmd new file mode 100644 index 0000000..b525dc5 --- /dev/null +++ b/website_docs/data_mhaps.qmd @@ -0,0 +1,36 @@ +--- +title: "Microhaplotype data" +format: html +bibliography: ../references/references.bib +--- + +## Microhaplotype data + +### Mozambique Field Samples + +Targeted amplicon data from the paper "Sensitive, Highly Multiplexed Sequencing of Microhaplotypes From the *Plasmodium falciparum* Heterozygome" [@Tessema2022-ba]. + +This dataset contains 82 field samples collected in northern and southern Mozambique and genotyped with a 100-target panel (91 diversity targets and 9 drug targets).
+ +The results file is [`amplicon/moz2018_heome1_results_fieldSamples.tsv.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/moz2018_heome1_results_fieldSamples.tsv.gz), with metadata in [`amplicon/moz2018_fieldSamples_meta.tsv`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/moz2018_fieldSamples_meta.tsv). Results are in a 4-column format: + +* **sample** - The name of the sample +* **target** - The name of the amplicon target +* **target_popUID** - A population-level identifier for the haplotype observed at this target in this sample +* **readCnt** - The read count for this haplotype at this target in this sample + +### Lab Control mixtures + +Targeted amplicon data from the same 100-target panel as [above](#mozambique-field-samples). Mixtures consist of various combinations of 7 lab strains of *P. falciparum*, with some mixtures done in replicate at 4 different parasite densities (10, 100, 1k and 10k). + +![Parasite Densities](img/moz2018_fieldSamples_parasiteDensities.png) + +![Parasite Mixtures](img/moz2018_fieldSamples_parasiteMixtures.png) + +Results are organized in the same 4-column table as above. The results file is [`amplicon/moz2018_heome1_results_controlSamples.tsv.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/moz2018_heome1_results_controlSamples.tsv.gz), with metadata in [`amplicon/moz2018_controlSamples_meta.tsv`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/moz2018_controlSamples_meta.tsv), [`amplicon/samplesToMixFnp.tab.txt`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/samplesToMixFnp.tab.txt) and [`amplicon/mixSetUpFnp.tab.txt`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/mixSetUpFnp.tab.txt).
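The four-column microhaplotype layout (`sample`, `target`, `target_popUID`, `readCnt`) is a long-format table, so it can be summarized with base R alone. The following is a minimal sketch under stated assumptions: the toy rows are invented for illustration, the commented-out `read.delim()` path assumes you are working from the repository root, and the "most haplotypes at any target" rule is only a naive complexity-of-infection (COI) proxy, not the method used by any PGEforge tool.

```r
# Toy rows in the four-column microhaplotype layout; all values are invented.
mhaps <- data.frame(
  sample        = c("S1", "S1", "S1", "S2", "S2"),
  target        = c("t1", "t1", "t2", "t1", "t2"),
  target_popUID = c("t1.00", "t1.01", "t2.00", "t1.00", "t2.01"),
  readCnt       = c(300, 100, 250, 500, 420),
  stringsAsFactors = FALSE
)
# To use the real field-sample file instead (read.delim reads .gz files directly);
# the path assumes the repository root as working directory:
# mhaps <- read.delim("data/amplicon/moz2018_heome1_results_fieldSamples.tsv.gz")

# Within-sample relative abundance of each haplotype at each target
mhaps$relAbund <- with(mhaps, readCnt / ave(readCnt, sample, target, FUN = sum))

# Number of distinct haplotypes per sample and target
nhap <- aggregate(target_popUID ~ sample + target, data = mhaps,
                  FUN = function(x) length(unique(x)))
names(nhap)[names(nhap) == "target_popUID"] <- "nHaplotypes"

# Naive COI proxy (illustrative only): the largest haplotype count at any target
coi_proxy <- aggregate(nHaplotypes ~ sample, data = nhap, FUN = max)
print(coi_proxy)
```

On real data, low-frequency haplotypes arising from sequencing or PCR error would inflate this naive count, which is one reason to rely on dedicated COI estimation tools for actual analyses.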
+ +### Simulated data + +Targeted amplicon data were also simulated *in silico* for 100 samples from Mozambique, using a newer diversity panel, MAD^4^HatTeR, with 50 targets selected for their diversity. + +Results are organized in the same 4-column table as above. The results file is [`amplicon/mozSim_MAD4HATTERDiversitySubPanel.tab.txt.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/amplicon/mozSim_MAD4HATTERDiversitySubPanel.tab.txt.gz). \ No newline at end of file diff --git a/website_docs/data_snps.qmd b/website_docs/data_snps.qmd new file mode 100644 index 0000000..5227c63 --- /dev/null +++ b/website_docs/data_snps.qmd @@ -0,0 +1,24 @@ +--- +title: "Single nucleotide polymorphism (SNP) data" +format: html +bibliography: ../references/references.bib +--- + +## SNP barcoding data + +SNP barcode data from the Sanger 100-SNP *Plasmodium falciparum* barcode [@Chang2019-ar]. + +Barcode positions (BED file): [sanger101_snp_barcode_withGenes.bed](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/sanger101_snp_barcode_withGenes.bed) + +### Field Samples + +The Vietnam and DRC samples from the [Pf3k WGS data](data_wgs.qmd#pf3k) were subset to the Sanger barcode positions. The results files are [`snp_barcode/sangerBarcode_SNP_INDEL_Pf3D7_ALL_v3.combined.filtered.vqslod6.biallelic_snp.Vietnam.vcf.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/sangerBarcode_SNP_INDEL_Pf3D7_ALL_v3.combined.filtered.vqslod6.biallelic_snp.Vietnam.vcf.gz) and [`snp_barcode/sangerBarcode_SNP_INDEL_Pf3D7_ALL_v3.combined.filtered.vqslod6.biallelic_snp.DRCongo.vcf.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/sangerBarcode_SNP_INDEL_Pf3D7_ALL_v3.combined.filtered.vqslod6.biallelic_snp.DRCongo.vcf.gz). + +### Lab Isolates + +The barcode was also called directly from several monoclonal lab isolates and from lab-created mixtures of these isolates.
Data can be found in [`snp_barcode/controls_sanger100.vcf.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/controls_sanger100.vcf.gz), with metadata describing which mixture each sample represents in [`snp_barcode/allControlMixtures.tab.txt`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/allControlMixtures.tab.txt) and [`snp_barcode/allControlSampNameToMixName.tab.txt`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/allControlSampNameToMixName.tab.txt). + +### Simulated + +The barcode was also simulated for 100 samples (50 from Bangladesh and 50 from Ghana). Data can be found in [`snp_barcode/SpotMalariapfPanel_simData_sanger100.vcf.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/SpotMalariapfPanel_simData_sanger100.vcf.gz). Superinfections were simulated by sampling barcodes from each of these countries, with COIs selected based on those observed in each country. +A version without indels is available in [`snp_barcode/SpotMalariapfPanel_simData_snponly_sanger100.vcf.gz`](https://github.com/mrc-ide/PGEforge/tree/main/data/snp_barcode/SpotMalariapfPanel_simData_snponly_sanger100.vcf.gz). diff --git a/website_docs/data_wgs.qmd b/website_docs/data_wgs.qmd new file mode 100644 index 0000000..4627341 --- /dev/null +++ b/website_docs/data_wgs.qmd @@ -0,0 +1,25 @@ +--- +title: "Whole genome sequencing (WGS) data" +format: html +bibliography: ../references/references.bib +--- + +## *P. falciparum* WGS data + +### Pf3k +All of the data within the subfolder `wgs/pf3k` was derived from the [Pf3k Project](https://www.malariagen.net/parasite/pf3k). Currently, there are three VCF files, with corresponding CSVs containing metadata, for samples from: + +- Democratic Republic of the Congo ($n=113$) +- Vietnam ($n=97$) +- *In vitro* mixtures of laboratory strains ($n=25$) + +Each VCF contains 247,496 high-quality (VQSLOD>6) biallelic SNPs across all fourteen nuclear chromosomes.
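VCFs like these can be inspected with base R before reaching for a dedicated package. The sketch below uses a small inline VCF fragment whose chromosome names, positions, sample names and `GT:AD` FORMAT layout are hypothetical stand-ins for the real `wgs/pf3k` files. It checks that each record is a biallelic SNP, counts records per chromosome, and derives a within-sample alternative-allele frequency from the `AD` (allelic depth) field. All columns are read as character so that single-base alleles such as `T` are not converted to logicals.

```r
# Hypothetical VCF fragment; "##" meta lines and the "#CHROM" header start with "#".
vcf_text <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
  "Pf3D7_01_v3\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/0:10,0\t1/1:0,12",
  "Pf3D7_01_v3\t250\t.\tC\tT\t50\tPASS\t.\tGT:AD\t0/1:1,9\t0/0:8,0",
  "Pf3D7_02_v3\t75\t.\tG\tA\t50\tPASS\t.\tGT:AD\t0/1:7,3\t0/0:9,0"
)
# For a real (possibly gzipped) file, pass its path instead of `text = vcf_text`.
# comment.char = "#" drops the header lines, so sample names are not kept.
vcf <- read.table(text = vcf_text, sep = "\t", comment.char = "#", quote = "",
                  colClasses = "character")
names(vcf)[1:9] <- c("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                     "INFO", "FORMAT")

# A biallelic SNP has a one-base REF and exactly one one-base ALT allele
is_biallelic_snp <- nchar(vcf$REF) == 1 & nchar(vcf$ALT) == 1

# Records per chromosome
snps_per_chrom <- table(vcf$CHROM)
print(snps_per_chrom)

# Column 10 is the first sample; locate AD from each record's FORMAT string
fmt   <- strsplit(vcf$FORMAT, ":", fixed = TRUE)
s1    <- strsplit(vcf[[10]], ":", fixed = TRUE)
ad_s1 <- mapply(function(f, v) v[match("AD", f)], fmt, s1)

# Within-sample alt-allele frequency = alt depth / (ref depth + alt depth)
depths  <- do.call(rbind, lapply(strsplit(ad_s1, ",", fixed = TRUE), as.numeric))
wsaf_s1 <- depths[, 2] / rowSums(depths)
print(wsaf_s1)
```

Looking up `AD` by name in the FORMAT column, rather than assuming its position, keeps the sketch usable if a real file carries extra FORMAT fields.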
The VCFs are sorted and indexed. The $F_{ws}$ statistics provided in the metadata CSVs were taken from the [Pf7 data set](https://www.malariagen.net/sites/default/files/Pf7_fws.txt), which includes the Pf3k samples. $F_{ws}$ was not calculated for the *in vitro* lab mixtures. + +### Simulated +All of the data within the subfolder `wgs/simulated` was simulated. In brief, a simulated sample with a given complexity of infection (COI), $K$, is created by randomly sampling $K$ clonal haplotypes ($F_{ws} > 0.95$) from a given country within the [Pf3k Project](https://www.malariagen.net/parasite/pf3k), assigning these haplotypes to $j \leq K$ bites, simulating meiosis if $j < K$, randomly sampling proportions for each haplotype, and then simulating read count data given the proportions and final genotypes. Sequencing error is simulated at a fixed rate and is present in the read counts. No variant-calling error is simulated; the genotypes are error-free. At present, there is only one VCF file, with a corresponding CSV and BED file containing metadata, with samples simulated from: + +- Democratic Republic of the Congo ($n=40$) + +The COI of these samples ranges from one to four, and about half of them have within-host relatedness. + + +### Lab isolate subsets +A set of BAM files, with corresponding VCF calls, subset to just CSP (PF3D7_0304600), CELTOS (PF3D7_1133400), and AMA1 (PF3D7_1216600), can be found in the [`wgs/labisolate_subset`](https://github.com/mrc-ide/PGEforge/tree/main/data/wgs/labisolate_subset) directory.
Metadata describing the contents of each file are provided in [`wgs/labisolate_subset/allControlMixtures.tab.txt`](https://github.com/mrc-ide/PGEforge/tree/main/data/wgs/labisolate_subset/allControlMixtures.tab.txt) and [`wgs/labisolate_subset/allControlSampNameToMixName.tab.txt`](https://github.com/mrc-ide/PGEforge/tree/main/data/wgs/labisolate_subset/allControlSampNameToMixName.tab.txt). diff --git a/website_docs/future_work.qmd b/website_docs/future_work.qmd new file mode 100644 index 0000000..64d0121 --- /dev/null +++ b/website_docs/future_work.qmd @@ -0,0 +1,38 @@ +--- +title: "What's in store for the future?" +format: html +--- + +:::{.column-margin} +![](img/pgeforge_ecosystem.png){ width=262 height=300 } +::: + +## The PGEforge vision +PGEforge is a **community-driven** platform created by and for malaria genomic epidemiology analysis tool developers and end-users. If you are interested in contributing to these efforts, please [get involved](how_to_contribute.qmd)! + +## Tool benchmarking +The process of evaluating tools involves assessing their adherence to objective [software standards](tools_to_standards.qmd) from both the end-user and developer perspectives. This is a great way to determine how well we are doing at making our tools accessible and usable by anyone. However, it is equally important to evaluate how *well* tools perform their intended functions and how they compare to other tools designed for the same purpose. + +This type of evaluation could be achieved through **formal benchmarking** of different tools and their analysis functionalities. We can use [canonical simulated datasets](data_description.qmd) with known ground truth to evaluate the accuracy, sensitivity, and specificity of the tools in producing desired outcomes. What do we mean by ground truth? For example, we can simulate data with specific features that we are interested in benchmarking, including complexity of infection (COI), missingness, sequencing errors and parasitemia.
We can then evaluate how well the tools estimate these features. With empirical "real-world" datasets, the ground truth may not be known, but we can use these datasets to assess whether different tools produce consistent outcomes. + +By systematically benchmarking tools against known datasets, we can ensure that the tools not only meet software standards but also perform reliably in practical applications, such as [key use cases for *Plasmodium* genomic epidemiology](use_cases.qmd). We plan to host these types of benchmarking resources and results in PGEforge in the near future. + +:::{.column-margin} +Ongoing work in this space includes **benchmarking COI estimation tools.** +::: + + +## Adding and evaluating more tools +Our initial efforts to [landscape available tools](tool_landscaping.qmd) focused on tools for downstream *Plasmodium* genetic analysis and included 40 available tools. However, future work could include the following areas (not an exhaustive list!): + +- Tools with applications to mosquito genetics, as long as they are generally within the scope of PGEforge (i.e., *Plasmodium* genomic epidemiology tools) +- Bioinformatic pre-processing tools/pipelines, such as variant calling and quality filters +- Tools that are not specifically engineered for *Plasmodium* but can be applied to *Plasmodium* genetic data, such as more generic population genetics tools that estimate metrics such as F-statistics, extended haplotype homozygosity, etc. + +## Tool development +**Do we want to include some of the results from the tools to functionality evaluation? We could say here that tools that can phase are needed for all use cases!** + +Tools to functions -- we still can't phase! + +## Thank you! +Thank you for your interest in contributing to PGEforge. Your efforts help us build a stronger, more inclusive research community making *Plasmodium* genomic analysis accessible to all!
diff --git a/website_docs/how_to_contribute.qmd b/website_docs/how_to_contribute.qmd index 71e6950..9ba9763 100644 --- a/website_docs/how_to_contribute.qmd +++ b/website_docs/how_to_contribute.qmd @@ -1,8 +1,105 @@ --- -title: "Add your tool to the list" +title: "Get involved" format: html --- -# Overview -PGEforge hosts simulated and empirical datasets of: +:::{.column-margin} +![](img/pgeforge_ecosystem.png){ width=262 height=300 } +::: + +## The PGEforge vision +PGEforge is a **community-driven** platform created by and for malaria genomic epidemiology analysis tool developers and end-users. If you are interested in contributing to these efforts, read on to find out how to get involved! + +## How to contribute to PGEforge +We are excited to have you join us, and we welcome input from all areas of the research community to continuously improve and expand our platform. PGEforge aims to contribute to an inclusive and sustainable ecosystem of existing and new tools for *Plasmodium* genomic data analysis. There are several ways to get involved and join us in contributing to these aims, ranging from small contributions to more sustained involvement. All contributors must adhere to our [community rules](#community-rules) and will be [recognized as a contributor](contributors.qmd). + +Below are some ways to get involved with current workstreams, but we also have some ideas and plans in the works for the [future](future_work.qmd) beyond what is currently available in PGEforge! + +## Add tool to landscaping and software standards matrices +Landscaping, documenting and benchmarking available *Plasmodium* genomic data analysis tools and any tools within the realm of *Plasmodium* genomic epidemiology (PGE) will require continuous updating and curation. As new tools become available, they need to be assessed and integrated into our existing framework.
While we have made significant progress in identifying, documenting and evaluating tools commonly applied to *Plasmodium* genetic data, the [tool landscaping](tool_landscaping.qmd) and [evaluation against software standards](software_standards.qmd) are not exhaustive. For example, our initial efforts focused on tools for downstream *Plasmodium* genetic analysis, specifically tools that extract signals from already processed data, and only on tools applicable to *P. falciparum* and *P. vivax*. In terms of tool evaluation, we focused on the tools for which resources were developed in the [tutorials section](tutorials_overview.qmd). However, [future work](future_work.qmd) could also involve other types of tools. + +:::{.column-margin} +So far, PGEforge hosts a landscaping matrix of 40 identified tools, full documentation and evaluation of 17 tools, and complete resources (summary documents, installation instructions, and fully worked tutorials) developed for 11 tools. +::: + +To ensure this 'live document' remains up to date, we encourage contributions in the following areas: + +- Updating the documentation and filling in the gaps for remaining tools in the [landscaping matrix](tool_landscaping.qmd) +- Evaluating remaining tools based on software standards (these tools are highlighted in grey in the [overview by tools based on software standards](tools_to_standards.qmd)) +- Updating [tool scores against software standards](tools_to_standards.qmd) if developers release tool updates +- Adding a **new** tool to the landscaping and software standards matrices + +There are two ways to contribute to any of these areas, depending on your comfort level with GitHub and R. + +:::{.column-margin} +Don't forget to also [add yourself as a contributor](#add-yourself-as-a-contributor)! +::: + +1. Download the relevant `.csv` files and edit them locally.
Then open an [issue on GitHub](https://github.com/mrc-ide/PGEforge/issues/new) and attach your updated `.csv` files. The locations of the files are described below: + +- [`MMS_software_landscaping.csv`](https://github.com/mrc-ide/PGEforge/tree/main/website_docs/tables/MMS_software_landscaping.csv): this is the [tool landscaping matrix](tool_landscaping.qmd). +- [`Tools_to_standards.csv`](https://github.com/mrc-ide/PGEforge/tree/main/website_docs/tables/Tools_to_standards.csv): this is the evaluation of each tool with respect to the [objective software standards](software_standards.qmd). Please make sure you follow the [evaluation criteria rubric](software_standards.qmd#evaluation-criteria) when you score each of the criteria. + +**Please ensure you only modify the relevant cells or add new rows as needed.** + +2. Follow the [GitHub contributor guidelines](#contributing-guidelines-for-github) to create your own branch, edit the `.csv` files listed above directly, and PR your changes. If you are also adding yourself as a contributor in the same PR, make sure you: + +- Add your headshot photo to the [`website_docs/img/people`](https://github.com/mrc-ide/PGEforge/tree/main/website_docs/img/people) folder +- Add your name in alphabetical order to the [website_docs/contributors.qmd](https://github.com/mrc-ide/PGEforge/blob/main/website_docs/contributors.qmd) and reference your photo by using the following code ```![](img/people/yourname.jpeg){fig-align="left" width="100px"}```. + +## Develop resources for a tool + +If you have developed a new *Plasmodium* genomic data analysis tool or identified a tool that is missing tutorial resources, you can develop these resources and contribute them to PGEforge! Follow these steps to contribute: + +:::{.column-margin} +Don't forget to also [add yourself as a contributor](#add-yourself-as-a-contributor)! +::: + +**Step 1:** +In PGEforge, we focus on *Plasmodium* genomic **data analysis** tools, so check that the tool is within the scope of *Plasmodium* genomic epidemiology data analysis.
+ +**Step 2:** +Take a look at the [tool landscaping](tool_landscaping.qmd) to check whether we have already evaluated that specific tool. In some instances, we opted not to develop tutorial resources for tools that were difficult to install or run, produced run-time errors, or were clearly superseded by more recent tools. You will find more details in the [tool landscaping](tool_landscaping.qmd). If you identify a tool that is not in the landscaping matrix, please [add it](#add-tool-to-landscaping-and-software-standards-matrices). + +**Step 3:** +If you have reached this point and have found a tool that is missing from PGEforge, great! After [adding it to the landscaping and software standards matrices](#add-tool-to-landscaping-and-software-standards-matrices), you can develop the tool resources. We provide all the templates that you need to get started. For every tool, we develop the following resources: + + - *Summary Document:* Provide an overview of the tool, including its main purpose, use cases, license, code repository, relevant publications, and citation details. Fill in these details in the template `.csv` file and then "Render" the accompanying `.qmd` file to display the information in a nicely formatted table. + - *Installation Instructions:* Write clear, step-by-step instructions for installing the tool, ensuring it is accessible to users with varying technical skills. If the tool is an R package and is *not* already in the [PlasmoGenEpi R-universe](R_universe.qmd), please let us know! Otherwise, we strongly suggest you instruct users to install it from R-universe. + - *Fully Worked Tutorials:* Develop comprehensive tutorials that guide users through the entire process of using the tool to perform a specific analysis, from data import to interpretation of results. PGEforge hosts both empirical and simulated [canonical datasets](data_description.qmd) for the commonly used data input formats.
We strongly encourage you to use these datasets when developing tutorials, as this ensures that the tutorials are fully reproducible. + +You can find the templates for all of these documents in our [GitHub repository](https://github.com/mrc-ide/PGEforge/tree/main/tutorials/Template) and example tutorials in the [Tutorials section](tutorials_overview.qmd). + +**Please make sure to follow our [GitHub contribution guidelines](#contributing-guidelines-for-github).** + +## Developing analysis workflows +Coming soon! + +## Add yourself as a contributor +Everyone who contributes to PGEforge is listed as a [contributor](contributors.qmd). + +There are two ways to add yourself: + +1. Open an [issue on GitHub](https://github.com/mrc-ide/PGEforge/issues/new) and make sure you include the following information so we can add you to the contributors list: + +- Full name +- Affiliations +- Headshot photo + +2. Follow the [GitHub contributor guidelines](#contributing-guidelines-for-github) to create your own branch and PR your changes directly. Make sure you: + +- Add your headshot photo to the [`website_docs/img/people`](https://github.com/mrc-ide/PGEforge/tree/main/website_docs/img/people) folder +- Add your name in alphabetical order to the [website_docs/contributors.qmd](https://github.com/mrc-ide/PGEforge/blob/main/website_docs/contributors.qmd) and reference your photo by using the following code ```![](img/people/yourname.jpeg){fig-align="left" width="100px"}```. + +## Contributing guidelines for GitHub + +To ensure a smooth and efficient collaboration process, please follow these guidelines when contributing to our GitHub repository: + +- **Create a New Branch:** Start by creating a new branch from the `develop` branch. This keeps your changes separate until they are ready to be reviewed and merged. +- **Pull Request (PR) for Review:** Once you have made your changes, create a PR into the `develop` branch. Our team will review your PR and provide feedback or approve it for merging.
Never make any changes to the `main` branch, and please always PR into `develop`. + +## Thank you! +Thank you for your interest in contributing to PGEforge. Your efforts help us build a stronger, more inclusive research community making *Plasmodium* genomic analysis accessible to all! + +## Community rules +PGEforge is dedicated to creating an inclusive, respectful, and engaging community. We believe in open collaboration and active participation, empowering each other to share and gain knowledge, resources, and opportunities as a community. We believe in the benefits of a wide range of perspectives, experiences and ideas. We welcome contributions from everyone who shares our goals and wants to contribute, regardless of age, gender identity, sexual orientation, disability, ethnicity, nationality, race, religion, education, level of experience, career stage, or socioeconomic status. PGEforge aims to foster a harassment-free experience for everyone and expects all contributors to demonstrate empathy, kindness, and respect, and to engage constructively with differing viewpoints. Unacceptable behaviors, including harassment, discriminatory language, and personal attacks, will not be tolerated. By fostering a positive and inclusive community, we aim to empower everyone to contribute and collaborate effectively as PGEforge continues to grow.
\ No newline at end of file diff --git a/website_docs/img/fasta_format.png b/website_docs/img/fasta_format.png new file mode 100644 index 0000000..6f98f3b Binary files /dev/null and b/website_docs/img/fasta_format.png differ diff --git a/website_docs/img/genetic_variation_types.png b/website_docs/img/genetic_variation_types.png new file mode 100644 index 0000000..1d32c6e Binary files /dev/null and b/website_docs/img/genetic_variation_types.png differ diff --git a/website_docs/img/people/Taylor-A.jpg b/website_docs/img/people/Taylor-A.jpg new file mode 100644 index 0000000..c2fe5fa Binary files /dev/null and b/website_docs/img/people/Taylor-A.jpg differ diff --git a/website_docs/img/people/fadel.jpeg b/website_docs/img/people/fadel.jpeg new file mode 100644 index 0000000..1ddb3d8 Binary files /dev/null and b/website_docs/img/people/fadel.jpeg differ diff --git a/website_docs/img/pge_api_screenshot.png b/website_docs/img/pge_api_screenshot.png new file mode 100644 index 0000000..fa4c59f Binary files /dev/null and b/website_docs/img/pge_api_screenshot.png differ diff --git a/website_docs/img/pge_pkgs_screenshot.png b/website_docs/img/pge_pkgs_screenshot.png new file mode 100644 index 0000000..4595d6d Binary files /dev/null and b/website_docs/img/pge_pkgs_screenshot.png differ diff --git a/website_docs/img/pge_runiv_screenshot.png b/website_docs/img/pge_runiv_screenshot.png new file mode 100644 index 0000000..3770d45 Binary files /dev/null and b/website_docs/img/pge_runiv_screenshot.png differ diff --git a/website_docs/img/pgeforge_ecosystem.png b/website_docs/img/pgeforge_ecosystem.png new file mode 100644 index 0000000..3a1e491 Binary files /dev/null and b/website_docs/img/pgeforge_ecosystem.png differ diff --git a/website_docs/img/pgeforge_header.png b/website_docs/img/pgeforge_header.png new file mode 100644 index 0000000..dd03cb6 Binary files /dev/null and b/website_docs/img/pgeforge_header.png differ diff --git a/website_docs/img/runiverse_logo.png 
b/website_docs/img/runiverse_logo.png new file mode 100644 index 0000000..e5f597a Binary files /dev/null and b/website_docs/img/runiverse_logo.png differ diff --git a/website_docs/img/vcf_format.png b/website_docs/img/vcf_format.png new file mode 100644 index 0000000..2c6eb6e Binary files /dev/null and b/website_docs/img/vcf_format.png differ diff --git a/website_docs/radish23.qmd b/website_docs/radish23.qmd index 20e5c51..419c99d 100644 --- a/website_docs/radish23.qmd +++ b/website_docs/radish23.qmd @@ -6,19 +6,34 @@ format: image: "website_docs/img/radish.png" --- -## Welcome +## Background + +The **R**eproducibility, **A**ccessibility, **D**ocumentation and **I**nter-operability **S**tandards **H**ackathon (**RADISH23**), organised by Bob Verity, Shazia Ruybal-Pesántez, Bryan Greenhouse and Amy Wesolowski, took place at Johns Hopkins Bloomberg School of Public Health in Baltimore, USA, from 11 to 14 December 2023, with 16 participants from 11 institutions. ::: {.column-margin} -![A cute radish](img/radish.png) +![A cute radish](img/radish.png){ width=200 height=200 } ::: -The **R**eproducibility, **A**ccessibility, **D**ocumentation and **I**nter-operability **S**tandards **H**ackathon (**RADISH23**), organised by Bob Verity, Shazia Ruybal-Pesántez, Bryan Greenhouse and Amy Wesolowski took place on Mon 11th Dec to Thurs 14th Dec 2023 in Baltimore, USA. - ![](img/radish23_group_photo.jpg) -*For more details on RADISH23 attendees and contributors to PGEforge, see the [Contributors page](website_docs/contributors.qmd) * +*For more details on RADISH23 attendees and contributors to PGEforge, see the [Contributors page](contributors.qmd).* ## Main aims -Our aim was to take the wide range of software tools in malaria genomic epidemiology and make it so that more people can use them more reliably. This is a first step in working towards more ambitious goals like designing workflows, benchmarking, or coming up with guidelines for best practices.
+Our aim was to take the wide range of software tools in malaria genomic epidemiology and create community resources so that more people can use them more reliably. Over the course of 4 days, we began creating a systematic framework for analysis by curating existing software tools and identifying gaps in commonly used tools, alongside broader discussions about standardizing software practices. + +One of our main aims was to create community resources that allow anyone with basic computer skills to run common *Plasmodium* genetic analyses locally. By framing this work within the wider context of use cases, we made progress towards harmonizing which tools need to be chained together, as part of well-defined and flexible workflows, to answer specific questions relevant to malaria control. + +## Outputs +The event was primarily hands-on and coding-based, with the aim that the materials developed for each tool would allow an end-user to go from installing the tool on their local machine to running analyses on the standardized datasets we compiled. + + +::: {.column-margin} +![](img/PXL_20231213_195311094.jpg){ width=400 height=300} +::: + +Prior to the hackathon, we carried out a [scoping/landscaping exercise](tool_landscaping.qmd) to identify all available analysis tools and evaluated them against a set of [software standards](software_standards.qmd), identifying those that are superseded or relegated and those we would prioritize during the hackathon. With those priority tools in mind, [comprehensive guides](tutorials_overview.qmd) were developed for the tools, including summary documents, installation aids and tutorials. + +In addition, we compiled [simulated and empirical datasets](data_description.qmd) of genomic data in the common formats required as input for the various tools, so that end-users can reproduce the tutorials and reuse the data in future work.
-To view the tutorials developed during RADISH23, see the [Tutorials](website_docs/tutorials_overview.qmd). \ No newline at end of file +Alongside the coding tasks, there were several small break-out sessions where participants defined malaria genomic surveillance data +analysis use cases and sketched workflows for each, including functionality requirements and a mapping of these functionalities to available tools. \ No newline at end of file diff --git a/website_docs/software_standards.qmd b/website_docs/software_standards.qmd index 2a5ef50..e3da7d1 100644 --- a/website_docs/software_standards.qmd +++ b/website_docs/software_standards.qmd @@ -1,8 +1,81 @@ --- -title: "Evaluation framework" +title: "Software standards" format: html --- -# Overview -PGEforge hosts simulated and empirical datasets of: +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +library(dplyr) +library(janitor) +library(kableExtra) +``` + +```{r read data, echo=FALSE, include=FALSE} +# Read the software standards table into R +stds <- read.csv("tables/Objective_software_standards.csv") +``` + +## Framework for evaluating software standards in *Plasmodium* genomics + +PGEforge aims to foster an ecosystem of high-quality, user-friendly tools that can be seamlessly integrated into genomic analysis workflows. One of the biggest challenges is the variability and lack of systematic assessment of existing tools, which often do not adhere to best practices in software development, including [FAIR standards](https://doi.org/10.1038/sdata.2016.18), maintenance, and usability. + +Working towards this goal, a robust software standards evaluation framework was formulated to guide the development and assessment of tools used in *Plasmodium* genomic data analysis from both the end-user and developer perspectives.
This framework is crucial not only for addressing the variability and challenges associated with existing software tools but also for guiding development of new tools, ensuring that they meet high standards of usability, accessibility, and reliability. + +### 'Ideal' software practices +One of the primary objectives of this framework is to define 'ideal' software practices that are not tool-specific but applicable across a range of genomic analysis tools. These practices encompass: + +- Comprehensive documentation +- Ease of installation +- Reliable and maintainable software + +Additionally, during the [2023 RADISH23 hackathon](radish23.qmd), focused discussions highlighted the following software practices that are not necessarily essential but are "nice-to-haves": + +- Uses standard data input formats +- Computationally efficient +- Informative error handling +- Multiple languages for tutorials +- Minimal dependencies +- Modular code (e.g., split into functions) +- Well-annotated code + +These practices can guide development of new tools and/or improvement of existing tools. + +### Evaluation criteria +To implement these standards and facilitate tool evaluation and development, PGEforge has developed a set of measurable criteria that can be applied to evaluate the performance and usability of various tools.
There are two categories: + +- **User-facing:** criteria to evaluate the tool from an end-user perspective, for example whether installation instructions are available and easy to follow +- **Developer-facing:** criteria to evaluate the tool from a developer perspective, for example whether unit tests are implemented + +::: {.column-margin} +The evaluation criteria encompass the following key themes, in line with the 'ideal' software practices for both end-users and developers: + +- Quality and comprehensiveness of documentation +- Simplicity of installation processes +- Quality assurance and maintenance +::: + +```{r echo=F} +stds %>% + clean_names(case = "sentence") %>% + kable() %>% + pack_rows("User-facing", start_row = 1, end_row = 4) %>% + pack_rows("Developer-facing", start_row = 5, end_row = 8) %>% + kable_styling(full_width = T, position = "left") +``` + +::: {.column-margin} +
+
+Each criterion is scored on the following scale: + +- 0: Criterion not fulfilled +- 1: Criterion partially fulfilled +- 2: Criterion fully fulfilled + +
+This is then translated into an **end-user score** and a **development score** for the tool. +::: + +### Tool evaluation +Every *Plasmodium* genomic analysis tool can be evaluated against these objective software standards to obtain these scores. The resulting evaluation matrix and an overview of each tool are available on the [tool evaluation page](tools_to_standards.qmd). diff --git a/website_docs/styles.scss b/website_docs/styles.scss index ba59def..99c5a7e 100644 --- a/website_docs/styles.scss +++ b/website_docs/styles.scss @@ -20,4 +20,11 @@ a{ text-align: center; font-weight: bold; font-size: 48px; /* Adjust the size as needed */ + display: flex; + align-items: center; /* Aligns items vertically in the center */ + justify-content: space-between; /* Ensures there's space between the text and image */ +} + +.title-image { + margin-left: 20px; /* Adds space between the text and the image */ } \ No newline at end of file diff --git a/website_docs/tables/MMS_software_landscaping.csv b/website_docs/tables/MMS_software_landscaping.csv index 0d595d2..ab11372 100644 --- a/website_docs/tables/MMS_software_landscaping.csv +++ b/website_docs/tables/MMS_software_landscaping.csv @@ -1,41 +1,41 @@ -Landscaping_completed_by,Field_status,Last_updated,Stage,Theme,Major_use_case,Software_name,Tutorial_status,Tutorial_by,Reason_relegated_or_superseded,Reviewed,Reviewed_by,Software_authors,Software_download_link,Repos_link,Webpage_link,Implementation,Installation,Functionalities,Functionality_notes,Preprint_DOI,Preprint_year,Publication_DOI,Publication_year,Documentation_location,Documentation_link,Tutorials_location,Tutorials_link -Bob Verity,Completed,11-Sep-23,Release,analysis,MOI estimation,MLMOI,Relegated,NA,"manual installation from tarball, unclear examples - 3 digit numbers associated with empty sample names",NA,Alfred Simkin,"Meraj Hashemi, Kristan Schneider",https://cran.r-project.org/src/contrib/Archive/MLMOI/,https://github.com/Maths-against-Malaria/MOI-Bias-correction,NA,R package,Manual package
install from .tar ball (removed from CRAN archive),"MOI estimation, Allele frequency estimation",NA,NA,NA,https://doi.org/10.1371/journal.pone.0261889,2021,within package,NA,within package,NA -Shazia Ruybal,Completed,6-Dec-23,Release,analysis,Frequency/prevalence estimation of phased genotypes,MalHaploFreq,Superseded,Nick Hathaway,Superseeded by MultiLociBiallelicModel because MultiLociBiallelicModel can do more loci and doesn't require MOI to be supplied and can in fact estimate MOI,Pending,,Ian Hastings,http://pcwww.liv.ac.uk/hastings/MalHaploFreq/,NA,NA,.exe file ,Manual download of .exe file,Haplotype frequency estimation,"MOI estimation and prevalence estimates are secondary (essentially untested), program does not work on Mac or Linux",NA,NA,https://doi.org/10.1186/1475-2875-7-130,2008,software website,http://pcwww.liv.ac.uk/hastings/MalHaploFreq/,NA,NA -Shazia Ruybal,Completed,6-Dec-23,Release,analysis,Frequency/prevalence estimation of phased genotypes,malaria.em,Relegated,Nick Hathaway,Code has not be updated since 2014 and is no longer available for donwload other than from archives,NA,,Xiaohong Li,https://cran.r-project.org/src/contrib/Archive/malaria.em/,NA,NA,R package,Manual package install from .tar ball (CRAN archive),"MOI estimation, haplotype frequency estimation",NA,NA,NA,https://doi.org/10.2202/1544-6115.1321,2007,within package,NA,NA,NA -Shazia Ruybal,Completed,6-Dec-23,Beta,analysis,Frequency/prevalence estimation of phased genotypes,FreqEstimationModel,Assigned,,,Pending,,Aimee Taylor,https://github.com/aimeertaylor/FreqEstimationModel/tree/master,https://github.com/aimeertaylor/FreqEstimationModel,NA,R package,Manual download of files or github clone and run files locally,"MOI, population-level allele and haplotype frequency estimation",Statistical model based on prevalence data. 
Allows for missing data,NA,NA,https://doi.org/10.1186/1475-2875-13-102,2014,within package,NA,within package,NA -Shazia Ruybal,Completed,6-Dec-23,Beta,analysis,Frequency/prevalence estimation of phased genotypes,MultiLociBiallelicModel,Done,Nicholas Hathaway,NA,Pending,,Christian Tsoungui Obama,https://github.com/Maths-against-Malaria/MultiLociBiallelicModel,https://github.com/Maths-against-Malaria/MultiLociBiallelicModel,NA,Standalone R scripts,Git clone and run scripts,MOI and haplotype frequency estimation,NA,NA,NA,https://doi.org/10.3389/fepid.2022.943625,2022,github repo,https://github.com/Maths-against-Malaria/MultiLociBiallelicModel,github repo,https://github.com/Maths-against-Malaria/MultiLociBiallelicModel/blob/main/src/SNP_MLE.R -Shazia Ruybal,Completed,6-Dec-23,Release,analysis,MOI estimation,pfmix,Relegated,Karamoko,"bunch of errors in the source codes, very insufficient documentation on how to install and run the tool. Tool updated 7 years ago",NA,,John O'Brien,https://github.com/cascobayesian/pfmix/tree/master,https://github.com/cascobayesian/pfmix,NA,R package,Manual package install from .tar ball,"MOI estimation, strain mixture proportion inference",also inference of proportion 'unexplained' mixture proportions within infections,NA,NA,https://doi.org/10.1371/journal.pcbi.1004824,2016,github repo,https://github.com/cascobayesian/pfmix/tree/master,github repo README,https://github.com/cascobayesian/pfmix/tree/master -Kathryn Murie,Completed,20-Nov-23,Release,analysis,MOI estimation,THEREALMcCOIL,Done,Nick Brazeau,,Pending,,Hsiao-Han Chang,https://github.com/EPPIcenter/THEREALMcCOIL,https://github.com/EPPIcenter/THEREALMcCOIL,,Standalone R scripts,Git clone and compile from source files,COI and population allele frequency estimation,Can be used in categorical (SNP either het or hom) or proportional mode (relative signal intensity for each allele used),NA,NA,https://doi.org/10.1371/journal.pcbi.1005348,2017,github repo. 
Link to gh-page in README that doesn't work (https://eppicenter.github.io/THEREALMcCOIL/),https://github.com/EPPIcenter/THEREALMcCOIL,NA,NA -Kathryn Murie,Completed,20-Nov-23,Release,analysis,MOI estimation,McCOILR,Relegated,Nick Brazeau,Producing inflated COI estimates as compared to original C++ file from EPPICenter. Presumed seed bug ,Pending,,"Hsiao-Han Chang, OJ Watson",https://github.com/OJWatson/McCOILR,https://github.com/OJWatson/McCOILR,,R package,devtools install,COI and population allele frequency estimation,Wrapper on THEREALMcCOIL,NA,NA,NA,NA,package and website ,https://ojwatson.github.io/McCOILR/articles/introduction.html,package and website,https://ojwatson.github.io/McCOILR/articles/introduction.html -Kathryn Murie,Completed,20-Nov-23,Release,analysis,MOI estimation,coiaf,Done,Max Murphy,NA,Pending,,"Aris Paschalidis, OJ Watson",https://github.com/bailey-lab/coiaf,https://github.com/bailey-lab/coiaf,https://bailey-lab.github.io/coiaf/index.html,R package,devtools install,COI estimation,can get discrete or continuous COI values,NA,NA,https://doi.org/10.1371/journal.pcbi.1010247,2023,package and website ,https://bailey-lab.github.io/coiaf/index.html,package and website,https://bailey-lab.github.io/coiaf/articles/example_real_data.html -Jason Hendry,Completed,21-Apr-20,Release,analysis,Phasing,DEploid,Superseded,Jason Hendry,DeploidIBD is the same program just without a flag,NA,,Joe Zhu,https://github.com/DEploid-dev/DEploid,https://github.com/DEploid-dev/DEploid,https://deploid.readthedocs.io/en/latest/#,C++,Make,"COI, proportion and haplotype estimation",NA,NA,NA,https://doi.org/10.1093/bioinformatics/btx530,2018,github repo and website,https://deploid.readthedocs.io/en/latest/,github repo and website,https://deploid.readthedocs.io/en/latest/ -Jason Hendry,Completed,21-Apr-20,Release,analysis,Phasing,Deploid-r,Relegated,Jason Hendry,links to old version of Deploid,Pending,,"Joe Zhu, Jacob Almagro-Garcia, Gil 
McVean",https://github.com/DEploid-dev/DEploid-r,https://github.com/DEploid-dev/DEploid-r,https://deploid.readthedocs.io/en/latest/index.html,R package,devtools install,"COI, proportion and haplotype estimation",Rcpp wrapper for DEploid,NA,NA,NA,NA,github repo,https://github.com/DEploid-dev/DEploid-r,NA,NA -Jason Hendry,Completed,20-Mar-22,Release,analysis,Phasing,DEploidIBD,Assigned,Jason Hendry,,Pending,,Joe Zhu,https://github.com/DEploid-dev/DEploid,https://github.com/DEploid-dev/DEploid,https://deploid.readthedocs.io/en/latest/index.html,C++,Make,"COI, proportion, within-sample IBD and haplotype estimation","DEploid with ""-ibd"" flag",NA,NA,https://doi.org/10.7554/eLife.40845,2019,github repo and website,https://deploid.readthedocs.io/en/latest/,github repo and website,https://deploid.readthedocs.io/en/latest/ -Kirsty McCann,Completed,22-Nov-23,Release,analysis,Relatedness estimation,isoRelate,Done,Kirsty McCann,,Pending,,Lyndal Henden,https://github.com/bahlolab/isoRelate,https://github.com/bahlolab/isoRelate,NA,R package,devtools install,Infering pairwise IBD in haploid species,NA,NA,NA,https://doi.org/10.1371/journal.pgen.1007279,2018,within package,https://github.com/bahlolab/isoRelate/blob/master/README.md ,within package,NA -Steve Schaffner,Completed,21-Nov-23,Release,analysis,Relatedness estimation,hmmIBD,Assigned,Stephen Schaffner,,Pending,,Steve Schaffner,https://github.com/glipsnort/hmmIBD,https://github.com/glipsnort/hmmIBD,NA,C + Python scripts,Compile with any C compiler,Identifies IBD segments between pairs of genomes,Includes scripts to extract and thin data,NA,NA,https://doi.org/10.1186/s12936-018-2349-7,2018,within package,NA,NA,NA -Nick Brazeau,Completed,11-Dec-23,Beta,study design,Panel design,paneljudge,Done,Nick Brazeau,NA,Pending,,"Aimee Taylor, Pierre Jacob",https://github.com/aimeertaylor/paneljudge,https://github.com/aimeertaylor/paneljudge,NA,R package,devtools install,performance of a panel of genetic markers designed to test 
relatedness,can also estimate relatedness between monoclonals,NA,NA,https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.13622,2022,within package,https://github.com/aimeertaylor/paneljudge,vignette within package,NA -Sophie Berube,Completed,19-Nov-23,Release,analysis,Relatedness estimation,dcifer,Assigned,Shazia Ruybal,,Pending,,Inna Gerlovina,https://CRAN.R-project.org/package=dcifer,https://github.com/EPPIcenter/dcifer,https://eppicenter.github.io/dcifer/,R package,CRAN install,estimation of relatedness (IBD) between polyclonal infections,in addition to estimating relatedness also allows user to perform statistical inference also produces estimates of COI and population allele frequencies. ,NA,NA,https://doi.org/10.1093/genetics/iyac126,2022,package and website,https://eppicenter.github.io/dcifer/,vignette (CRAN),https://cloud.r-project.org/web/packages/dcifer/vignettes/vignetteDcifer.pdf -Shazia Ruybal,Completed,9-Nov-23,Release,analysis,MOI estimation + allele frequency + within host relatedness,moire,Assigned,Kathryn Murie,,Pending,,"Maxwell Murphy, Bryan Greenhouse",https://github.com/EPPIcenter/moire,https://github.com/EPPIcenter/moire,https://eppicenter.github.io/moire/,R package,devtools install,"Estimation of COI, estimation of pop-level allele frequencies",uses MCMC based approaches,https://doi.org/10.1101/2023.10.03.560769,2023,NA,NA,package and website,https://eppicenter.github.io/moire/index.html,vignette,https://eppicenter.github.io/moire/articles/mcmc_demo.html -Bob Verity,Completed,9-Nov-23,Beta,analysis,Population structure,MALECOT,Assigned,Jorge Amaya-Romero,,Pending,,Bob Verity,https://github.com/bobverity/MALECOT,https://github.com/bobverity/MALECOT,https://bobverity.github.io/MALECOT/,R package,devtools install,"Inference of population strucure, estimation of COI, estimation of allele frequencies",NA,NA,NA,NA,NA,package and website,https://bobverity.github.io/MALECOT/index.html,package and 
website,https://bobverity.github.io/MALECOT/index.html -Bob Verity,Completed,9-Nov-23,Development,simulation,Transmission simulation,SIMPLEGEN,In development,NA,,NA,,"Bob Verity, Shazia Ruybal, Isobel Routledge, Sophie Berube, Daniel Larremore",https://github.com/mrc-ide/SIMPLEGEN/tree/master,https://github.com/mrc-ide/SIMPLEGEN/tree/master,https://mrc-ide.github.io/SIMPLEGEN/,R package,devtools install,transmission simulation of malaria epidemiology and genotypes,NA,NA,NA,NA,NA,package and website,https://mrc-ide.github.io/SIMPLEGEN/index.html,package and website,https://mrc-ide.github.io/SIMPLEGEN/index.html -Bob Verity,Completed,10-Dec-23,Beta,simulation,Transmission simulation,magenta,Simulator,NA,,NA,,OJ Watson,https://github.com/OJWatson/magenta,https://github.com/OJWatson/magenta,https://ojwatson.github.io/magenta/index.html,R package,devtools install,transmission simulation of malaria epidemiology and genotypes,Implements interventions and a basic spatial model,NA,NA,https://doi.org/10.1093/molbev/msaa225,2020,website,https://ojwatson.github.io/magenta/articles/Introduction.html,website,https://ojwatson.github.io/magenta/articles/Introduction.html -Bob Verity,Completed,9-Nov-23,Release,study design,pfhrp2/3 study design,DRpower,Done,Bob Verity,,Done,Steve Schaffner,"Bob Verity, Shazia Ruybal",https://github.com/mrc-ide/DRpower,https://github.com/mrc-ide/DRpower,https://mrc-ide.github.io/DRpower/,R package,devtools install,power and sample size calculation for pfhrp2/3 studies,additional interactive web app,NA,NA,NA,NA,package and website,https://mrc-ide.github.io/DRpower/,package and website,https://mrc-ide.github.io/DRpower/ -Alfred Simkin,Completed,7-Jul-22,Release,visualization,Mapping,pixelate,Assigned,Bob Verity,,Pending,,"Aimee Taylor, James Watson, Caroline Buckee",https://github.com/aimeertaylor/pixelate,https://github.com/aimeertaylor/pixelate,NA,R package,CRAN install,pixelates geographic maps according to levels of uncertainty,An R 
package,https://doi.org/10.48550/arXiv.2005.11993,2020,NA,NA,github repo,https://github.com/aimeertaylor/pixelate,vignettes folder of github repo,https://rdrr.io/github/artaylor85/pixelate/ -Alfred Simkin,Completed,13-Nov-15,Release,analysis,MOI estimation,estMOI,Done,Jody Phelan,NA,Pending,,Samuel Assefa,https://sourceforge.net/projects/estmoi/,https://github.com/sammy-assefa/estMOI/tree/master,NA,perl script,download perl script,estimates multiplicity of infection,seems that it maybe used to be a website and is now a perl script,NA,NA,https://doi.org/10.1093/bioinformatics/btu005,2014,pdf in github,https://github.com/sammy-assefa/estMOI/blob/master/README.pdf,NA,NA -Jason Hendry,Completed,21-Feb-21,Release,simulation,GenEpi Simulation,forward-dream,Simulator,NA,,NA,,"Jason Hendry, Gil McVean",https://github.com/JasonAHendry/fwd-dream,https://github.com/JasonAHendry/fwd-dream,NA,Python,conda,Simulates malaria transmission and evolution,NA,https://doi.org/10.1101/2020.08.27.269928,2020,https://doi.org/10.1371/journal.pcbi.1009287,2021,github,https://github.com/JasonAHendry/fwd-dream,NA,NA -Kirsty McCann,Completed,22-Nov-23,Release,analysis,MOI estimation,moimix,Superseded,Sophie Berube,Tool does not estimate MOI it only differentiates poly/monoclonal infections and therefore is superseded by other MOI tools. 
,Pending,,Stuart Lee,https://github.com/bahlolab/moimix,https://github.com/bahlolab/moimix,NA,R package,BiocManager,"Estimate MOI, heterzyosity, call major calls",NA,NA,NA,NA,NA,github,https://github.com/bahlolab/moimix/blob/master/README.md,vignette within package,https://github.com/bahlolab/moimix/blob/master/vignettes/introduction.Rmd -Nick Hathaway,Completed,21-Nov-23,Release,simulation,,hrp2malaRia,Simulator,NA,,NA,,OJ Watson,,https://github.com/OJWatson/hrp2malaRia,,,,,,,,https://doi.org/10.7554/eLife.25008,2017,,,, -Nick Hathaway,Completed,21-Nov-23,Release,analysis,,hmmibdr,Relegated,,need c and python scripts to produce input files from raw VCFs,Pending,,"OJ Watson, Steve Schaffner",,https://github.com/OJWatson/hmmibdr,,,,,,,,,,,,, -Bob Verity,Completed,12-Dec-23,Release,analysis,Basic popgen analysis of MIP data,MIPanalyzer,Assigned,,,Pending,,"Bob Verity, OJ Watson, Nick Brazeau",https://github.com/mrc-ide/MIPanalyzer,https://github.com/mrc-ide/MIPanalyzer,https://mrc-ide.github.io/MIPanalyzer/index.html,R package,devtools install,"Filtering of MIP data, PCA, pairwise genetic distances",NA,NA,NA,NA,NA,website,https://mrc-ide.github.io/MIPanalyzer/index.html,website,https://mrc-ide.github.io/MIPanalyzer/index.html -Bob Verity,Completed,11-Dec-23,Beta,simulation,Simulation of SNP data in demes,PlasmoSim,Simulator,NA,,NA,,Bob Verity,https://github.com/mrc-ide/PlasmoSim,https://github.com/mrc-ide/PlasmoSim,https://mrc-ide.github.io/PlasmoSim/,R package,devtools install,"Simulation of genetic data (SNPs, haplotypes) or ancestry (IBD). 
Options for human movement between demes.",NA,NA,NA,NA,NA,website,https://mrc-ide.github.io/PlasmoSim/,website,https://mrc-ide.github.io/PlasmoSim/ -Kirsty McCann,Completed,22-Nov-23,Beta,analysis,Population genetic simulation,polySimIBD,Simulator,NA,,NA,,"Nicholas Brazeau, Bob Verity",https://github.com/nickbrazeau/polySimIBD,https://github.com/nickbrazeau/polySimIBD,NA,R package,"remotes, github install",forwards in-time simulation of pop gen,under development,NA,NA,NA,NA,github,https://github.com/nickbrazeau/polySimIBD,vignette,https://github.com/nickbrazeau/polySimIBD/tree/master/vignettes -Bob Verity,Completed,10-Dec-23,Development,analysis,"Estimate deme-level relatedness, accounting for migration",discent,In development,NA,,NA,,"Nicholas Brazeau, Bob Verity",https://github.com/nickbrazeau/discent,https://github.com/nickbrazeau/discent,NA,R package,vignettes inside repos,Estimates deme inbreeding spatial coefficient (DISC),NA,NA,NA,NA,NA,github,https://github.com/nickbrazeau/discent,NA,NA -Shazia Ruybal,Completed,9-Nov-23,Development,analysis,MOI estimation,SNP-slice,In development,NA,,NA,,"Nianqiao Ju, Jiawei Liu, Qixin He",NA,NA,NA,NA,NA,"Reconstruction of SNP haplotypes, estimation of COI",Reconstruction of SNP haplotypes allows for inference of haplotype/strain frequencies,https://doi.org/10.1101/2023.07.29.551098,2023,NA,NA,NA,NA,NA,NA -Shazia Ruybal,Completed,9-Nov-23,Release,analysis,Relatedness estimation (genetic similarity),BRO,Relegated,NA,Not a packaged open-source tool,NA,,Dan Larremore,https://github.com/dblarremore/BayesianRepertoireOverlap,https://github.com/dblarremore/BayesianRepertoireOverlap,https://bro.colorado.edu,Python scripts (also packaged as an online tool),Fork repo and run scripts locally,Relatedness estimation with uncertainty,Developed for var repertoires,NA,NA,https://doi.org/10.1371/journal.pcbi.1006898,2019,github repo,https://github.com/dblarremore/BayesianRepertoireOverlap,NA,NA -Shazia 
Ruybal,Completed,9-Nov-23,Development,analysis,Relatedness estimation,Pv3Rs,In development,NA,,NA,,"Aimee Taylor, Yong See Foo",https://github.com/aimeertaylor/Pv3Rs,https://github.com/aimeertaylor/Pv3Rs,NA,R package,devtools installation from github repo,Relatedness estimation with uncertainty ,compute_posterior() function could also be used for Pf by setting prior probability of relapse=0,https://doi.org/10.1101/2022.11.23.22282669,2022,NA,NA,within package,NA,Vignettes within package,NA -Bryan Greenhouse,Completed,3-Dec-23,Development,analysis,TES outcome classification,"""CDC Bayesian algorithm""",In development,NA,,NA,,Mateusz Plucinski,https://github.com/GTJuniorDesign0100-2020/anti-malarial-MCMC-bayesian-algorithm,https://github.com/GTJuniorDesign0100-2020/anti-malarial-MCMC-bayesian-algorithm,NA,Python script,GitHub clone and run scripts,Estimate TES outcomes from genotyping data,This version is NOT designed for NGS data but length polymorphism data using capillary electrophoresis. 
There is a version with minor modifications adapted from this; Monica Golumbeanu @SwissTPH hopefully getting a clean version on a GitHub repo soon,NA,NA,https://doi.org/10.1128%2FAAC.00072-15,2015,github repo,https://github.com/GTJuniorDesign0100-2020/anti-malarial-MCMC-bayesian-algorithm/blob/master/README.md,Help in package and github readme,NA -Jorge Amaya,Completed,7-Sep-23,Release,analysis,Amplicon Analysis ,AmpSeq,In development,NA,Too upstream,NA,,"Jorge Amaya, Ruchit Panchal, Phillip Schwabl, Pablo Manrique",https://github.com/broadinstitute/malaria-amplicon-pipeline/tree/main,https://github.com/broadinstitute/malaria-amplicon-pipeline/tree/main,https://github.com/broadinstitute/malaria-amplicon-pipeline/tree/main,Python scripts,GitHub Clone and run scripts,Denoising and amplicon assembly,NA,NA,NA,https://doi.org/10.1111/1755-0998.13622,2022,github repo,https://github.com/broadinstitute/malaria-amplicon-pipeline/tree/main,NA,NA -Nick Hathaway,In progress,21-Nov-23,Development,simulation,Amplicon Simulation,AmpSim,In development,NA,,NA,,Nicholas Hathaway,https://github.com/nickjhathaway/elucidator,https://github.com/nickjhathaway/elucidator,,C++,"GitHub Clone, docker, vagrant",Simulating PCR and sequencing noise ,,,,,,,,, -Karamoko Niaré ,Completed,15-Sep-21,Release,analysis,Selection (extended haplotype heterozygosity),rehh,Done,,,Pending,,"Alexander Klassmann, Mathieu Gautier, Renaud Vitalis",https://cran.r-project.org/web/packages/rehh/index.html,https://gitlab.com/oneoverx/rehh,NA,R package ,devtools installation ,Estmimating and plotting EHH/iHS,NA,NA,NA,"https://doi.org/10.1093/bioinformatics/bts115, https://doi.org/10.1111/1755-0998.12634","2012, 2017",gitlab repo,https://gitlab.com/oneoverx/rehh,Vignette,https://cran.r-project.org/web/packages/rehh/vignettes/rehh.html -Karamoko Niaré ,Completed,5-May-22,Release,analysis,Estimation and Tests of Hierarchical F-Statistics,Hierfstat,Done,,,Pending,,"Jerome Goudet, Thibaut Jombart, Zhian N. 
Kamvar, Eric Archer, Olivier Hardy",https://github.com/jgx65/hierfstat,https://cran.r-project.org/web/packages/hierfstat/index.html,https://rdrr.io/cran/hierfstat/man/hierfstat.html,R package ,devtools installation ,Different types of F-statistics (per locus and pairwise),NA,NA,NA,https://doi.org/10.1111/j.1471-8286.2004.00828.x,2004,github repo,https://github.com/jgx65/hierfstat,Vignette,https://cran.r-project.org/web/packages/hierfstat/vignettes/hierfstat.html -Karamoko Niaré ,Completed,29-Aug-23,Development,analysis,Estimating microhaplotype heterozygosity,MicroHaploHet,In development,NA,,NA,,Karamoko Niare,NA,NA,NA,Bash/Python,GitHub Clone,Microhaplotype heterozygosity and microhaplotype-based PCA,NA,NA,NA,NA,NA,github repo,NA,NA,NA \ No newline at end of file +Tool,Authors,Stage,Theme,Major_use_case,Tutorial on PGEforge,Reason_no_tutorial,Software_download_link,Repos_link,Webpage_link,Implementation,Installation,Functionalities,Functionality_notes,Preprint_DOI,Preprint_year,Publication_DOI,Publication_year,Documentation_location,Documentation_link,Tutorials_location,Tutorials_link,Run settings/parameters output,Runs on windows,Runs on Mac Intel,Runs on Mac ARM,Runs on Linux,Installation via build from source,"Installation via pre-built from repo (e.g. PIP, Conda, CRAN)",Installation via container,Installation via direct download,GitHub branch rules exist,Master branch protected,Auto tests required for branch merging,Run via GUI,Run via command line,"Implementation language (e.g. C++, R, Python)",Command line language,Last_updated +MLMOI,"Meraj Hashemi, Kristan Schneider",Release,analysis,MOI estimation,no,This tool was relegated due to difficulty with installation as the package has been removed from CRAN. Manual installation instructions were difficult to follow. 
,https://cran.r-project.org/src/contrib/Archive/MLMOI/,https://github.com/Maths-against-Malaria/MOI-Bias-correction,NA,R package,Manual package install from .tar ball (removed from CRAN archive),"MOI estimation, Allele frequency estimation",NA,NA,NA,https://doi.org/10.1371/journal.pone.0261889,2021,within package,NA,within package,NA,,,,,,,,,,,,,,,,,11-Sep-23 +MalHaploFreq,Ian Hastings,Release,analysis,Frequency/prevalence estimation of phased genotypes,no,"This tool has largely been superseded by MultiLociBiallelicModel, which also estimates prevalence but can handle more loci and does not require MOI as an input, instead estimating it from the data",http://pcwww.liv.ac.uk/hastings/MalHaploFreq/,NA,NA,.exe file ,Manual download of .exe file,Haplotype frequency estimation,"MOI estimation and prevalence estimates are secondary (essentially untested), program does not work on Mac or Linux",NA,NA,https://doi.org/10.1186/1475-2875-7-130,2008,software website,http://pcwww.liv.ac.uk/hastings/MalHaploFreq/,NA,NA,,,,,,,,,,,,,,,,,06-Dec-23 +malaria.em,Xiaohong Li,Release,analysis,Frequency/prevalence estimation of phased genotypes,no,"The code for this program was removed from CRAN in 2014, does not appear to be actively maintained and is no longer available for download other than from archives. A program that does something similar is MultiLociBiallelicModel. 
",https://cran.r-project.org/src/contrib/Archive/malaria.em/,NA,NA,R package,Manual package install from .tar ball (CRAN archive),"MOI estimation, haplotype frequency estimation",NA,NA,NA,https://doi.org/10.2202/1544-6115.1321,2007,within package,NA,NA,NA,,,,,,,,,,,,,,,,,06-Dec-23 +FreqEstimationModel,Aimee Taylor,Beta,analysis,Frequency/prevalence estimation of phased genotypes,planned,,https://github.com/aimeertaylor/FreqEstimationModel/tree/master,https://github.com/aimeertaylor/FreqEstimationModel,NA,R package,Manual download of files or github clone and run files locally,"MOI, population-level allele and haplotype frequency estimation",Statistical model based on prevalence data. Allows for missing data,NA,NA,https://doi.org/10.1186/1475-2875-13-102,2014,within package,NA,within package,NA,,,,,,no,yes,no,no,,,,no,yes,R,R,06-Dec-23 +MultiLociBiallelicModel,Christian Tsoungui Obama,Beta,analysis,Frequency/prevalence estimation of phased genotypes,yes,,https://github.com/Maths-against-Malaria/MultiLociBiallelicModel,https://github.com/Maths-against-Malaria/MultiLociBiallelicModel,NA,Standalone R scripts,Git clone and run scripts,MOI and haplotype frequency estimation,NA,NA,NA,https://doi.org/10.3389/fepid.2022.943625,2022,github repo,https://github.com/Maths-against-Malaria/MultiLociBiallelicModel,github repo,https://github.com/Maths-against-Malaria/MultiLociBiallelicModel/blob/main/src/SNP_MLE.R,yes,yes,yes,yes,yes,no,no,no,yes,no,no,no,no,yes,R,R,06-Dec-23 +pfmix,John O'Brien,Release,analysis,MOI estimation,no,"Due to limited documentation on installation, we were unable to install and run the tool. 
The tool has not been updated in seven years ",https://github.com/cascobayesian/pfmix/tree/master,https://github.com/cascobayesian/pfmix,NA,R package,Manual package install from .tar ball,"MOI estimation, strain mixture proportion inference",also inference of proportion 'unexplained' mixture proportions within infections,NA,NA,https://doi.org/10.1371/journal.pcbi.1004824,2016,github repo,https://github.com/cascobayesian/pfmix/tree/master,github repo README,https://github.com/cascobayesian/pfmix/tree/master,,,,,,,,,,,,,,,,,06-Dec-23 +THEREALMcCOIL,Hsiao-Han Chang,Release,analysis,MOI estimation,yes,,https://github.com/EPPIcenter/THEREALMcCOIL,https://github.com/EPPIcenter/THEREALMcCOIL,,Standalone R scripts,Git clone and compile from source files,COI and population allele frequency estimation,Can be used in categorical (SNP either het or hom) or proportional mode (relative signal intensity for each allele used),NA,NA,https://doi.org/10.1371/journal.pcbi.1005348,2017,github repo. Link to gh-page in README that doesn't work (https://eppicenter.github.io/THEREALMcCOIL/),https://github.com/EPPIcenter/THEREALMcCOIL,NA,NA,yes,yes,yes,yes,yes,yes,no,no,no,no,no,no,no,no,R,C++,20-Nov-23 +McCOILR,"Hsiao-Han Chang, OJ Watson",Release,analysis,MOI estimation,no,"This tool is an R wrapper for THEREALMcCOIL; it produces inflated COI estimates compared with the original C++ implementation from EPPIcenter.
This is presumed to be due to a seed bug",https://github.com/OJWatson/McCOILR,https://github.com/OJWatson/McCOILR,,R package,devtools install,COI and population allele frequency estimation,Wrapper on THEREALMcCOIL,NA,NA,NA,NA,package and website ,https://ojwatson.github.io/McCOILR/articles/introduction.html,package and website,https://ojwatson.github.io/McCOILR/articles/introduction.html,,,,,,,,,,,,,,,,,20-Nov-23 +coiaf,"Aris Paschalidis, OJ Watson",Release,analysis,MOI estimation,yes,,https://github.com/bailey-lab/coiaf,https://github.com/bailey-lab/coiaf,https://bailey-lab.github.io/coiaf/index.html,R package,devtools install,COI estimation,can get discrete or continuous COI values,NA,NA,https://doi.org/10.1371/journal.pcbi.1010247,2023,package and website ,https://bailey-lab.github.io/coiaf/index.html,package and website,https://bailey-lab.github.io/coiaf/articles/example_real_data.html,no,yes,yes,yes,yes,yes,no,no,no,?,?,?,no,yes,R,R,20-Nov-23 +DEploid,Joe Zhu,Release,analysis,Phasing,no,"This tool has been superseded by DEploidIBD; DEploid is the same program run without the -ibd flag",https://github.com/DEploid-dev/DEploid,https://github.com/DEploid-dev/DEploid,https://deploid.readthedocs.io/en/latest/#,C++,Make,"COI, proportion and haplotype estimation",NA,NA,NA,https://doi.org/10.1093/bioinformatics/btx530,2018,github repo and website,https://deploid.readthedocs.io/en/latest/,github repo and website,https://deploid.readthedocs.io/en/latest/,,,,,,,,,,,,,,,,,21-Apr-20 +Deploid-r,"Joe Zhu, Jacob Almagro-Garcia, Gil McVean",Release,analysis,Phasing,no,This tool is considered relegated as it links to an old version of DEploid,https://github.com/DEploid-dev/DEploid-r,https://github.com/DEploid-dev/DEploid-r,https://deploid.readthedocs.io/en/latest/index.html,R package,devtools install,"COI, proportion and haplotype estimation",Rcpp wrapper for DEploid,NA,NA,NA,NA,github repo,https://github.com/DEploid-dev/DEploid-r,NA,NA,,,,,,,,,,,,,,,,,21-Apr-20 +DEploidIBD,Joe
Zhu,Release,analysis,Phasing,yes,,https://github.com/DEploid-dev/DEploid,https://github.com/DEploid-dev/DEploid,https://deploid.readthedocs.io/en/latest/index.html,C++,Make,"COI, proportion, within-sample IBD and haplotype estimation","DEploid with ""-ibd"" flag",NA,NA,https://doi.org/10.7554/eLife.40845,2019,github repo and website,https://deploid.readthedocs.io/en/latest/,github repo and website,https://deploid.readthedocs.io/en/latest/,no,yes,yes,yes,yes,,,yes,,,,,no,yes,C++,bash,20-Mar-22 +isoRelate,Lyndal Henden,Release,analysis,Relatedness estimation,planned,,https://github.com/bahlolab/isoRelate,https://github.com/bahlolab/isoRelate,NA,R package,devtools install,Inferring pairwise IBD in haploid species,NA,NA,NA,https://doi.org/10.1371/journal.pgen.1007279,2018,within package,https://github.com/bahlolab/isoRelate/blob/master/README.md ,within package,NA,no,no,no,no,yes? wasm ,no,yes,no,no,,,,no,yes,C++ and R,R,22-Nov-23 +hmmIBD,Steve Schaffner,Release,analysis,Relatedness estimation,yes,,https://github.com/glipsnort/hmmIBD,https://github.com/glipsnort/hmmIBD,NA,C + Python scripts,Compile with any C compiler,Identifies IBD segments between pairs of genomes,Includes scripts to extract and thin data,NA,NA,https://doi.org/10.1186/s12936-018-2349-7,2018,within package,NA,NA,NA,y (to stdout),?,yes,yes,yes,yes,no,no,no,no,no,no,no,yes,C + Python,C + Python,21-Nov-23 +paneljudge,"Aimee Taylor, Pierre Jacob",Beta,study design,Panel design,yes,,https://github.com/aimeertaylor/paneljudge,https://github.com/aimeertaylor/paneljudge,NA,R package,devtools install,performance of a panel of genetic markers designed to test relatedness,can also estimate relatedness between monoclonals,NA,NA,https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.13622,2022,within package,https://github.com/aimeertaylor/paneljudge,vignette within package,NA,yes,yes,yes,yes,yes,no,yes,no,no,,,,no,yes,"R, C++",R,11-Dec-23 +dcifer,Inna Gerlovina,Release,analysis,Relatedness
estimation,yes,,https://CRAN.R-project.org/package=dcifer,https://github.com/EPPIcenter/dcifer,https://eppicenter.github.io/dcifer/,R package,CRAN install,estimation of relatedness (IBD) between polyclonal infections,in addition to estimating relatedness also allows user to perform statistical inference; also produces estimates of COI and population allele frequencies.,NA,NA,https://doi.org/10.1093/genetics/iyac126,2022,package and website,https://eppicenter.github.io/dcifer/,vignette (CRAN),https://cloud.r-project.org/web/packages/dcifer/vignettes/vignetteDcifer.pdf,,,,,,,,,,,,,,,,,19-Nov-23 +moire,"Maxwell Murphy, Bryan Greenhouse",Release,analysis,MOI estimation + allele frequency + within host relatedness,yes,,https://github.com/EPPIcenter/moire,https://github.com/EPPIcenter/moire,https://eppicenter.github.io/moire/,R package,devtools install,"Estimation of COI, estimation of pop-level allele frequencies",uses MCMC based approaches,https://doi.org/10.1101/2023.10.03.560769,2023,NA,NA,package and website,https://eppicenter.github.io/moire/index.html,vignette,https://eppicenter.github.io/moire/articles/mcmc_demo.html,no,yes,yes,yes,yes,no,yes,no,no,no,no,no,no,yes,C++,R,09-Nov-23 +MALECOT,Bob Verity,Beta,analysis,Population structure,planned,,https://github.com/bobverity/MALECOT,https://github.com/bobverity/MALECOT,https://bobverity.github.io/MALECOT/,R package,devtools install,"Inference of population structure, estimation of COI, estimation of allele frequencies",NA,NA,NA,NA,NA,package and website,https://bobverity.github.io/MALECOT/index.html,package and website,https://bobverity.github.io/MALECOT/index.html,,yes,yes,yes,yes,no,yes,no,no,,,,no,yes,C++ and R,R,09-Nov-23 +SIMPLEGEN,"Bob Verity, Shazia Ruybal, Isobel Routledge, Sophie Berube, Daniel Larremore",Development,simulation,Transmission simulation,no,This tool is still under
development,https://github.com/mrc-ide/SIMPLEGEN/tree/master,https://github.com/mrc-ide/SIMPLEGEN/tree/master,https://mrc-ide.github.io/SIMPLEGEN/,R package,devtools install,transmission simulation of malaria epidemiology and genotypes,NA,NA,NA,NA,NA,package and website,https://mrc-ide.github.io/SIMPLEGEN/index.html,package and website,https://mrc-ide.github.io/SIMPLEGEN/index.html,,,,,,,,,,,,,,,,,09-Nov-23 +magenta,OJ Watson,Beta,simulation,Transmission simulation,no,This tool is a simulator and thus not within the scope of PGEforge genomic data analysis tools,https://github.com/OJWatson/magenta,https://github.com/OJWatson/magenta,https://ojwatson.github.io/magenta/index.html,R package,devtools install,transmission simulation of malaria epidemiology and genotypes,Implements interventions and a basic spatial model,NA,NA,https://doi.org/10.1093/molbev/msaa225,2020,website,https://ojwatson.github.io/magenta/articles/Introduction.html,website,https://ojwatson.github.io/magenta/articles/Introduction.html,,,,,,,,,,,,,,,,,10-Dec-23 +DRpower,"Bob Verity, Shazia Ruybal",Release,study design,pfhrp2/3 study design,yes,,https://github.com/mrc-ide/DRpower,https://github.com/mrc-ide/DRpower,https://mrc-ide.github.io/DRpower/,R package,devtools install,power and sample size calculation for pfhrp2/3 studies,additional interactive web app,NA,NA,NA,NA,package and website,https://mrc-ide.github.io/DRpower/,package and website,https://mrc-ide.github.io/DRpower/,,yes,yes,yes,,no,yes,no,no,,,,no,yes,C++ and R ,R,09-Nov-23 +pixelate,"Aimee Taylor, James Watson, Caroline Buckee",Release,visualization,Mapping,planned,,https://github.com/aimeertaylor/pixelate,https://github.com/aimeertaylor/pixelate,NA,R package,CRAN install,pixelates geographic maps according to levels of uncertainty,An R package,https://doi.org/10.48550/arXiv.2005.11993,2020,NA,NA,github repo,https://github.com/aimeertaylor/pixelate,vignettes folder of github 
repo,https://rdrr.io/github/artaylor85/pixelate/,,yes,yes,yes,yes,no,yes,no,no,,,,no,yes,R,R,07-Jul-22 +estMOI,Samuel Assefa,Release,analysis,MOI estimation,yes,,https://sourceforge.net/projects/estmoi/,https://github.com/sammy-assefa/estMOI/tree/master,NA,perl script,download perl script,estimates multiplicity of infection,seems that it may previously have been a website and is now a perl script,NA,NA,https://doi.org/10.1093/bioinformatics/btu005,2014,pdf in github,https://github.com/sammy-assefa/estMOI/blob/master/README.pdf,NA,NA,yes,?,yes,yes,yes,yes,no,no,yes,?,?,?,no,yes,Perl,Perl,13-Nov-15 +forward-dream,"Jason Hendry, Gil McVean",Release,simulation,GenEpi Simulation,no,This tool is a simulator and thus not within the scope of genomic data analysis tools,https://github.com/JasonAHendry/fwd-dream,https://github.com/JasonAHendry/fwd-dream,NA,Python,conda,Simulates malaria transmission and evolution,NA,https://doi.org/10.1101/2020.08.27.269928,2020,https://doi.org/10.1371/journal.pcbi.1009287,2021,github,https://github.com/JasonAHendry/fwd-dream,NA,NA,,,,,,,,,,,,,,,,,21-Feb-21 +moimix,Stuart Lee,Release,analysis,MOI estimation,no,"This tool was considered superseded by other MOI estimation tools as it does not estimate MOI, rather it only differentiates poly/monoclonal infections",https://github.com/bahlolab/moimix,https://github.com/bahlolab/moimix,NA,R package,BiocManager,"Estimate MOI, heterozygosity, call major calls",NA,NA,NA,NA,NA,github,https://github.com/bahlolab/moimix/blob/master/README.md,vignette within package,https://github.com/bahlolab/moimix/blob/master/vignettes/introduction.Rmd,,,,,,,,,,,,,,,,,22-Nov-23 +hrp2malaRia,OJ Watson,Release,simulation,,no,This tool is a simulator and thus not within the scope of genomic data analysis tools,,https://github.com/OJWatson/hrp2malaRia,,,,,,,,https://doi.org/10.7554/eLife.25008,2017,,,,,,,,,,,,,,,,,,,,,21-Nov-23 +hmmibdr,"OJ Watson, Steve 
Schaffner",Release,analysis,,no,"This tool is an R wrapper for hmmIBD; installing
locally requires a C compiler and running Python scripts to produce input files from raw VCFs.",,https://github.com/OJWatson/hmmibdr,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,21-Nov-23 +MIPanalyzer,"Bob Verity, OJ Watson, Nick Brazeau",Release,analysis,Basic popgen analysis of MIP data,planned,,https://github.com/mrc-ide/MIPanalyzer,https://github.com/mrc-ide/MIPanalyzer,https://mrc-ide.github.io/MIPanalyzer/index.html,R package,devtools install,"Filtering of MIP data, PCA, pairwise genetic distances",NA,NA,NA,NA,NA,website,https://mrc-ide.github.io/MIPanalyzer/index.html,website,https://mrc-ide.github.io/MIPanalyzer/index.html,,yes,yes,yes,yes,no,yes,no,no,,,,no,yes,C++ and R,R,12-Dec-23 +PlasmoSim,Bob Verity,Beta,simulation,Simulation of SNP data in demes,no,This tool is a simulator and thus not within the scope of genomic data analysis tools,https://github.com/mrc-ide/PlasmoSim,https://github.com/mrc-ide/PlasmoSim,https://mrc-ide.github.io/PlasmoSim/,R package,devtools install,"Simulation of genetic data (SNPs, haplotypes) or ancestry (IBD).
Options for human movement between demes.",NA,NA,NA,NA,NA,website,https://mrc-ide.github.io/PlasmoSim/,website,https://mrc-ide.github.io/PlasmoSim/,,,,,,,,,,,,,,,,,11-Dec-23 +polySimIBD,"Nicholas Brazeau, Bob Verity",Beta,analysis,Population genetic simulation,no,This tool is a simulator and thus not within the scope of PGEforge genomic data analysis tools,https://github.com/nickbrazeau/polySimIBD,https://github.com/nickbrazeau/polySimIBD,NA,R package,"remotes, github install",forwards in-time simulation of pop gen,under development,NA,NA,NA,NA,github,https://github.com/nickbrazeau/polySimIBD,vignette,https://github.com/nickbrazeau/polySimIBD/tree/master/vignettes,,,,,,,,,,,,,,,,,22-Nov-23 +discent,"Nicholas Brazeau, Bob Verity",Development,analysis,"Estimate deme-level relatedness, accounting for migration",no,This tool is still under development,https://github.com/nickbrazeau/discent,https://github.com/nickbrazeau/discent,NA,R package,vignettes inside repos,Estimates deme inbreeding spatial coefficient (DISC),NA,NA,NA,NA,NA,github,https://github.com/nickbrazeau/discent,NA,NA,,,,,,,,,,,,,,,,,10-Dec-23 +SNP-slice,"Nianqiao Ju, Jiawei Liu, Qixin He",Development,analysis,MOI estimation,no,This tool is still under development,NA,NA,NA,NA,NA,"Reconstruction of SNP haplotypes, estimation of COI",Reconstruction of SNP haplotypes allows for inference of haplotype/strain frequencies,https://doi.org/10.1101/2023.07.29.551098,2023,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,09-Nov-23 +BRO,Dan Larremore,Release,analysis,Relatedness estimation (genetic similarity),no,This tool was not a packaged open-source tool so we were unable to install,https://github.com/dblarremore/BayesianRepertoireOverlap,https://github.com/dblarremore/BayesianRepertoireOverlap,https://bro.colorado.edu,Python scripts (also packaged as an online tool),Fork repo and run scripts locally,Relatedness estimation with uncertainty,Developed for var 
repertoires,NA,NA,https://doi.org/10.1371/journal.pcbi.1006898,2019,github repo,https://github.com/dblarremore/BayesianRepertoireOverlap,NA,NA,,,,,,,,,,,,,,,,,09-Nov-23 +Pv3Rs,"Aimee Taylor, Yong See Foo",Development,analysis,Relatedness estimation,no,This tool is still under development,https://github.com/aimeertaylor/Pv3Rs,https://github.com/aimeertaylor/Pv3Rs,NA,R package,devtools installation from github repo,Relatedness estimation with uncertainty ,compute_posterior() function could also be used for Pf by setting prior probability of relapse=0,https://doi.org/10.1101/2022.11.23.22282669,2022,NA,NA,within package,NA,Vignettes within package,NA,,,,,,,,,,,,,,,,,09-Nov-23 +Anti-malarial MCMC Bayesian Algorithm,Mateusz Plucinski,Development,analysis,TES outcome classification,no,This tool is still under development,https://github.com/GTJuniorDesign0100-2020/anti-malarial-MCMC-bayesian-algorithm,https://github.com/GTJuniorDesign0100-2020/anti-malarial-MCMC-bayesian-algorithm,NA,Python script,GitHub clone and run scripts,Estimate TES outcomes from genotyping data,This version is NOT designed for NGS data but length polymorphism data using capillary electrophoresis. 
There is a version with minor modifications adapted from this; Monica Golumbeanu @SwissTPH hopefully getting a clean version on a GitHub repo soon,NA,NA,https://doi.org/10.1128%2FAAC.00072-15,2015,github repo,https://github.com/GTJuniorDesign0100-2020/anti-malarial-MCMC-bayesian-algorithm/blob/master/README.md,Help in package and github readme,NA,,,,,,,,,,,,,,,,,03-Dec-23 +AmpSeq,"Jorge Amaya, Ruchit Panchal, Phillip Schwabl, Pablo Manrique",Release,analysis,Amplicon Analysis ,no,This tool focuses on upstream bioinformatic data processing and is thus not within the scope of PGEforge genomic data analysis tools,https://github.com/broadinstitute/malaria-amplicon-pipeline/tree/main,https://github.com/broadinstitute/malaria-amplicon-pipeline/tree/main,https://github.com/broadinstitute/malaria-amplicon-pipeline/tree/main,Python scripts,GitHub Clone and run scripts,Denoising and amplicon assembly,NA,NA,NA,https://doi.org/10.1111/1755-0998.13622,2022,github repo,https://github.com/broadinstitute/malaria-amplicon-pipeline/tree/main,NA,NA,,,,,,,,,,,,,,,,,07-Sep-23 +AmpSim,Nicholas Hathaway,Development,simulation,Amplicon Simulation,no,This tool is still under development,https://github.com/nickjhathaway/elucidator,https://github.com/nickjhathaway/elucidator,,C++,"GitHub Clone, docker, vagrant",Simulating PCR and sequencing noise ,,,,,,,,,,,,,,,,,,,,,,,,,,21-Nov-23 +rehh,"Alexander Klassmann, Mathieu Gautier, Renaud Vitalis",Release,analysis,Selection (extended haplotype heterozygosity),no,"This tool focuses on population genetics, not specifically focused on Plasmodium and thus is not within the scope of PGEforge Plasmodium genomic data analysis tools",https://cran.r-project.org/web/packages/rehh/index.html,https://gitlab.com/oneoverx/rehh,NA,R package ,devtools installation ,Estimating and plotting EHH/iHS,NA,NA,NA,"https://doi.org/10.1093/bioinformatics/bts115, https://doi.org/10.1111/1755-0998.12634","2012, 2017",gitlab
repo,https://gitlab.com/oneoverx/rehh,Vignette,https://cran.r-project.org/web/packages/rehh/vignettes/rehh.html,,,,,,no,yes,no,no,,,,no,yes,C and R ,R ,15-Sep-21 +Hierfstat,"Jerome Goudet, Thibaut Jombart, Zhian N. Kamvar, Eric Archer, Olivier Hardy",Release,analysis,Estimation and Tests of Hierarchical F-Statistics,no,"This tool focuses on population genetics, not specifically focused on Plasmodium and thus is not within the scope of PGEforge Plasmodium genomic data analysis tools",https://github.com/jgx65/hierfstat,https://cran.r-project.org/web/packages/hierfstat/index.html,https://rdrr.io/cran/hierfstat/man/hierfstat.html,R package ,devtools installation ,Different types of F-statistics (per locus and pairwise),NA,NA,NA,https://doi.org/10.1111/j.1471-8286.2004.00828.x,2004,github repo,https://github.com/jgx65/hierfstat,Vignette,https://cran.r-project.org/web/packages/hierfstat/vignettes/hierfstat.html,no,yes,yes,yes,yes,yes,yes,no,no,?,?,?,no,yes,R,R,05-May-22 +MicroHaploHet,Karamoko Niare,Development,analysis,Estimating microhaplotype heterozygosity,no,This tool is still under development,NA,NA,NA,Bash/Python,GitHub Clone,Microhaplotype heterozygosity and microhaplotype-based PCA,NA,NA,NA,NA,NA,github repo,NA,NA,NA,,,,,,,,,,,,,,,,,29-Aug-23 \ No newline at end of file diff --git a/website_docs/tables/Objective_software_standards.csv b/website_docs/tables/Objective_software_standards.csv new file mode 100644 index 0000000..2a6908a --- /dev/null +++ b/website_docs/tables/Objective_software_standards.csv @@ -0,0 +1,9 @@ +criteria,type,notes +Installation instructions exist,binary, +Usage instructions exist,binary, +Tutorials exist,binary,"Must have explanation of inputs, outputs, and test data set in a worked example" +Test data sets and results available,binary, +Open source,binary, +Has software tests,binary,"Eg, unit tests" +More than 90% code coverage reported,binary, +Clear channels for software maintenance and issues,binary,"Eg, GitHub issues, author 
contact information" \ No newline at end of file diff --git a/website_docs/tables/Tools_to_standards.csv b/website_docs/tables/Tools_to_standards.csv index 48086af..5052e57 100644 --- a/website_docs/tables/Tools_to_standards.csv +++ b/website_docs/tables/Tools_to_standards.csv @@ -1,41 +1,41 @@ -Tool,Version controlled,Installation instructions exist,Usage instructions exist,Tutorials exist,Interactive tutorials exist,Test data sets and results available,Has unit tests,Code coverage reported,% code covered,Software actively maintained,Run settings/parameters output,Runs on windows,Runs on Mac Intel,Runs on Mac ARM,Runs on Linux,Open source,Installation via build from source,"Installation via pre-built from repo (e.g. PIP, Conda, CRAN)",Installation via container,Installation via direct download,GitHub branch rules exist,Master branch protected,Auto tests required for branch merging,Run via GUI,Run via command line,"Implementation language (e.g. C++, R, Python)",Command line language -MLMOI,relegated,,,,,,,,,,,,,,,,,,,,,,,,,, -MalHaploFreq,superseded ,,,,,,,,,,,,,,,,,,,,,,,,,, -malaria.em,relegated,,,,,,,,,,,,,,,,,,,,,,,,,, -FreqEstimationModel,yes,yes,not really ,yes ,no,test data - vignette generates results,no,no,NA,yes,,,,,,yes,no,yes,no,no,,,,no,yes,R,R -MultiLociBiallelicModel,no,yes,yes,no,no,yes,no,NA,NA,yes,yes,yes,yes,yes,yes,yes,no,no,no,yes,no,no,no,no,yes,R,R -pfmix,relegated,,,,,,,,,,,,,,,,,,,,,,,,,, -THEREALMcCOIL,yes,yes,yes,no,no,test data; not results,no,no,no,no,yes,yes,yes,yes,yes,yes,yes,no,no,no,no,no,no,no,no,R,C++ -McCOILR,relegated,,,,,,,,,,,,,,,,,,,,,,,,,, -coiaf,yes,yes,yes,yes - not comprehensive,no,no,yes,no,NA,yes,no,yes,yes,yes,yes,yes,yes,no,no,no,?,?,?,no,yes,R,R -DEploid,superseded,,,,,,,,,,,,,,,,,,,,,,,,,, -Deploid-r,relegated,,,,,,,,,,,,,,,,,,,,,,,,,, -DEploidIBD,yes,yes,yes,no,no,yes - there is input and example output figures. 
But no files for output you could compare directly ,yes,no,NA,contact author sha.joe.zhu@gmail.com joe.zhu@roche.com,no,yes,yes,yes,yes,yes,,,yes,,,,,no,yes,C++,bash -isoRelate,yes,yes,yes,yes,no,"no output, but could compare to figures in tutorial",no,no,NA,contact author - lyndal_henden@hotmail.com,no,no,no,no,yes? wasm ,yes,no,yes,no,no,,,,no,yes,C++ and R,R -hmmIBD,yes,yes,yes,yes,no,yes,no,no,NA,yes,y (to stdout),?,yes,yes,yes,yes,yes,no,no,no,no,no,no,no,yes,C + Python,C + Python -paneljudge,yes,yes,yes,yes,no,,(not in standard R package test format but exist),no,NA,yes (has been 10 months since last commit),yes,yes,yes,yes,yes,yes,no,yes,no,no,,,,no,yes,"R, C++",R -dcifer,yes,yes,yes,yes,no,,,,,,,,,,,,,,,,,,,,,, -moire,yes,yes,yes,yes,no,yes,yes,no,NA,yes,no,yes,yes,yes,yes,yes,no,yes,no,no,no,no,no,no,yes,C++,R -MALECOT,yes,yes,yes,yes,no,no - tutorial uses simulated data ,yes,no,NA,contact author :) ,,yes,yes,yes,yes,yes,no,yes,no,no,,,,no,yes,C++ and R,R -SIMPLEGEN,in development ,,,,,,,,,,,,,,,,,,,,,,,,,, -magenta,simulator ,,,,,,,,,,,,,,,,,,,,,,,,,, -DRpower,yes,yes,yes,yes,no,,yes,no ,NA,yes,,yes,yes,yes,,yes,no,yes,no,no,,,,no,yes,C++ and R ,R -pixelate,no,yes,yes,yes,no,,no,no,NA,contact author ,,yes,yes,yes,yes,yes,no,yes,no,no,,,,no,yes,R,R -estMOI,yes,no,yes,no,no,no,no,no,NA,no,yes,?,yes,yes,yes,yes,yes,no,no,yes,?,?,?,no,yes,Perl,Perl -forward-dream,simulator,,,,,,,,,,,,,,,,,,,,,,,,,, -moimix,superseded,,,,,,,,,,,,,,,,,,,,,,,,,, -hrp2malaRia,simulator,,,,,,,,,,,,,,,,,,,,,,,,,, -hmmibdr,relegated,,,,,,,,,,,,,,,,,,,,,,,,,, -MIPanalyzer,yes,yes,no?,yes,no,yes,yes,no,NA,yes,,yes,yes,yes,yes,yes,no,yes,no,no,,,,no,yes,C++ and R,R -PlasmoSim,simulator,,,,,,,,,,,,,,,,,,,,,,,,,, -polySimIBD,simulator,,,,,,,,,,,,,,,,,,,,,,,,,, -discent,in development ,,,,,,,,,,,,,,,,,,,,,,,,,, -SNP-slice,in development ,,,,,,,,,,,,,,,,,,,,,,,,,, -BRO,relegated,,,,,,,,,,,,,,,,,,,,,,,,,, -Pv3Rs,in development ,,,,,,,,,,,,,,,,,,,,,,,,,, -"""CDC Bayesian algorithm""",in 
development ,,,,,,,,,,,,,,,,,,,,,,,,,, -AmpSeq,in development ,,,,,,,,,,,,,,,,,,,,,,,,,, -AmpSim,in development ,,,,,,,,,,,,,,,,,,,,,,,,,, -rehh,yes,yes,no?,yes,no,yes,yes,no,NA,contact author ,,,,,,yes,no,yes,no,no,,,,no,yes,C and R ,R -Hierfstat,yes,yes,yes,yes,no,yes,no,no,NA,yes,no,yes,yes,yes,yes,yes,yes,yes,no,no,?,?,?,no,yes,R,R -MicroHaploHet,in development,,,,,,,,,,,,,,,,,,,,,,,,,, \ No newline at end of file +tool,usercriteria_installation_instructions_exist,notes_installation,userscore_installation,usercriteria_usage_instructions_exist,notes_usage,userscore_usage,usercriteria_tutorials_exist,notes_tutorials,userscore_tutorials,usercriteria_test_datasets_and_results_available,notes_test_data,userscore_test_data,devcriteria_open_source,notes_open_source,devscore_open_source,devcriteria_has_software_tests,notes_software_tests,devscore_software_tests,devcriteria_more_than_90%_code_coverage_reported,notes_code_coverage_reported,devscore_code_coverage,devcriteria_clear_channels_for_software_maintenance_and_issues,notes_channels,devscore_channels +MLMOI,,,,,,,,,,,,,,,,,,,,,,,, +MalHaploFreq,,,,,,,,,,,,,,,,,,,,,,,, +malaria.em,,,,,,,,,,,,,,,,,,,,,,,, +FreqEstimationModel,yes,,2,yes,Not comprehensive,1,yes ,,2,yes,test data - vignette generates results,1,yes,,2,no,,0,no,,0,yes,,2 +MultiLociBiallelicModel,yes,,2,yes,,2,no,,0,yes,,2,yes,,2,no,,0,no,,0,yes,,2 +pfmix,,,,,,,,,,,,,,,,,,,,,,,, +THEREALMcCOIL,yes,,2,yes,,2,no,,0,yes,test data; not results,1,yes,,2,no,,0,no,,0,no,,0 +McCOILR,,,,,,,,,,,,,,,,,,,,,,,, +coiaf,yes,,2,yes,,2,yes,Not comprehensive,1,no,,0,yes,,2,yes,,2,no,,0,yes,,2 +DEploid,,,,,,,,,,,,,,,,,,,,,,,, +Deploid-r,,,,,,,,,,,,,,,,,,,,,,,, +DEploidIBD,yes,,2,yes,,2,no,,0,yes,There is input and example output figures. 
But no files for output you could compare directly ,1,yes,,2,yes,,2,no,,0,yes,contact author sha.joe.zhu@gmail.com joe.zhu@roche.com,2 +isoRelate,yes,,2,yes,,2,yes,,2,no,"no output, but could compare to figures in tutorial",0,yes,,2,no,,0,no,,0,yes,contact author - lyndal_henden@hotmail.com,2 +hmmIBD,yes,,2,yes,,2,yes,,2,yes,,2,yes,,2,no,,0,no,,0,yes,,2 +paneljudge,yes,,2,yes,,2,yes,,2,yes,,2,yes,,2,yes,Not in standard R package test format but exist,1,no,,0,yes,contact author,2 +dcifer,yes,,2,yes,,2,yes,,2,yes,,2,yes,,2,no,,0,no,,0,yes,,2 +moire,yes,,2,yes,,2,yes,,2,yes,,2,yes,,2,yes,,2,no,,0,yes,,2 +MALECOT,yes,,2,yes,,2,yes,,2,no,tutorial uses simulated data ,0,yes,,2,yes,,2,no,,0,yes,contact author,2 +SIMPLEGEN,,,,,,,,,,,,,,,,,,,,,,,, +magenta,,,,,,,,,,,,,,,,,,,,,,,, +DRpower,yes,,2,yes,,2,yes,,2,no,,0,yes,,2,yes,,2,no ,,0,yes,,2 +pixelate,yes,,2,yes,,2,yes,,2,yes,test data - vignette generates results,1,yes,,2,no,,0,no,,0,yes,contact author ,2 +estMOI,no,,0,yes,,2,no,,0,no,,0,yes,,2,no,,0,no,,0,no,,0 +forward-dream,,,,,,,,,,,,,,,,,,,,,,,, +moimix,,,,,,,,,,,,,,,,,,,,,,,, +hrp2malaRia,,,,,,,,,,,,,,,,,,,,,,,, +hmmibdr,,,,,,,,,,,,,,,,,,,,,,,, +MIPanalyzer,yes,,2,no,unable to find,0,yes,,2,yes,,2,yes,,2,yes,,2,no,,0,yes,,2 +PlasmoSim,,,,,,,,,,,,,,,,,,,,,,,, +polySimIBD,,,,,,,,,,,,,,,,,,,,,,,, +discent,,,,,,,,,,,,,,,,,,,,,,,, +SNP-slice,,,,,,,,,,,,,,,,,,,,,,,, +BRO,,,,,,,,,,,,,,,,,,,,,,,, +Pv3Rs,,,,,,,,,,,,,,,,,,,,,,,, +Anti-malarial MCMC Bayesian Algorithm,,,,,,,,,,,,,,,,,,,,,,,, +AmpSeq,,,,,,,,,,,,,,,,,,,,,,,, +AmpSim,,,,,,,,,,,,,,,,,,,,,,,, +rehh,yes,,2,no,unable to find,0,yes,,2,yes,,2,yes,,2,yes,,2,no,,0,yes,contact author ,2 +Hierfstat,yes,,2,yes,,2,yes,,2,yes,,2,yes,,2,no,,0,no,,0,yes,,2 +MicroHaploHet,,,,,,,,,,,,,,,,,,,,,,,, \ No newline at end of file diff --git a/website_docs/tool_landscaping.qmd b/website_docs/tool_landscaping.qmd index 42d7997..dfe1a7e 100644 --- a/website_docs/tool_landscaping.qmd +++ b/website_docs/tool_landscaping.qmd @@ -3,9 +3,19 @@ 
title: "Tool landscaping" format: html --- +## Overview +In line with the scope of PGEforge, we focus our efforts on landscaping available tools that are commonly applied to *Plasmodium* genetic data and that focus on downstream analysis. Tools are considered within this scope if they: + +- Focus on downstream analysis. This includes tools whose primary goal is to extract signal from pre-processed data, but excludes tools that are primarily used in upstream bioinformatic steps, such as variant callers and quality filters. +- Focus on *Plasmodium* genetics, including both *P. falciparum* and *P. vivax*. + +In our initial landscaping, we did not consider applications to mosquito genetics or many broader population genetics tools, although some of these tools and techniques may also be applicable to *Plasmodium* data. However, we encourage contributions in these areas and anything else within the scope of PGEforge (i.e. *Plasmodium* genomic epidemiology tools); please see our [contributor guidelines](how_to_contribute.qmd) and some of our planned areas of [future work](future_work.qmd).
+ +## Landscaping matrix ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) -library(tidyverse) +library(dplyr) +library(janitor) library(DT) ``` @@ -16,8 +26,45 @@ matrix <- read.csv("tables/MMS_software_landscaping.csv") ```{r echo=F} matrix_dt <- matrix %>% - select(Software_name, Software_authors, Theme, Stage, Implementation, Functionalities, Landscaping_completed_by, Last_updated, Tutorial_status, Tutorial_by, Reason_relegated_or_superseded) + clean_names(case = "title") %>% + rename("Tutorial on PGEforge" = "Tutorial on Pg Eforge") -datatable(matrix_dt) +datatable(matrix_dt, + extensions = c('FixedColumns', 'FixedHeader'), + options = list( + pageLength = nrow(matrix_dt), + fixedHeader = T, + scrollY = "600px", + scrollX = T, + fixedColumns = list(leftColumns = 2), + dom = "ft", + autoWidth = T + ), + caption = htmltools::tags$caption(style = "caption-side: bottom; text-align: left; font-style: italic", + "Last updated: 2 August 2024"), + class = 'white-space: nowrap' + ) %>% + formatStyle( + "Tool", + target = "cell", + fontWeight = "bold" + ) %>% + htmlwidgets::onRender(" + function(el, x) { + $(el).find('tbody td').css({ + 'border-left': '0.1px solid #ddd', + 'border-right': '0.1px solid #ddd' + }); + $(el).find('tbody td:first-child').css({ + 'border-left': 'none' + }); + $(el).find('tbody tr:odd').css({ + 'background-color': '#f9f9f9' + }); + $(el).find('tbody tr:odd td').css({ + 'background-color': '#f9f9f9' + }); + } + ") ``` diff --git a/website_docs/tools_to_functions.qmd b/website_docs/tools_to_functions.qmd index 169701c..97cfaa9 100644 --- a/website_docs/tools_to_functions.qmd +++ b/website_docs/tools_to_functions.qmd @@ -5,8 +5,10 @@ format: html ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) -library(tidyverse) -library(googlesheets4) +library(dplyr) +library(ggplot2) +library(tidyr) +# library(googlesheets4) library(DT) ``` @@ -15,11 +17,11 @@ library(DT) tool_fxn_df <- read.csv("tables/Tools_to_functions.csv") ``` 
-```{r echo=F} +```{r echo=F, eval=F} datatable(tool_fxn_df) ``` -```{r echo=F} +```{r echo=F, eval=F} tool_fxn_df %>% pivot_longer(cols = -Tool, names_to = "Functionality", values_to = "presence") %>% ggplot(aes(x = Tool, y = Functionality, fill = presence)) + diff --git a/website_docs/tools_to_standards.qmd b/website_docs/tools_to_standards.qmd index a1d11fe..33e92fb 100644 --- a/website_docs/tools_to_standards.qmd +++ b/website_docs/tools_to_standards.qmd @@ -5,7 +5,9 @@ format: html ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) -library(tidyverse) +library(dplyr) +library(stringr) +library(janitor) library(DT) ``` @@ -14,6 +16,126 @@ library(DT) tool_std_df <- read.csv("tables/Tools_to_standards.csv") ``` -```{r echo=F} -datatable(tool_std_df) -``` \ No newline at end of file +```{r fxns, include=F, echo=F} +calculate_scores <- function(data){ + # max score for user and dev criteria + max_score_user <- data %>% select(starts_with("usercriteria_")) %>% ncol() * 2 + max_score_dev <- data %>% select(starts_with("devcriteria_")) %>% ncol() * 2 + + # calculate end-user score + scores <- data %>% + rowwise() %>% + mutate(total_user_score = sum(c_across(starts_with("userscore_"))), + total_dev_score = sum(c_across(starts_with("devscore_"))), + total_score = total_user_score + total_dev_score, + total_user_percentage = (total_user_score/max_score_user)*100, + total_dev_percentage = (total_dev_score/max_score_dev)*100) %>% + ungroup() %>% + select(tool, total_user_score, total_user_percentage, total_dev_score, total_dev_percentage) + + return(scores) +} +``` + + +```{r calc_scores, echo=F} +scores <- calculate_scores(tool_std_df) + +tool_std_summary <- tool_std_df %>% + left_join(scores, by = "tool") %>% + rename_with(~ str_replace_all(., c( + "^usercriteria_" = "", + "^devcriteria_" = "", + "^userscore_" = "score_", + "^devscore_" = "score_" + ))) %>% + relocate(c(total_user_score), .before = open_source) %>% + relocate(c(total_dev_score), .after = 
score_channels) %>% + relocate(c(total_user_percentage, total_dev_percentage), .after = tool) %>% + clean_names(case = "title") %>% + rename("Total end-user score" = "Total User Score", + "Total development score" = "Total Dev Score", + "User score (%)" = "Total User Percentage", + "Developer score (%)" = "Total Dev Percentage") +``` + +---------- + +The following *Plasmodium* genomic analysis tools were identified during [tool landscaping](tool_landscaping.qmd) and were evaluated using the [software standards criteria](software_standards.qmd) to determine an end-user and development score for each tool. + +In line with the scope of PGEforge, we focus our efforts on evaluating available tools that are commonly applied to *Plasmodium* genetic data and that focus on downstream analysis. In other words, we evaluate tools whose primary goal is to extract signal from pre-processed data, not those focused on upstream bioinformatic data processing. + +If you would like to contribute to this effort, please take a look at our [contributor guidelines](how_to_contribute.qmd)!
+ + +*Note: tools in grey have not yet been evaluated* +```{r table_summary, echo=F} +datatable(tool_std_summary, + extensions = c('FixedColumns', 'FixedHeader'), + options = list( + pageLength = nrow(tool_std_df), + fixedHeader = T, + scrollY = "600px", + scrollX = T, + fixedColumns = list(leftColumns = 2), + dom = "ft", + autoWidth = T + ), + caption = htmltools::tags$caption(style = "caption-side: bottom; text-align: left; font-style: italic", + "Last updated: 2 August 2024"), + class = 'white-space: nowrap' + ) %>% + formatStyle( + "User score (%)", + backgroundColor = styleInterval( + c(25, 50), + c('#FFCCCC', '#FFF0CC', '#CCFFCC') + ), + color = styleInterval( + c(25, 50), + c('#CC3333', '#FFB84D', '#339933') + ), + fontWeight = "bold" + ) %>% + formatStyle( + "User score (%)", + target = "cell", + backgroundColor = styleEqual(c(NA), c("#D3D3D3")) + ) %>% + formatStyle( + "Developer score (%)", + backgroundColor = styleInterval( + c(25, 50), + c('#FFCCCC', '#FFF0CC', '#CCFFCC') + ), + color = styleInterval( + c(25, 50), + c('#CC3333', '#FFB84D', '#339933') + ), + fontWeight = "bold" + ) %>% + formatStyle( + "Developer score (%)", + target = "cell", + backgroundColor = styleEqual(c(NA), c("#D3D3D3")) + ) %>% + formatStyle( + c("Tool", "Total end-user score", "Total development score"), + target = "cell", + fontWeight = "bold" + ) %>% + htmlwidgets::onRender(" + function(el, x) { + $(el).find('tbody td').css({ + 'border-left': '0.1px solid #ddd', + 'border-right': '0.1px solid #ddd' + }); + $(el).find('tbody td:first-child').css({ + 'border-left': 'none' + }); + $(el).find('tbody tr:odd td:not(:nth-child(3)):not(:nth-child(4))').css({ + 'background-color': '#f9f9f9' + }); + } + ") +``` diff --git a/website_docs/tutorials_overview.qmd b/website_docs/tutorials_overview.qmd index 6005770..83feb9f 100644 --- a/website_docs/tutorials_overview.qmd +++ b/website_docs/tutorials_overview.qmd @@ -1,10 +1,20 @@ --- -title: "Tutorials overview" +title: "Tutorials" 
+subtitle: "Comprehensive guides covering the entire process of using *Plasmodium* genomic analysis tools" format: html --- -## Tutorial details +## Overview +The aim of the PGEforge tutorials is to provide worked examples that show *how* to use a tool by *working through* code. This involves code showing you how to install the tool, what input data formats you need to use, how to wrangle the data (if applicable), and how to use the tool functionalities to analyse data. Summary documents detailing the purpose of each tool can also help you decide which tool you may want to use for a certain application. -Some details of the various tutorials and the types of tools, eg simulation, analysis etc +The following resources are available for each tool: -Templates for the various documents can be found in the [Template section](https://mrc-ide.github.io/PGEforge/tutorials/Template/Template_background.html) and an example is provided in the [DRpower section](https://mrc-ide.github.io/PGEforge/tutorials/DRpower/DRpower_background.html). \ No newline at end of file +- A summary document detailing the main purpose and use cases, license, code repository, relevant publication(s), citation information and links to any additional resources +- Complete installation instructions +- A fully reproducible and worked-through tutorial showing example usage of the tool and its functionalities. This often uses the [canonical simulated or empirical datasets](data_description.qmd) as input data + +## How to contribute + +This is a live resource and we plan to continue adding to this as new tools become available! We hope this will grow into a common resource for analysis of malaria genetic data. + +If you are interested in contributing, there are [templates available](https://mrc-ide.github.io/PGEforge/tutorials/Template/Template_background.html) and [instructions](how_to_contribute.qmd) on how to get started. 
\ No newline at end of file diff --git a/website_docs/use_cases.qmd b/website_docs/use_cases.qmd new file mode 100644 index 0000000..28b0d53 --- /dev/null +++ b/website_docs/use_cases.qmd @@ -0,0 +1,5 @@ +--- +title: "COMING SOON!" +format: html +--- + diff --git a/website_docs/workflows.qmd b/website_docs/workflows.qmd index cf6cd24..c2aeea6 100644 --- a/website_docs/workflows.qmd +++ b/website_docs/workflows.qmd @@ -2,7 +2,6 @@ title: "Analysis workflows" format: html --- -# Overview -PGEforge hosts simulated and empirical datasets of: +# COMING SOON!