dsl2 port #182

Closed
wants to merge 78 commits into from

@phue (Member) commented Jan 19, 2021

This is an initial effort to port nf-core/methylseq to DSL2.

My aim was to retain the existing functionality; I think all features from v1.5 are working now.

A breaking change, however, is that the pipeline now requires a samplesheet, similar to the one already used in nf-core/nanoseq, for example. It should have 4 columns:

sample fastq_1 fastq_2 genome

The idea behind this change is to enable mapping of samples against different references (#181), which is very useful for certain use cases.
Bonus: the samplesheet makes the single_end parameter obsolete, because single/paired-end can be auto-detected from the samplesheet.
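The samplesheet handling described above can be sketched as follows: each row becomes a `[meta, reads]` pair, and single- versus paired-end is inferred from whether `fastq_2` is empty, which is what makes `single_end` redundant. This is a minimal plain-Groovy sketch under stated assumptions, not the pipeline's actual input check: the helper name `parseSamplesheet`, the comma-separated format, and the meta keys `id`, `single_end` and `genome` are hypothetical.

```groovy
// Hypothetical helper (plain Groovy, not the pipeline's real input check):
// parse a samplesheet with the columns sample,fastq_1,fastq_2,genome and
// infer single-end reads from an empty fastq_2 field.
List<List> parseSamplesheet(String csvText) {
    final List<String> expected = ['sample', 'fastq_1', 'fastq_2', 'genome']
    // Drop blank lines, e.g. a trailing newline at the end of the file
    List<String> lines = csvText.readLines().findAll { it.trim() }
    List<String> header = lines[0].split(',', -1)*.trim()
    if (header != expected) {
        throw new IllegalArgumentException("Unexpected samplesheet header: ${header}")
    }
    return lines.drop(1).collect { String line ->
        // A limit of -1 keeps the empty fastq_2 field of single-end rows
        List<String> cols = line.split(',', -1)*.trim()
        if (cols.size() != expected.size()) {
            throw new IllegalArgumentException("Expected ${expected.size()} columns, got ${cols.size()}: ${line}")
        }
        def (sample, fastq1, fastq2, genome) = cols
        boolean singleEnd = fastq2.isEmpty()
        // The per-sample genome travels in the meta map alongside the reads
        Map meta = [id: sample, single_end: singleEnd, genome: genome]
        List reads = singleEnd ? [fastq1] : [fastq1, fastq2]
        [meta, reads]
    }
}
```

Carrying `genome` in the meta map is what would let each sample be routed to its own reference index downstream, as proposed in #181.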

It would be great to get some opinions on this, @nf-core/core.

TODOs:

  • update nf-core/test-datasets to reflect the change to samplesheet input (test-datasets#214, "methylseq: add samplesheet.csv")
  • clean up files that are no longer needed:
    • Dockerfile, environment.yml
    • markdown_to_html.py (?)
  • update GitHub Actions; Docker builds are no longer needed
  • figure out why the workflow summary shows NAs
  • add the new modules to nf-core/modules (modules#129, "modules for bisulfite sequencing data")
  • sync with the latest version of nf-core/modules
  • create local modules where needed; this will be necessary for many of them, especially for the feature outlined in #181 ("Mapping samples to multiple reference fasta")
  • figure out the right place to do Bismark resource adjustments such as the one below (the result should be passed to the module via options.args):

    methylseq/main.nf, lines 585 to 615 in 031be37:

    multicore = ''
    if( task.cpus ){
        // Numbers based on recommendation by Felix for a typical mouse genome
        if( params.single_cell || params.zymo || params.non_directional ){
            cpu_per_multicore = 5
            mem_per_multicore = (18.GB).toBytes()
        } else {
            cpu_per_multicore = 3
            mem_per_multicore = (13.GB).toBytes()
        }
        // Check if the user has specified this and overwrite if so
        if(params.bismark_align_cpu_per_multicore) {
            cpu_per_multicore = (params.bismark_align_cpu_per_multicore as int)
        }
        if(params.bismark_align_mem_per_multicore) {
            mem_per_multicore = (params.bismark_align_mem_per_multicore as nextflow.util.MemoryUnit).toBytes()
        }
        // How many multicore splits can we afford with the cpus we have?
        ccore = ((task.cpus as int) / cpu_per_multicore) as int
        // Check that we have enough memory, assuming 13GB memory per instance (typical for mouse alignment)
        try {
            tmem = (task.memory as nextflow.util.MemoryUnit).toBytes()
            mcore = (tmem / mem_per_multicore) as int
            ccore = Math.min(ccore, mcore)
        } catch (all) {
            log.debug "Warning: Not able to define bismark align multicore based on available memory"
        }
        if( ccore > 1 ){
            multicore = "--multicore $ccore"
        }
    }
  • create a local MultiQC module
  • test, test, test
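One possible answer to the open question about Bismark resource adjustments is to pull the multicore arithmetic out of the process body into a plain function and pass its result on as an extra argument string. A minimal sketch in plain Groovy (no Nextflow runtime), assuming: the name `bismarkMulticoreArg` and its parameters are hypothetical; `heavyMode` stands in for `params.single_cell || params.zymo || params.non_directional`; memory is given in bytes with 1 GB = 1024³ bytes, assumed to match Nextflow's `MemoryUnit`; and a `null` memory value replaces the original `try`/`catch` fallback.

```groovy
// Hypothetical helper (plain Groovy, no Nextflow MemoryUnit): computes the
// Bismark "--multicore" argument with the same rules as the v1.5 snippet.
String bismarkMulticoreArg(int cpus, Long memBytes, boolean heavyMode,
                           Integer cpuPerMulticoreOverride = null,
                           Long memPerMulticoreOverride = null) {
    final long GB = 1024L * 1024L * 1024L   // assumed binary GB, as in 13.GB
    // heavyMode stands in for single_cell || zymo || non_directional
    int cpuPerMulticore = heavyMode ? 5 : 3
    long memPerMulticore = (heavyMode ? 18L : 13L) * GB
    // User overrides win, mirroring the params checks in the original
    if (cpuPerMulticoreOverride) cpuPerMulticore = cpuPerMulticoreOverride
    if (memPerMulticoreOverride) memPerMulticore = memPerMulticoreOverride
    // How many Bismark instances the available CPUs allow
    int ccore = cpus.intdiv(cpuPerMulticore) as int
    // Cap by memory when it is known (null skips the cap, like the catch branch)
    if (memBytes != null) {
        int mcore = memBytes.intdiv(memPerMulticore) as int
        ccore = Math.min(ccore, mcore)
    }
    // Only request multicore mode when more than one instance fits
    return ccore > 1 ? "--multicore ${ccore}" : ''
}
```

For example, with 12 CPUs and 64 GB in the default mode, the CPU limit allows 12 / 3 = 4 instances and the memory limit allows floor(64 / 13) = 4, so the function returns `--multicore 4`; with only 32 GB the memory cap drops this to `--multicore 2`.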

ewels and others added 24 commits November 18, 2020 23:12
This code works for ch_multiqc_custom_config - why not here?
* fastqc
* picard/markduplicates
* preseq/lcextrap
* samtools/flagstat
* samtools/index
* samtools/stats
* samtools/sort
* trimgalore
* multiqc
* bismark/genome_preparation
* bismark/align
* bismark/deduplicate
* bismark/extract
* bismark/report
* bismark/summary

TODO: write tests and add to nf-core/modules
* bwameth/align
* bwameth/index

TODO: write tests and add to nf-core/modules
* methyldackel/extract
* methyldackel/mbias

TODO: write tests and add to nf-core modules
TODO: write tests and add to nf-core/modules
TODO: write tests and add to nf-core/modules
this is inspired by the functionality in nf-core/nanoseq and
nf-core/rnaseq
The idea is to require a samplesheet to run the pipeline, which will
allow for single/paired end auto-detection and mapping samples against
different reference genomes.

addresses nf-core#181
TODO: needs change in nf-core/test-datasets
baseDir is deprecated

@drpatelh (Member) commented

Ah man! I will never get tired of seeing these initial DSL2 PRs appearing out of nowhere 😍 Looks great just from looking at the file changes!

The fact that nf-core/modules will get more and more padded out is a huge bonus too!

Nice work 🕺🏽

@phue (Member, Author) commented Jan 20, 2021

@drpatelh Your nf-core/rnaseq port was a very helpful guideline to figure out how to do things! Thanks for that 👍

phue added 16 commits March 22, 2021 15:59
the pipeline now requires a samplesheet
phue added the "help wanted" label (Extra attention is needed) Mar 22, 2021
@phue (Member, Author) commented Mar 24, 2021

Closing this because there is now a dsl2 branch in this repository.

phue closed this Mar 24, 2021
phue mentioned this pull request Mar 25, 2021
Labels: help wanted (Extra attention is needed)

5 participants