Refactor workflows to follow Nextstrain standards from ncov workflow #58

Open
2 of 7 tasks
huddlej opened this issue Dec 29, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@huddlej
Contributor

huddlej commented Dec 29, 2020

Context

In the near future, we would like to support region-specific flu builds and also enable other researchers to use this repository to define their own custom flu builds in the same way the ncov repository allows custom workflows.

Description

Seasonal flu builds should follow the same general pattern as ncov builds: a standard workflow selects an initial set of sequences (based on some subsampling logic), infers and annotates phylogenies, and then runs build-specific steps to finalize the analysis.

Users should be able to configure their own workflows by modifying configuration files (e.g., YAML or JSON) and/or defining custom Snakemake profiles.
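
As a rough illustration of what "configure by modifying configuration files" could mean in practice, here is a minimal sketch of layering a user's overrides on top of workflow defaults. The keys (`subsample`, `max_sequences`, `tree`, `method`) and their values are hypothetical placeholders, not parameters of the current flu workflow, and the merge function only illustrates the idea rather than Snakemake's own config handling.

```python
import copy
import json

# Hypothetical defaults that would ship with the workflow.
DEFAULTS = {
    "subsample": {"max_sequences": 2000},
    "tree": {"method": "iqtree"},
}


def merge_config(defaults, overrides):
    """Recursively merge overrides into a copy of defaults.

    Nested dictionaries are merged key by key; any other value in
    overrides replaces the default value outright.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# A user-supplied JSON config that only changes one setting (hypothetical).
user_config = json.loads('{"subsample": {"max_sequences": 500}}')
config = merge_config(DEFAULTS, user_config)
print(json.dumps(config, indent=2, sort_keys=True))
```

With this kind of layering, a custom build only needs to state what differs from the standard workflow, and the defaults stay untouched.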

Existing parallel Nextstrain builds ("live" and "WHO") should be executable through this framework, such that we are users of our own workflow configuration system.

Examples

See Nextstrain's ncov regional build definitions for a real example of what we would eventually like to support for flu.

Possible solutions

Specific solutions will require more discussion, but the basic first steps toward addressing this issue would be similar to what we had to do for ncov back in March/April:

  • Move all hardcoded parameters out of Snakefiles and into a configuration file.
  • Define a single top-level Snakefile that includes component Snakefiles from workflow/*.smk, including a common.smk for shared functions and a builds.smk for the main build logic.
  • Rewrite the WHO builds logic using pre- and post-main workflow rules, using dependencies and a custom profile to manage which rules are executed instead of running a separate Snakefile.
  • Organize outputs by named builds with custom parameters in a configuration file instead of using wildcards.
  • Add support for running the workflow on a SLURM cluster and through AWS Batch by annotating rule-specific resources (memory, disk, and threads). (4d2cd21)
  • Define per-rule conda environments to allow custom dependency definitions outside of the canonical Nextstrain environment or Docker image. (4d2cd21)
  • Document workflow configuration parameters and basic usage through a tutorial (similar to the ncov tutorial).
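
To make the "named builds" step above more concrete, here is a minimal sketch of how build names from a configuration could drive both the list of final outputs and per-build parameter lookups (the kind of helper that might live in common.smk). The build names, parameter keys, and the `auspice/flu_{build_name}.json` path template are all assumptions for illustration, not the workflow's actual layout.

```python
# Hypothetical configuration: each named build carries its own parameters,
# replacing wildcard combinations such as lineage/segment/resolution.
config = {
    "builds": {
        "h3n2_ha_2y": {"lineage": "h3n2", "segment": "ha", "resolution": "2y"},
        "h1n1pdm_na_6y": {"lineage": "h1n1pdm", "segment": "na", "resolution": "6y"},
    }
}


def build_outputs(config, template="auspice/flu_{build_name}.json"):
    """Return one output path per named build, sorted for stable ordering."""
    return [template.format(build_name=name) for name in sorted(config["builds"])]


def build_param(config, build_name, key):
    """Look up a parameter for a named build, failing loudly if it is missing."""
    try:
        return config["builds"][build_name][key]
    except KeyError as error:
        raise KeyError(f"Build '{build_name}' has no parameter '{key}'") from error


print(build_outputs(config))
print(build_param(config, "h3n2_ha_2y", "segment"))
```

In a Snakefile, a list like the one from `build_outputs` could serve as the input of the default target rule, and a lookup like `build_param` could be called from rule `params` using the build name wildcard.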

@huddlej added the enhancement (New feature or request) label Dec 29, 2020

@joverlee521
Contributor

Bumping this issue because we were asked during Nextstrain office hours (Oct 7, 2021) if we have plans to create a flu workflow similar to the ncov template/tutorial.

@huddlej
Contributor Author

huddlej commented Nov 1, 2021

Additional considerations:

  • Do we want to rewrite the whole workflow from scratch or refactor the existing workflow?
  • Should we continue to consider the WHO builds as a separate workflow or will we eventually be able to run those builds with the “live” workflow?
  • Do we need to create separate trees for each combination of collaborating center and assay type, or can we build a single tree and run the titer models once per combination of data against that same tree?
    • Trees would need to prioritize strains based on all available titer measurements.
  • How do we want to deploy "private" trees for collaborating centers?
    • Private Nextstrain Groups seem like an obvious solution, but should we have one group per CC or a single group for all?
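
For the single-tree option raised in the third consideration above, a minimal sketch of the bookkeeping might look like the following: one shared tree, with one titer model job per combination of collaborating center and assay type. The center names, assay labels, and file paths are hypothetical placeholders, and the sketch only enumerates jobs; it does not fit any titer model.

```python
from itertools import product

# Hypothetical placeholders for collaborating centers and assay types.
centers = ["center_a", "center_b"]
assays = ["hi", "fra"]


def titer_model_jobs(tree_path, centers, assays):
    """Describe one titer model run per (center, assay) pair on a shared tree."""
    return [
        {
            "tree": tree_path,
            "center": center,
            "assay": assay,
            "output": f"results/titers_{center}_{assay}.json",
        }
        for center, assay in product(centers, assays)
    ]


jobs = titer_model_jobs("results/tree.nwk", centers, assays)
for job in jobs:
    print(job["center"], job["assay"], "->", job["output"])
```

Under this layout the tree is built once and each combination only adds a titer model run, which is where the strain prioritization question in the sub-bullet comes in: the shared tree has to include strains with measurements from every combination.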
