Use Nextclade to assign clade labels in main phylogenetic workflow #131

Open
huddlej opened this issue Nov 28, 2023 · 0 comments
Labels
enhancement New feature or request

Context

We currently assign clade labels to trees in our main phylogenetic workflow using the augur clades command and the influenza clade nomenclature TSVs. However, clade assignments for some samples differ between these public/private trees and the Nextclade trees. Assignments can also differ between runs of public/private trees from the same time period, because our random subsampling logic gives each tree a different sample composition. These mismatches can confuse users who look at both the Nextclade outputs for their own data and the public/private Nextstrain trees.

Description

Since Nextclade provides a standard clade label interface already, we should use Nextclade to annotate clades in our main phylogenetic workflow instead of augur clades. This change will ensure that the samples are assigned to the same clade regardless of the sample composition of a given public/private tree.

Possible solutions

In the short term, we could replace our nextalign alignment step with nextclade, using the dataset for each subtype's corresponding reference. We would also need to replace the current augur clades command with functionality like what @corneliusroemer proposed in nextstrain/augur#1329, which would let us assign clades to internal nodes and branches and keep the clade display in Auspice fully backward compatible. Instead of inferring clades for internal nodes as a discrete trait, we could consider running Nextclade on the inferred ancestral sequences for those nodes.
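As a rough illustration of the tip-level part of this idea, the sketch below converts a Nextclade TSV into an Augur-style node data structure. It assumes the Nextclade output has `seqName` and `clade` columns and that tips in the tree are named after `seqName`; the function name, strain names, and clade labels are hypothetical, and the internal node and branch labels discussed above are not handled here.

```python
import csv
import io
import json


def nextclade_tsv_to_node_data(tsv_text, clade_column="clade"):
    """Build an Augur-style node data dict for tips from Nextclade TSV text.

    Hypothetical helper: assumes `seqName` matches the tip names in the tree.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    nodes = {}
    for row in reader:
        clade = (row.get(clade_column) or "").strip()
        if clade:  # skip sequences without a clade assignment
            nodes[row["seqName"]] = {"clade_membership": clade}
    return {"nodes": nodes}


# Made-up example records; real Nextclade output has many more columns.
example_tsv = (
    "seqName\tclade\n"
    "A/Example/1/2023\tcladeA\n"
    "A/Example/2/2023\t\n"
)
node_data = nextclade_tsv_to_node_data(example_tsv)
print(json.dumps(node_data, indent=2))
```

Because the labels come straight from Nextclade, a given sample would get the same tip label in every tree regardless of which other samples were subsampled alongside it.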

In the long (medium?) term, we could run Nextclade during our "data upload to S3" workflow, upload the alignments and the Nextclade annotations joined with metadata, and then start our workflows from those files. This approach would let us skip the alignment and clades steps of the current workflow, and it would put useful Nextclade data on S3 that we need for other analyses such as flu frequencies.
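The join step could look something like the sketch below, which left-joins Nextclade annotations onto metadata records. It assumes the metadata TSV is keyed by a `strain` column and the Nextclade TSV by `seqName`; the function name, the choice of columns to carry over, and the example records are hypothetical.

```python
import csv
import io


def join_metadata_with_nextclade(
    metadata_text,
    nextclade_text,
    id_column="strain",
    nextclade_columns=("clade",),
):
    """Left-join selected Nextclade columns onto metadata records.

    Metadata records without a Nextclade row keep empty values, so no
    sample is dropped by the join.
    """
    nextclade_by_name = {
        row["seqName"]: row
        for row in csv.DictReader(io.StringIO(nextclade_text), delimiter="\t")
    }
    joined = []
    for record in csv.DictReader(io.StringIO(metadata_text), delimiter="\t"):
        annotations = nextclade_by_name.get(record[id_column], {})
        for column in nextclade_columns:
            record[column] = annotations.get(column, "")
        joined.append(record)
    return joined


# Made-up example inputs.
metadata_tsv = "strain\tdate\nA/Example/1/2023\t2023-01-05\nA/Example/3/2023\t2023-02-10\n"
nextclade_tsv = "seqName\tclade\nA/Example/1/2023\tcladeA\n"
joined_records = join_metadata_with_nextclade(metadata_tsv, nextclade_tsv)
for record in joined_records:
    print(record)
```

A left join is used here so that samples missing from the Nextclade output remain visible downstream with an empty clade rather than disappearing silently.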
