Run pathogen-embed on HA/NA alignments to flag putative reassortant clades #174

Open
6 tasks
Tracked by #130
huddlej opened this issue Jul 5, 2024 · 0 comments
Labels
enhancement New feature or request

Context

From our work in Nanduri et al., we developed the pathogen-embed tools to project seasonal flu alignments into low-dimensional representations and identify clusters of genetically related sequences. We can use these tools to jointly embed alignments from multiple genes like HA and NA and identify putative reassortment events. The pathogen-embed package is now part of the Nextstrain Docker and Conda environments, so we can easily run these tools from our seasonal flu workflows.

Description

Add rules to the core seasonal flu workflow that annotate HA and NA trees with t-SNE embedding coordinates (tsne_x and tsne_y), produced with pathogen-distance and pathogen-embed, and with cluster labels (tsne_label), produced with pathogen-cluster. Calculate distances for each gene segment individually, then produce a single t-SNE embedding from all distances and alignments together using the optimal settings from Nanduri et al. Finally, assign clusters using the optimal settings for Nextstrain clades from the same work.

  • Calculate genetic distances per gene alignment with pathogen-distance
  • Generate a t-SNE embedding from all gene alignments and distances with pathogen-embed
  • Generate clusters from t-SNE embedding with pathogen-cluster
  • Convert clusters and embedding TSV to node data JSON
  • Annotate all gene trees with clusters and embeddings
  • Update Auspice config JSONs to include colorings for the cluster label and embedding fields
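
As a rough illustration of the last three tasks, here is a minimal Python sketch of converting the cluster and embedding table to node data and extending an Auspice config. It assumes the TSV has a `strain` column plus the `tsne_x`, `tsne_y`, and `tsne_label` columns named above. The function names, the `strain` column name, and the coloring titles are hypothetical and not part of pathogen-embed or the existing workflow.

```python
import csv


def embedding_tsv_to_node_data(tsv_lines, strain_column="strain",
                               fields=("tsne_x", "tsne_y", "tsne_label")):
    """Build an Augur-style node data dict from tab-delimited lines.

    `tsv_lines` is any iterable of strings, such as an open file handle.
    The `strain` column name is an assumption about the table layout.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    nodes = {}
    for row in reader:
        annotations = {}
        for field in fields:
            value = row[field]
            # Keep cluster labels as strings so they act as categories;
            # convert embedding coordinates to floats.
            if field.endswith("_label"):
                annotations[field] = value
            else:
                annotations[field] = float(value)
        nodes[row[strain_column]] = annotations
    return {"nodes": nodes}


def add_embedding_colorings(auspice_config):
    """Return a copy of an Auspice config with embedding colorings added.

    Colorings whose key already exists are left alone. The titles below
    are placeholders.
    """
    colorings = list(auspice_config.get("colorings", []))
    existing_keys = {coloring["key"] for coloring in colorings}
    new_colorings = [
        {"key": "tsne_label", "title": "t-SNE cluster", "type": "categorical"},
        {"key": "tsne_x", "title": "t-SNE x", "type": "continuous"},
        {"key": "tsne_y", "title": "t-SNE y", "type": "continuous"},
    ]
    colorings.extend(
        coloring for coloring in new_colorings
        if coloring["key"] not in existing_keys
    )
    return {**auspice_config, "colorings": colorings}
```

The resulting dict can be written with `json.dump` and passed to the tree export step alongside the other node data files. Because tips are keyed by strain name, the same file could annotate both the HA and NA trees, provided both trees use the same strain names.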