Strains with ambiguous dates are assigned two different dates in auspice output #48

Open
huddlej opened this issue Jan 10, 2020 · 0 comments

huddlej commented Jan 10, 2020

Fewer than 1% of strains sampled between 2017 and 2019 have ambiguous dates, but when these strains are included in a seasonal flu build, the pipeline assigns them two different effective dates. The first date is inferred by TreeTime. The second date is read from the metadata and converted to a numerical value prior to frequency estimation.

These differences lead to cases like A/Malaysia/RP0118/2019, which has an ambiguous date of 2019-XX-XX, a TreeTime-inferred date of January 1, 2019, and a brute-force guess from metadata of June 2019. For this example, the user interface confusingly displays two different dates, as shown below. The problem also manifests in the forecasting pipeline, where tips with non-zero frequencies are used to project the population one year into the future. In the example below, the strain with an ambiguous date is included in the forecast because of its frequency estimation date, even though it was most likely circulating 6 months earlier.

image
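
To make the discrepancy concrete, here is a minimal sketch of how a midpoint guess for an ambiguous date could land in the middle of the year. It assumes the metadata-derived value is the midpoint of the range the ambiguous date allows; the function names are hypothetical and this is not the actual augur code. It only handles ambiguity in the month and day fields.

```python
import calendar
from datetime import date


def ambiguous_date_bounds(date_string):
    """Return the (earliest, latest) dates allowed by a 'YYYY-MM-DD' string
    whose month and/or day may be masked with 'XX'."""
    year_s, month_s, day_s = date_string.split("-")
    year = int(year_s)
    if month_s == "XX":
        return date(year, 1, 1), date(year, 12, 31)
    month = int(month_s)
    if day_s == "XX":
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    exact = date(year, month, int(day_s))
    return exact, exact


def to_numeric(d):
    """Convert a date to a decimal year, e.g. 2019-01-01 -> 2019.0."""
    days_in_year = 366 if calendar.isleap(d.year) else 365
    return d.year + (d - date(d.year, 1, 1)).days / days_in_year


def midpoint_numeric(date_string):
    """Assumed brute-force guess: the midpoint of the allowed date range."""
    earliest, latest = ambiguous_date_bounds(date_string)
    return (to_numeric(earliest) + to_numeric(latest)) / 2


metadata_guess = midpoint_numeric("2019-XX-XX")   # falls near the middle of 2019
treetime_date = to_numeric(date(2019, 1, 1))      # the inferred date from this issue
gap_in_years = metadata_guess - treetime_date     # roughly half a year
```

Under these assumptions the two values for A/Malaysia/RP0118/2019 differ by about half a year, which matches the 6 month gap described above.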

We could fix this issue in a couple of ways:

  1. Exclude all strains with ambiguous dates
  2. Use TreeTime-inferred dates for frequency estimation

The first solution seems to be the simplest, and since fewer than 1% of recent sequences have ambiguous dates, this filter shouldn’t adversely affect the quality of the flu builds. This solution only requires a change to the seasonal flu build.
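
As a rough illustration of the first option, the filter could be as small as the sketch below. It assumes metadata records carry `strain` and `date` fields and that ambiguity is marked with `X` characters, as in 2019-XX-XX. The helper names and the two `A/Example/...` strains are hypothetical; the real change would live wherever the seasonal flu build selects strains.

```python
def has_ambiguous_date(date_string):
    """Treat any date containing an 'X' placeholder as ambiguous."""
    return "X" in date_string.upper()


def exclude_ambiguous(records):
    """Keep only the records whose date is fully specified."""
    return [record for record in records if not has_ambiguous_date(record["date"])]


records = [
    {"strain": "A/Malaysia/RP0118/2019", "date": "2019-XX-XX"},
    {"strain": "A/Example/1/2019", "date": "2019-03-14"},  # hypothetical strain
    {"strain": "A/Example/2/2019", "date": "2019-05-XX"},  # hypothetical strain
]
kept = exclude_ambiguous(records)
kept_strains = [record["strain"] for record in kept]
```

Note that this sketch also drops strains where only the day is unknown. Whether month-level precision is acceptable for frequency estimation is a separate choice the build would need to make.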

The second solution requires a change to a core augur interface. It might be nice in the long run to estimate frequencies from the most accurate information available, though. Tip frequency estimation already requires the Newick tree from augur refine, so adding the branch_lengths.json as an input to frequencies wouldn’t require any major rewiring of existing pipelines.
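
For comparison, the second option might look something like the following sketch. It assumes branch_lengths.json stores per-node values under a top-level `nodes` key with a `numdate` field for each tip; that layout, the function name, the internal node name, and the sample values are assumptions for illustration, not the actual augur frequencies interface.

```python
import json

# Hypothetical excerpt of the node data, assuming the layout
# {"nodes": {name: {"numdate": ...}}}.
BRANCH_LENGTHS_JSON = """
{
  "nodes": {
    "A/Malaysia/RP0118/2019": {"numdate": 2019.0},
    "NODE_0000001": {"numdate": 2018.7}
  }
}
"""


def dates_for_frequencies(metadata_numdates, node_data):
    """Prefer the inferred numeric date for each tip, falling back to the
    metadata-derived value when no inferred date is available."""
    inferred = node_data.get("nodes", {})
    resolved = {}
    for strain, metadata_date in metadata_numdates.items():
        resolved[strain] = inferred.get(strain, {}).get("numdate", metadata_date)
    return resolved


metadata_numdates = {
    "A/Malaysia/RP0118/2019": 2019.5,  # midpoint guess for 2019-XX-XX
    "A/Example/1/2019": 2019.2,        # hypothetical strain with no inferred date
}
node_data = json.loads(BRANCH_LENGTHS_JSON)
resolved = dates_for_frequencies(metadata_numdates, node_data)
```

With this approach the strain would be placed at its inferred date of January 1, 2019 for frequency estimation, so the tree and the frequencies would agree on a single date.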

My preference is for the first solution.

@huddlej huddlej added the bug Something isn't working label Jan 10, 2020
@huddlej huddlej self-assigned this Nov 13, 2020