Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix age group rate aggregation example #591

Open
wants to merge 3 commits into
base: lcb/key_colnames-revision_summary-age_agg-updates
Choose a base branch
from

Conversation

brookslogan
Copy link
Contributor

@brookslogan brookslogan commented Jan 10, 2025

Checklist

This is part 1 of a 3-part PR. (Checks and lints connect it to the other parts.) A recombined PR will be made to dev. Some of these process things will be handled there. I don't want to do more git surgery. Probably should have just had a couple reviewers for the original PR and pointed to different files.

Please:

  • [-] Make sure this PR is against "dev", not "main" (unless this is a release
    PR).
  • [-] Request a review from one of the current main reviewers:
    brookslogan, nmdefries.
  • [-] Makes sure to bump the version number in DESCRIPTION. Always increment
    the patch version number (the third number), unless you are making a
    release PR from dev to main, in which case increment the minor version
    number (the second number).
  • [-] Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    1.7.2, then write your changes under the 1.8 heading).
  • See DEVELOPMENT.md for more information on the development
    process.

Change explanations for reviewer

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

@brookslogan brookslogan changed the base branch from dev to lcb/key_colnames-revision_summary-age_agg-updates January 10, 2025 19:36
@brookslogan
Copy link
Contributor Author

brookslogan commented Jan 10, 2025

CHECK or lints were yelling at me about something in these lines in another PR, but there were also issues with how we did rate aggregation. @nmdefries or @dshemetov are you able to relatively quickly review this so it can unblock the other PRs depending on it? If not I guess I could attempt some more git surgery but I think I may have already done too much trying to isolate this + other parts of that the original PR. [Other PRs' review will probably be slower, so this probably isn't the blocker.]

@brookslogan
Copy link
Contributor Author

brookslogan commented Jan 10, 2025

  • after merging: open PR here to hold the recombined PR with the intended changes and various CHECK+lint+adjacent fixes that were split off into dependent PRs. [just making this one merge to dev; it doesn't need to be part of the recombined PR]

@brookslogan brookslogan changed the base branch from lcb/key_colnames-revision_summary-age_agg-updates to dev January 10, 2025 19:59
@brookslogan brookslogan mentioned this pull request Jan 10, 2025
7 tasks
@brookslogan brookslogan changed the base branch from dev to lcb/key_colnames-revision_summary-age_agg-updates January 10, 2025 20:10
Copy link
Contributor

@nmdefries nmdefries left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to summing population-weighted rates in b569131 because I think it's clearer. Please feel free to revert if you don't like it.

I also separated the code into more chunks to better indicate the spot where we actually use sum_groups_epi_df and where we do the check against the reported rate_overall. We should keep the more-separated format whichever calculation approach we use.

group_by(geo_value, time_value) %>%
mutate(count = rate * pop / 100e3) %>%
ungroup() %>%
sum_groups_epi_df(c("count", "pop"), group_cols = group_cols) %>%
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Since the point of this section in the vignette is to give an example use of sum_groups_epi_df, I think this line should be at the beginning of a new code chunk so it's easier for the reader to see.

Second, this approach seems pretty roundabout. Why not calculate the pop-fraction rate for each age group and then sum?

Comment on lines +309 to +317
# compare to published overall rates:
inner_join(
flu_data_api %>%
select(geo_value = location, time_value = epiweek, rate_overall),
by = c("geo_value", "time_value"),
relationship = "one-to-one", unmatched = "error"
)
# What's our maximum error vs. the official overall estimates?
max(abs(rate_overall_recalc_edf$rate_overall - rate_overall_recalc_edf$rate_overall_recalc))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: also separate this out into another chunk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

epi_df calls key_colnames incorrectly and aggregates rates incorrectly
2 participants