Fix age group rate aggregation example #591
base: lcb/key_colnames-revision_summary-age_agg-updates
CHECK or lints were yelling at me about something in these lines in another PR, but there were also issues with how we did rate aggregation. @nmdefries or @dshemetov are you able to
I switched to summing population-weighted rates in b569131 because I think it's clearer. Please feel free to revert if you don't like it.
I also separated the code into more chunks to better indicate the spot where we actually use `sum_groups_epi_df` and where we do the check against the reported `rate_overall`. We should keep the more-separated format whichever calculation approach we use.
vignettes/epi_df.Rmd
Outdated
group_by(geo_value, time_value) %>%
mutate(count = rate * pop / 100e3) %>%
ungroup() %>%
sum_groups_epi_df(c("count", "pop"), group_cols = group_cols) %>%
suggestion: Since the point of this section in the vignette is to give an example use of `sum_groups_epi_df`, I think this line should be at the beginning of a new code chunk so it's easier for the reader to see.
Second, this approach seems pretty roundabout. Why not calculate the pop-fraction rate for each age group and then sum?
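To make the comparison concrete, here is a minimal sketch of the two calculations on made-up data. It uses plain dplyr rather than `sum_groups_epi_df`, and the `toy` tibble, its values, and the `via_counts` / `via_weights` names are hypothetical, not taken from the vignette. It only illustrates that converting to counts and summing gives the same overall rate as summing population-fraction-weighted rates.

```r
library(dplyr)

# Hypothetical toy data: rates per 100k for two age groups in one
# location-week. All values are invented for illustration.
toy <- tibble(
  geo_value = "ca",
  time_value = as.Date("2024-01-06"),
  age_group = c("0-17", "18+"),
  rate = c(2, 5), # per 100k population
  pop = c(1e6, 3e6)
)

# Approach 1: convert rates to counts, sum counts and populations,
# then convert the totals back to a rate per 100k.
via_counts <- toy %>%
  mutate(count = rate * pop / 100e3) %>%
  group_by(geo_value, time_value) %>%
  summarize(count = sum(count), pop = sum(pop), .groups = "drop") %>%
  mutate(rate_overall_recalc = count / pop * 100e3)

# Approach 2: weight each age group's rate by its share of the total
# population within the (geo_value, time_value) group, then sum.
via_weights <- toy %>%
  group_by(geo_value, time_value) %>%
  mutate(weighted_rate = rate * pop / sum(pop)) %>%
  summarize(rate_overall_recalc = sum(weighted_rate), .groups = "drop")

# Both approaches give 4.25 per 100k on this toy data.
all.equal(via_counts$rate_overall_recalc, via_weights$rate_overall_recalc)
```

The two are algebraically identical, so the choice is mostly about which reads more clearly in the vignette.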
# compare to published overall rates:
inner_join(
flu_data_api %>%
select(geo_value = location, time_value = epiweek, rate_overall),
by = c("geo_value", "time_value"),
relationship = "one-to-one", unmatched = "error"
)
# What's our maximum error vs. the official overall estimates?
max(abs(rate_overall_recalc_edf$rate_overall - rate_overall_recalc_edf$rate_overall_recalc))
suggestion: also separate this out into another chunk.
Checklist
This is part 1 of a 3-part PR. (Checks and lints connect it to the other parts.) A recombined PR will be made to dev. Some of these process things will be handled there. I don't want to do more git surgery. Probably should have just had a couple reviewers for the original PR and pointed to different files.
Please:
PR).
brookslogan, nmdefries.
`DESCRIPTION`. Always increment the patch version number (the third number), unless you are making a release PR from dev to main, in which case increment the minor version number (the second number).
(backwards-incompatible changes to the documented interface) are noted. Collect the changes under the next release number (e.g. if you are on 1.7.2, then write your changes under the 1.8 heading).
process.
Change explanations for reviewer
`epi_df` calls `key_colnames` incorrectly and aggregates rates incorrectly #587
Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch
`epi_df` calls `key_colnames` incorrectly and aggregates rates incorrectly #587