Update `revision_summary` #540

brookslogan · 2024-10-10T17:01:35Z

Checklist

Please:

[-] Make sure this PR is against "dev", not "main" (unless this is a release PR).
[-] Request a review from one of the current main reviewers: brookslogan, nmdefries.
[-] Make sure to bump the version number in DESCRIPTION. Always increment the patch version number (the third number), unless you are making a release PR from dev to main, in which case increment the minor version number (the second number).
[-] Describe changes made in NEWS.md, making sure breaking changes (backwards-incompatible changes to the documented interface) are noted. Collect the changes under the next release number (e.g. if you are on 1.7.2, then write your changes under the 1.8 heading).

See DEVELOPMENT.md for more information on the development process.

Change explanations for reviewer

Update `revision_summary` to use the new `key_colnames.epi_archive`.
Fix and tweak some tidyeval code in `revision_summary`.
Tweak some naming in `revision_summary`.

Other work

todo: tests for additional revision_summary() adjustments
todo: break into 2 dependent PRs

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch


* Produce an error rather than a default selection when the user provides a tidyselection in `...` that selects zero columns.
* Change `time_within_x_latest` to take `values` as a vector.
* Use `.data` instead of `pick` etc. in some places.
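A minimal sketch of the zero-column check, assuming the value column is resolved with `tidyselect::eval_select()` and then referenced through the `.data` pronoun rather than `pick()`. The helper `choose_value_col_sketch()` and its `default_col` argument are hypothetical illustrations, not the PR's actual code.

```r
# Hypothetical helper (not the PR's implementation): resolve a tidyselection
# in `...` to exactly one column name. A selection that matches zero columns
# is an error rather than a silent fallback to the default.
choose_value_col_sketch <- function(df, ..., default_col) {
  if (...length() == 0L) {
    # No selection supplied at all: use the caller's default.
    return(default_col)
  }
  pos <- tidyselect::eval_select(rlang::expr(c(...)), df)
  if (length(pos) == 0L) {
    rlang::abort("The tidyselection in `...` selected zero columns.")
  }
  if (length(pos) > 1L) {
    rlang::abort("The tidyselection in `...` must select exactly one column.")
  }
  names(pos)
}

df <- dplyr::tibble(geo_value = c("ca", "ny"), value = c(1, 2))
col <- choose_value_col_sketch(df, value, default_col = "value")
# Refer to the chosen column with the `.data` pronoun instead of `pick()`.
df <- dplyr::mutate(df, doubled = .data[[col]] * 2)
```

Calling `choose_value_col_sketch(df, starts_with("zzz"), default_col = "value")` errors instead of quietly returning `"value"`, which is the behavior change described above.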



* This may fix some behaviors and emit more sensible error messages for yearmonths given yearmonth-incompatible settings.
* This should express time differences for weekly data in terms of weeks, and may emit errors given weekly-"incompatible" settings.
* This appears to be computationally faster than `as.integer(version) - as.integer(time_value)`.
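For weekly data stored as `Date`s, the difference between a unitless integer subtraction and a unit-aware time difference can be seen with base R alone. This sketch uses `difftime()` with `units = "weeks"` as a stand-in; it is not the PR's `time_delta_to_approx_difftime()` helper, whose signature is not shown here.

```r
# Weekly data: a reference date and a version (report date) two weeks later.
time_value <- as.Date("2024-01-01")
version <- as.Date("2024-01-15")

# Integer subtraction yields a unitless count of days.
lag_days <- as.integer(version) - as.integer(time_value)

# difftime() carries its units, so the same lag reads as 2 weeks.
lag_weeks <- difftime(version, time_value, units = "weeks")
```

Here `lag_days` is `14` while `lag_weeks` prints as `Time difference of 2 weeks`.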

It's probably best to ungroup immediately after performing grouped operations in our documentation, since accidentally leaving data grouped is a source of errors. At some point we should consider an overhaul to use `by =` and `.by =` where appropriate (sorting effects not needed) and available (not all operations support this syntax yet). There were already 0s in the example data, so describe the effects of completion in words, and note one potential surprise in other applications.
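A minimal dplyr sketch of the two habits recommended above: ungrouping immediately after a grouped operation, and using per-operation grouping with `.by =` (available in dplyr 1.1.0 and later) where sorting is not needed. The `counts` data are made up for illustration.

```r
library(dplyr)

counts <- tibble(
  geo_value = c("ca", "ca", "ny", "ny"),
  cases = c(1, 2, 3, 4)
)

# Grouped operation followed immediately by ungroup(), so later steps
# don't accidentally operate per group.
totals_grouped <- counts %>%
  group_by(geo_value) %>%
  mutate(total = sum(cases)) %>%
  ungroup()

# Per-operation grouping with `.by =`: the result is never grouped,
# so no ungroup() is needed.
totals_by <- counts %>%
  mutate(total = sum(cases), .by = geo_value)
```

Both results have `total` equal to `c(3, 3, 7, 7)` and neither is a grouped data frame.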

- Note that `...` is optional in the args of the slide computation function.
- Push toward computations returning tibbles rather than vanilla data.frames.
- Highlight what `na.rm = TRUE` does (it is not the only type of 7-day sum/average), and mention that we also show the sum.
- Immediately ungroup output, and save a line using the new autogroup-ungroup behavior of `epi_slide_opt()` and related functions.
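To see why `na.rm = TRUE` is only one of several reasonable 7-day sum/average definitions, consider a single window with one missing day. This base R sketch uses made-up values and does not call `epi_slide_opt()` or its variants; which definition the docs settle on is not shown here.

```r
# One 7-day window with a missing day (made-up values).
week_values <- c(1, 2, NA, 4, 5, 6, 7)

plain_sum <- sum(week_values)                   # NA: one missing day makes the sum missing
sum_na_rm <- sum(week_values, na.rm = TRUE)     # 25: sum over the 6 observed days
avg_observed <- mean(week_values, na.rm = TRUE) # 25 / 6: averages only observed days
avg_over_7 <- sum_na_rm / 7                     # 25 / 7: treats the missing day as 0
```

The two "averages" differ, so a 7-day average computed with `na.rm = TRUE` is not simply the `na.rm = TRUE` 7-day sum divided by 7 when data are missing.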



dsweber2 · 2025-01-10T21:10:17Z

Yes, I'm no longer in 100% forecast-pipeline mode. I'll make this a high priority.

dsweber2

I added a commit with some minor logic refactors and comments for things I found unclear (some of which came from stuff I wrote long enough ago). lag is definitely a better name for a bunch of these columns than time.

I think committing to using weeks or months for version comes with some assumptions about versioning behavior that we may not want to be making. What happens with weekly data released at a daily cadence, for example?

I do think we could probably put these utilities to use elsewhere, but I'm not sure this is the PR for them.

R/archive.R

R/time-utils.R

R/revision_analysis.R

R/time-utils.R

R/utils.R

tests/testthat/test-revision-latency-functions.R

brookslogan · 2025-01-13T19:15:14Z

> I think committing to using weeks or months for version comes with some assumptions about versioning behavior that we may not want to be making. What happens with weekly data released at a daily cadence, for example?

Definitely. But that's what we've already assumed in the archive construction. I was hoping to just make multiple time types work given our current assumptions and handle relaxing this assumption later. Maybe there are two options here:

1. Try to excise some of the time type stuff from this PR, relax the assumptions first, then re-introduce time type stuff to this function after we've done that.
2. Work with this PR as-is, and as we relax the assumptions later, revisit the design of some of the time helpers / their usage here.

Preferences / other options?

dsweber2 · 2025-01-13T19:47:27Z

Oh right, I forgot that typically when I'm working with weekly data, I'm actually working with "daily" data with 7-day gaps. Given that, I don't see strong reasons not to include it.

Fix `length()` -> `vec_size()`. Combining a (logical/generic) `NA` with `Date`s is apparently slow, so slice with an `NA_integer_` index instead. Fix docs: dplyr 1.1.0 should also have a generalized `dplyr::lag()`. Removing some `dplyr::lag()` features for speed might be another motivation, but we seem to be faster for some ptype/size combinations and slower for others. Also, don't export this function; we don't need to.
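A minimal sketch of the slicing approach, assuming vctrs: padding the index with `NA_integer_` makes `vctrs::vec_slice()` return missing values of the input's own type, so no logical `NA` has to be combined with, e.g., a `Date` vector. `position_lag_sketch()` is a hypothetical name; the PR's internal `vec_position_lag()` may differ.

```r
library(vctrs)

# Hypothetical position-based lag (not the PR's vec_position_lag()).
position_lag_sketch <- function(x, n = 1L) {
  size <- vec_size(x)
  n <- min(n, size)
  # Leading NA_integer_ indices become typed missing values in vec_slice(),
  # avoiding c(NA, x), which must cast a logical NA to x's class.
  idx <- c(rep(NA_integer_, n), seq_len(size - n))
  vec_slice(x, idx)
}

dates <- as.Date("2024-01-01") + 0:3
lagged <- position_lag_sketch(dates)
```

`lagged` stays a `Date` vector: a leading `NA` followed by the first three dates.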


Various iterations of `vec_position_lag` trade off performance differently, and whether they beat `dplyr::lag` depends on the class being lagged. `dplyr::lag` appears to perform better than many (possibly all) of the variants tested so far for lagging very long `Date` and character vectors, as we would do during compactification. We might try speeding up compactification by iterating on some of these variants, or with something inspired by `check_ukey_unique()`, etc., but that's not the present goal, so just use `dplyr::lag()` for now.



dsweber2

looks good to me! A few docs suggestions you can take or leave

DESCRIPTION

R/key_colnames.R


brookslogan force-pushed the lcb/update-key_colnames.epi_archive branch from 97fdc29 to 052854f on October 10, 2024 19:27

brookslogan force-pushed the lcb/update-key_colnames.epi_archive branch from 592c3a2 to 1353df9 on October 22, 2024 21:24

brookslogan mentioned this pull request Oct 28, 2024

pass-through on revision_analysis docs #557 (Open)

brookslogan mentioned this pull request Nov 15, 2024

key_colnames returns wrong values for archive data #565 (Closed)

brookslogan force-pushed the lcb/update-key_colnames.epi_archive branch 7 times, most recently from 95b3c02 to 02a679c on December 20, 2024 06:13

brookslogan force-pushed the lcb/update-key_colnames.epi_archive branch 3 times, most recently from e9b86d4 to bde7d83 on January 9, 2025 00:54

brookslogan added 16 commits January 10, 2025 11:30

Make extra_keys = into soft "deprecation" of a different behavior

da28a28

to ease epipredict transition.

fix(revision_summary): use selected value col, not last col

8b5d905

Clarify time_near_latest -> lag_near_latest

8c13f31

So it is not misinterpreted as "the amount of time that it has been near the latest".

fix(revision_summary): consider units&class in lag filter

1a74d75

and avoid unnecessary `abs()`

Fix compactification with dist_quantiles columns

074e0a4

docs(epi_df.Rmd): editing pass on flusurv aggregation update

09bd354

feat(revision_summary): don't autoselect first var if != 1 var

4473acb

Add time_delta_to_approx_difftime() utils function

638b117

fix(revision_summary): make min_waiting_period strict like docs

2fe9b1a

feat(revision_summary)!: make min_waiting_period nonstrict

be19854

So that naming, docs, and implementation all match.

Add difftime_approx_ceiling_time_delta helper, adj yearmonth difftime approx

826b927

Migrate time utils to new file

3509ebc

minor code annotations, some logic ordering

d850083

dsweber2 requested changes Jan 10, 2025


brookslogan added 2 commits January 13, 2025 16:37

fix(vec_position_lag): finish incorporating n parameter

c99125d

brookslogan force-pushed the lcb/update-key_colnames.epi_archive branch from 075b608 to 0208a96 on January 14, 2025 00:44

brookslogan added 8 commits January 13, 2025 16:47

Experiment with vec_rep vctr NA instead of slicing rep NA_integer_ indices

c89afae

fix(as_epi_archive): make compactification support more general vctrs

b03fe32

e.g., `distribution`s

refactor(key_colnames.epi_df): NULL -> "actual" default for other_keys

5257ece

feat(key_colnames.data.frame)!: make geo_keys & time_keys mandatory

5f737be

docs(difftime_approx_ceiling_time_delta): correct inequality in title

89f27e5

test(difftime_approx_ceiling_time_delta): add tests

0b77dde

fix(revision_summary): generalize units() usage

a4f498b

Plus add some pluralization and capitalization features.

brookslogan force-pushed the lcb/update-key_colnames.epi_archive branch from db4688e to a4f498b on January 15, 2025 23:23

brookslogan requested a review from dsweber2 January 16, 2025 00:06

test(revision_summary): more comments on min_waiting_period default

c4acd18

dsweber2 approved these changes Jan 16, 2025


DESCRIPTION (resolved)

R/key_colnames.R (outdated, resolved)

R/key_colnames.R (outdated, resolved)

R/key_colnames.R (outdated, resolved)

Base automatically changed from lcb/rework-key_colnames to lcb/key_colnames-revision_summary-age_agg-updates January 24, 2025 07:46

brookslogan and others added 6 commits January 24, 2025 00:02

docs(key_colnames.R): insert paragraph break

b73bbed

Co-authored-by: David Weber <david.weber2@pm.me>

docs: document (GHA)

04fac65

style: styler (GHA)

0010bf3

docs(key_colnames): give better idea of possible xs

d35394d

docs(key_colnames): mention classes not supporting *_keys args

0bb9d57

Merge remote-tracking branch 'upstream/lcb/key_colnames-revision_summary-age_agg-updates' into lcb/update-key_colnames.epi_archive

860281f

brookslogan merged commit 81582cb into lcb/key_colnames-revision_summary-age_agg-updates Jan 24, 2025
1 check passed

brookslogan deleted the lcb/update-key_colnames.epi_archive branch January 24, 2025 09:32

brookslogan mentioned this pull request Jan 24, 2025

Update epi_df.Rmd rate aggregation, key_colnames(), revision_summary() #599 (Merged)
