Update `epi_df.Rmd` rate aggregation, `key_colnames()`, `revision_summary()` #599

brookslogan · 2025-01-24T07:33:40Z

Checklist

Please:

- Make sure this PR is against "dev", not "main" (unless this is a release PR).
- [-] Request a review from one of the current main reviewers: brookslogan, nmdefries.
  - Contributing PRs were reviewed individually.
- Make sure to bump the version number in DESCRIPTION. Always increment the patch version number (the third number), unless you are making a release PR from dev to main, in which case increment the minor version number (the second number).
- Describe changes made in NEWS.md, making sure breaking changes (backwards-incompatible changes to the documented interface) are noted. Collect the changes under the next release number (e.g. if you are on 1.7.2, then write your changes under the 1.8 heading).
- See DEVELOPMENT.md for more information on the development process.

Change explanations for reviewer

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

* Produce error rather than default selection when user provides a tidyselection in `...` but it selects zero columns.
* Change time_within_x_latest to take `values` as a vector
* Use `.data` instead of `pick` etc. in some places

* This may fix some behaviors and emit more sensible error messages on yearmonths given yearmonth-incompatible settings.
* This should express time differences for weekly data in terms of weeks, and may emit errors given weekly-"incompatible" settings.
* This appears to be computationally faster (vs. `as.integer(version) - as.integer(time_value)`).

It's probably best to immediately ungroup after performing grouped operations in our documentation, as leaving things grouped accidentally is a source of errors. Sometime we should consider an overhaul to use `by =` and `.by =` where appropriate (sorting effects not needed) and available (not all operations support this syntax yet).

There were already 0s in the example data, so "highlight" with words the effects of completion + note one potential surprise in other applications.
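For reference, a minimal sketch of the two grouping styles mentioned above, using made-up toy data rather than the vignette's data (the column names `geo_value` and `count` here are just illustrative assumptions):

```r
library(dplyr)

# Toy data for illustration only; not the data used in the vignette.
cases <- tibble(
  geo_value = c("ca", "ca", "ny", "ny"),
  count = c(1, 2, 3, 4)
)

# Style 1: persistent grouping, followed by an immediate ungroup() so that
# later operations are not accidentally performed per group.
cumulative <- cases %>%
  group_by(geo_value) %>%
  mutate(cum_count = cumsum(count)) %>%
  ungroup()

# Style 2: per-operation grouping with `.by` (dplyr >= 1.1.0); the result is
# never left grouped, so there is nothing to forget to undo.
cumulative_by <- cases %>%
  mutate(cum_count = cumsum(count), .by = geo_value)
```

Both produce the same ungrouped result here. One difference to keep in mind: with `summarise()`, `group_by()` sorts the output by group key, while `.by` keeps groups in order of first appearance.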

- Note `...` optional in args of slide comp fn.
- Push toward computations returning tibbles rather than vanilla data.frames.
- Highlight `na.rm = TRUE`'s operation (not the only type of 7dsum/7dav), mention we also show sum.
- Immediately ungroup output + save a line using new autogroup-ungroup behavior of epi_slide_opt&co.

* Make `key_colnames.epi_archive` output epikey-time-version rather than just epikey-time.
* Make `key_colnames.data.frame` require `other_keys` be provided.
* Remove `key_colnames.default`.
* Make `key_colnames` forbid passing `exclude` positionally.
* Update downstream `revision_summary`.

Fix length -> vec_size. Combining (logical/generic) NA with Dates is apparently slow. Slice with NA_integer_ index instead. Fix docs: dplyr 1.1.0 should also have a generalized dplyr::lag. Removing some dplyr::lag features for speed might be another motivation, but we seem to be faster for some ptypes x sizes and slower for others. Also, don't export this function; we don't need to.
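To illustrate the two points above (`vec_size` vs. `length`, and slicing with `NA_integer_` indices), here is a rough sketch. `lag_by_slicing()` is a hypothetical helper written for this note; it is not the package's internal function and makes no claim about its actual implementation or performance.

```r
library(vctrs)

# Hypothetical helper, for illustration only.
lag_by_slicing <- function(x, n = 1L) {
  # vec_size() counts observations: rows for a data frame, elements for a
  # vector. length() would count columns for a data frame instead.
  size <- vec_size(x)
  n <- min(n, size)
  # NA indices make vec_slice() produce missing values of x's own type, so no
  # logical NA ever has to be combined with (e.g.) a Date vector.
  idx <- c(rep(NA_integer_, n), seq_len(size - n))
  vec_slice(x, idx)
}

dates <- as.Date("2025-01-01") + 0:2
lagged <- lag_by_slicing(dates)

df <- data.frame(a = 1:3, b = 4:6)
df_size <- vec_size(df) # number of rows
df_length <- length(df) # number of columns
```

The lagged result keeps the `Date` class, with a missing value in the first position followed by the first two original dates.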

Various iterations of vec_position_lag seem to be trading off performance and whether they beat dplyr::lag for different classes. dplyr::lag appears to be better-performing than many/all variants tested so far for lagging very long Date and character vectors, like we would do during compactification. We might try speeding up compactification by iterating on some of these variants, something inspired by `check_ukey_unique()`, etc., but that's not the present goal, so just use `dplyr::lag()` for now.

brookslogan added 30 commits January 10, 2025 11:30

Fix age group rate aggregation example

c03221e

Make extra_keys = into soft "deprecation" of a different behavior

da28a28

to ease epipredict transition.

fix(revision_summary): use selected value col, not last col

8b5d905

Clarify time_near_latest -> lag_near_latest

8c13f31

So it is not misinterpreted as "the amount of time that it has been near the latest".

fix(revision_summary): consider units&class in lag filter

1a74d75

and avoid unnecessary `abs()`

Fix compactification with dist_quantiles columns

074e0a4

docs(epi_df.Rmd): editing pass on flusurv aggregation update

09bd354

feat(revision_summary): don't autoselect first var if != 1 var

4473acb

Add time_delta_to_approx_difftime() utils function

638b117

fix(revision_summary): make min_waiting_period strict like docs

2fe9b1a

feat(revision_summary)!: make min_waiting_period nonstrict

be19854

So that naming, docs, and implementation all match.

Add difftime_approx_ceiling_time_delta helper, adj yearmonth difftime…

826b927

… approx

Migrate time utils to new file

3509ebc

Add unit_time_delta_fast and time add/sub helpers

ee00d42

Add time_delta standardization helpers

4405ab9

Change *_friendly and *_fast functions to an extra argument

5de2eec

Refactor some time_step <-> n_steps usage for clarity

a1ebd09

feat+fix(revision_summary): expand time_type support + fix helpers

0707e71

Add internal docs for additional time helpers

1558f69

Add tests for default min_waiting_period x several time_types

deb2e8b

Add internal roxygen stub of validate_slide_window_arg

257f69c

So we can crossref in other internal roxygen without CHECK warning.

Fill in some missing @param entries, links in time utils

046fb70

fix: complete partial rename (time_to -> lag_to in globalVariables)

2e86b84

key_colnames: +flexible on dfs, +rigid on edfs, +tsibble, +archive

416a7e8

Fix potential _ formatting issues + update (un)grouping in README.Rmd

2692843

brookslogan and others added 21 commits January 13, 2025 16:43

Experiment with vec_rep vctr NA instead of slicing rep NA_integer_ in…

c89afae

…dices

fix(as_epi_archive): make compactification support more general vctrs

b03fe32

e.g., `distribution`s

refactor(key_colnames.epi_df): NULL -> "actual" default for other_keys

5257ece

feat(key_colnames.data.frame)!: make geo_keys & time_keys mandatory

5f737be

docs(difftime_approx_ceiling_time_delta): correct inequality in title

89f27e5

test(difftime_approx_ceiling_time_delta): add tests

0b77dde

fix(revision_summary): generalize units() usage

a4f498b

Plus add some pluralization and capitalization features.

test(revision_summary): more comments on min_waiting_period default

c4acd18

Merge pull request #591 from cmu-delphi/lcb/fix-age-group-rate-agg-ex…

28ccafb

…ample Fix age group rate aggregation example

Merge pull request #592 from cmu-delphi/lcb/rework-key_colnames

16e1f46

Rework `key_colnames`

style: styler (GHA)

46a348a

docs: document (GHA)

53bf12c

docs(key_colnames.R): insert paragraph break

b73bbed

Co-authored-by: David Weber <david.weber2@pm.me>

docs: document (GHA)

04fac65

style: styler (GHA)

0010bf3

docs(key_colnames): give better idea of possible xs

d35394d

docs(key_colnames): mention classes not supporting *_keys args

0bb9d57

Merge remote-tracking branch 'upstream/lcb/key_colnames-revision_summ…

860281f

…ary-age_agg-updates' into lcb/update-key_colnames.epi_archive

Merge pull request #540 from cmu-delphi/lcb/update-key_colnames.epi_a…

81582cb

…rchive Update `revision_summary`

brookslogan changed the title ~~Update key_colnames(), revision_summary()~~ Update epi_df.Rmd rate aggregation, key_colnames(), revision_summary() Jan 24, 2025

brookslogan and others added 6 commits January 24, 2025 02:27

Update NEWS.md, Version: in DESCRIPTION

3e3c0e4

lint: adjust indentation, nolint objects only in cli interpolation

214e8e3

Update key_colnames tests for new mandatory args

1b263be

Fix missing library(dplyr) in tests

e08dd6a

Update NEWS.md

7eb0e4a

style: styler (GHA)

7571bf3

brookslogan merged commit 47eb129 into dev Jan 24, 2025

brookslogan deleted the lcb/key_colnames-revision_summary-age_agg-updates branch January 24, 2025 11:32
