Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update epi_df.Rmd rate aggregation, key_colnames(), revision_summary() #599

Merged
merged 67 commits into from
Jan 24, 2025

Conversation

brookslogan
Copy link
Contributor

@brookslogan brookslogan commented Jan 24, 2025

Checklist

Please:

  • Make sure this PR is against "dev", not "main" (unless this is a release
    PR).
  • [-] Request a review from one of the current main reviewers:
    brookslogan, nmdefries.
    • Contributing PRs were reviewed individually.
  • Makes sure to bump the version number in DESCRIPTION. Always increment
    the patch version number (the third number), unless you are making a
    release PR from dev to main, in which case increment the minor version
    number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    1.7.2, then write your changes under the 1.8 heading).
  • See DEVELOPMENT.md for more information on the development
    process.

Change explanations for reviewer

See #591, #592, #540.

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

* Produce error rather than default selection when user provides a tidyselection
  in ... but it selects zero columns.
* Change time_within_x_latest to take `values` as a vector
* Use `.data` instead of `pick` etc. in some places
So it is not misinterpreted as "the amount of time that it has been near the latest".
* This may fix some behaviors and emit more sensible error messages on
  yearmonths given yearmonth-incompatible settings.
* This should express time differences for weekly data in terms of weeks,
  and may emit errors given weekly-"incompatible" settings.
* This appears to be computationally faster (vs. `as.integer(version) -
  as.integer(time_value)`).
It's probably best to immediately ungroup after performing grouped operations in
our documentation, as leaving things grouped accidentally is a source of errors.
Sometime we should consider an overhaul to use `by =` and `.by =` where
appropriate (sorting effects not needed) and available (not all operations
support this syntax yet).

There were already 0s in the example data, so "highlight" with words the effects
of completion + note one potential surprise in other applications.
- Note `...` optional in args of slide comp fn.
- Push toward computations returning tibbles rather than vanilla data.frames.
- Highlight `na.rm = TRUE`'s operation (not the only type of 7dsum/7dav),
  mention we also show sum.
- Immediately ungroup output + save a line using new autogroup-ungroup behavior
  of epi_slide_opt&co.
So that naming, docs, and implementation all match.
So we can crossref in other internal roxygen without CHECK warning.
* Make `key_colnames.epi_archive` output epikey-time-version rather than just
  epikey-time.
* Make `key_colnames.data.frame` require `other_keys` be provided.
* Remove `key_colnames.default`.
* Make `key_colnames` forbid passing `exclude` positionally.
* Update downstream `revision_summary`.
brookslogan and others added 21 commits January 13, 2025 16:43
Fix length -> vec_size.

Combining (logical/generic) NA with Dates is apparently slow.  Slice with
NA_integer_ index instead.

Fix docs: dplyr 1.1.0 should also have a generalized dplyr::lag. Removing some
dplyr::lag features for speed might be another motivation, but we seem to be
faster for some ptypes x sizes and slower for others.

Also, don't export this function; we don't need to.
Various iterations of vec_position_lag seem to be trading off performance and
whether they beat dplyr::lag for different classes. dplyr::lag appears to be
better-performing than many/all variants tested so far for lagging very long
Date and character vectors, like we would do during compactification. We might
try speeding up compactification by iterating on some of these variants,
something inspired by `check_ukey_unique()`, etc., but that's not the present
goal, so just use `dplyr::lag()` for now.
Plus add some pluralization and capitalization features.
…ample

Fix age group rate aggregation example
Co-authored-by: David Weber <david.weber2@pm.me>
…ary-age_agg-updates' into lcb/update-key_colnames.epi_archive
@brookslogan brookslogan changed the title Update key_colnames(), revision_summary() Update epi_df.Rmd rate aggregation, key_colnames(), revision_summary() Jan 24, 2025
@brookslogan brookslogan merged commit 47eb129 into dev Jan 24, 2025
@brookslogan brookslogan deleted the lcb/key_colnames-revision_summary-age_agg-updates branch January 24, 2025 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants