Releases: LudvigOlsen/groupdata2

groupdata2 2.0.3

18 Jun 12:58
  • Fixes some warnings.

  • Fixes a rounding error on PowerPC (#10). Thanks @barracuda156.

groupdata2 2.0.2

24 Nov 10:59
  • Makes use of suggested packages conditional.
  • Makes testing conditional on the availability of xpectr.
  • Fixes tidyselect-related warnings.
  • Removes hydroGOF from suggested packages.

groupdata2 2.0.1

28 Aug 16:15
  • Regenerates documentation.

groupdata2 2.0.0

24 Oct 15:30

Summary

This version introduces collapse_groups() and friends, as well as summarize_balances() and ranked_balances(). It also improves the numerical balancing in fold(), which breaks reproducibility in some contexts.

Changes

  • Breaking: The numerical balancing (num_col) in fold() receives several improvements. This breaks reproducibility in some contexts.

    • Fixes a bug in the selection of groups to redistribute when extreme_pairing_levels > 1. The previous groupings were likely fine, but the fix should give better groupings on average.

    • When possible, redistributes the smallest and/or largest group if it is 1 standard deviation away from the second smallest/largest group, to avoid imbalances caused by very small or very large scores.

    • Adds extreme triplet grouping as a fallback when extreme pairing creates too few grouping columns. This can increase the number of created fold columns. In some cases, these groupings may be more balanced than those from extreme pairing, but on average extreme pairing leads to more balanced groupings. See rearrr::triplet_extremes() for more on extreme triplet grouping.

    • Adds the use_of_triplets argument to fold(), which allows using extreme triplet grouping instead of extreme pairing, or disabling triplet grouping completely.

  • Adds collapse_groups() for collapsing a set of existing groups into a smaller set of groups. It can balance the new groups by size and by numeric, categorical, and ID columns. The more of these you balance at once, the less balanced each will tend to be. Compare settings by summarizing the balances with summarize_balances() afterwards. To create the most balanced groups, enable auto_tune.

  • Adds collapse_groups_by_size(), collapse_groups_by_numeric(), collapse_groups_by_levels(), and collapse_groups_by_ids(). These are wrappers of collapse_groups() for a simplified interface.

  • Adds summarize_balances() for inspecting the balance of numeric, categorical, and ID columns within and between groups.

  • Adds ranked_balances() for extracting the across-group standard deviations of balances from the output of summarize_balances(). These standard deviations measure how balanced a split is; lower values indicate a more balanced split.

  • Adds the "every" method to the grouping functions. It groups every n data points together.

  • Prepares the package's tests for checkmate 2.1.0.
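Extreme triplet grouping, roughly as described for rearrr::triplet_extremes(), puts the lowest value, the highest value, and the value closest to the median in the first group; the second lowest, second highest, and second-closest-to-median values in the next group; and so on. The Python sketch below only illustrates that idea and is not rearrr's R implementation. The helper name is hypothetical, and the sketch assumes the number of values is divisible by 3.

```python
import statistics


def triplet_extremes(values):
    """Group indices into triplets of (lowest, highest, closest-to-median).

    Hypothetical helper for illustration; assumes len(values) % 3 == 0.
    """
    n = len(values)
    if n == 0 or n % 3 != 0:
        raise ValueError("this sketch assumes a non-empty input divisible by 3")
    t = n // 3
    # Indices sorted by ascending value
    order = sorted(range(n), key=lambda i: values[i])
    lows = order[:t]            # t smallest, ascending
    highs = order[::-1][:t]     # t largest, descending
    middle = order[t:n - t]     # remaining middle third
    med = statistics.median(values)
    # Order the middle third by closeness to the median (stable sort keeps ties in value order)
    middle.sort(key=lambda i: abs(values[i] - med))
    return [(lows[j], highs[j], middle[j]) for j in range(t)]


values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
triplets = triplet_extremes(values)
sums = [sum(values[i] for i in trip) for trip in triplets]
print(triplets)  # [(0, 8, 4), (1, 7, 3), (2, 6, 5)]
print(sums)      # [15, 14, 16]
```

Each triplet mixes a low, a high, and a central value, so the triplet sums stay close to each other, which is what makes such units useful for balanced folding.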
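A lower across-group standard deviation means a more balanced split. One simple way to see this is to compute each group's mean of a numeric column and then take the standard deviation of those means. This Python sketch is a simplified stand-in for one of the measures that summarize_balances() and ranked_balances() report, not their actual implementation; the helper name is hypothetical.

```python
import statistics


def across_group_sd(values, groups):
    """Standard deviation, across groups, of each group's mean value.

    Hypothetical helper; a simplified stand-in for one balance measure.
    """
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    group_means = [statistics.mean(vs) for vs in by_group.values()]
    # statistics.stdev needs at least two groups
    return statistics.stdev(group_means)


values = [1, 2, 3, 4]
split_a = [1, 2, 2, 1]  # groups {1, 4} and {2, 3}: both means are 2.5
split_b = [1, 1, 2, 2]  # groups {1, 2} and {3, 4}: means 1.5 and 3.5
print(across_group_sd(values, split_a))  # 0.0
print(across_group_sd(values, split_b))  # about 1.414
```

Ranking candidate splits by such standard deviations (lowest first) picks out the most balanced one.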

groupdata2 1.5.0

03 Jul 13:03
  • Breaking: Rewrites large parts of the numerical balancing engine used in fold() and partition(), which produces different groups in some cases. Extreme pairing is now outsourced to rearrr::pair_extremes(). Hierarchical shuffling (rearrr::shuffle_hierarchy()) is now used in partition() and in some cases of fold() (when extreme_pairing_levels > 1).
    If you need reproducibility, the last version prior to this breaking change can be installed with devtools::install_github("ludvigolsen/groupdata2@v1.4.2").

  • Imports rearrr for use in numerical balancing.

  • Minor improvements to vignettes.
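Extreme pairing orders values as smallest, largest, second smallest, second largest, and so on, and treats each such pair as a unit. The Python sketch below shows a single pairing level followed by a random division of the pairs into k folds, which keeps the fold sums close. It is an illustration under these assumptions, not the code in rearrr::pair_extremes() or fold(), and the helper names are hypothetical.

```python
import random


def pair_extremes(values):
    """Pair indices as (smallest, largest), (2nd smallest, 2nd largest), ...

    With an odd count, the median element is left in a pair of its own.
    Hypothetical helper for illustration.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    pairs = []
    lo, hi = 0, len(order) - 1
    while lo < hi:
        pairs.append((order[lo], order[hi]))
        lo += 1
        hi -= 1
    if lo == hi:
        pairs.append((order[lo],))
    return pairs


def fold_by_extreme_pairs(values, k, seed=None):
    """Assign each row to one of k folds by randomly dividing the pairs."""
    rng = random.Random(seed)
    pairs = pair_extremes(values)
    rng.shuffle(pairs)
    folds = [0] * len(values)
    for j, pair in enumerate(pairs):
        for i in pair:
            folds[i] = j % k + 1  # fold ids 1..k, round-robin over shuffled pairs
    return folds


values = [3, 8, 1, 6, 2, 7, 5, 4]
folds = fold_by_extreme_pairs(values, k=2, seed=1)
fold_sums = [sum(v for v, f in zip(values, folds) if f == fold) for fold in (1, 2)]
print(fold_sums)  # [18, 18]
```

Here every pair sums to 9, so both folds end up with equal sums regardless of the seed. With extreme_pairing_levels > 1, the pairs would, roughly, be paired again by their sums before folding; this sketch leaves that out.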

groupdata2 1.4.2

19 Jun 20:03
  • Improves documentation for core grouping functions.

groupdata2 1.4.1

06 Mar 20:58
  • Adds summarize_group_cols() for finding the number of groups per fold column along with statistics about the number of rows per group.

  • Breaking: Fixes internal sorting of fold columns. This sometimes changes the order of fold columns, compared to the previous version.

  • Adds tidyr as a required dependency. Previously, it was suggested.
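The kind of summary described for summarize_group_cols() can be sketched in Python: for each grouping (fold) column, count the groups and compute statistics over the number of rows per group. The helper name, the column names, and the chosen statistics are illustrative assumptions, not the function's actual output format.

```python
from collections import Counter
import statistics


def summarize_group_col(group_ids):
    """Number of groups plus row-count statistics for one grouping column.

    Hypothetical helper; the statistics chosen here are illustrative.
    """
    sizes = list(Counter(group_ids).values())  # rows per group
    return {
        "num_groups": len(sizes),
        "mean_rows": statistics.mean(sizes),
        "median_rows": statistics.median(sizes),
        "min_rows": min(sizes),
        "max_rows": max(sizes),
    }


# Hypothetical fold columns over the same 7 rows
fold_cols = {
    "folds_a": [1, 1, 2, 2, 3, 3, 3],
    "folds_b": [1, 2, 1, 2, 1, 2, 1],
}
for name, ids in fold_cols.items():
    print(name, summarize_group_col(ids))
```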

groupdata2 1.4.0

20 Feb 16:11
  • Breaking: In fold(), the k argument can now be a multi-element vector with one k (number of folds) per fold column. This functionality required a minor rewrite, so fold column names may appear interchanged compared to previous versions.

  • Bug fix: fold() and partition() failed when multiple cat_col columns and num_col were specified in the same call. This now works.

groupdata2 1.3.0

15 Jun 15:55
  • Breaking: The following functions now work with grouped data.frames, meaning they are applied group-wise: fold(), partition(), group(), group_factor(), splt(), balance(), upsample(), downsample(), differs_from_previous(), and find_missing_starts(). When the input is grouped, a message is shown once per session to help users understand why their code may break.
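Group-wise application means the function runs separately on each group of a grouped data frame. A minimal Python sketch of the idea for fold-like assignment follows, in which fold numbers restart within each group. The helper name and the list-of-dicts row layout are hypothetical, and this is not groupdata2's R implementation.

```python
import random
from collections import defaultdict


def fold_within_groups(rows, group_key, k, seed=None):
    """Assign folds 1..k separately inside each existing group.

    Hypothetical helper: mimics applying a fold()-like function group-wise,
    so every group is split into its own k folds.
    """
    rng = random.Random(seed)
    indices_by_group = defaultdict(list)
    for i, row in enumerate(rows):
        indices_by_group[row[group_key]].append(i)
    folds = [0] * len(rows)
    for indices in indices_by_group.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Round-robin over the shuffled rows of this group only
        for j, i in enumerate(shuffled):
            folds[i] = j % k + 1
    return folds


rows = [{"site": s} for s in ["a"] * 4 + ["b"] * 6]
folds = fold_within_groups(rows, "site", k=2, seed=42)
print(folds)  # each site is split evenly between folds 1 and 2
```

Without the grouping, the same call would split all 10 rows at once, which is why results for grouped input can differ from earlier versions.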

groupdata2 1.2.1

06 Jun 18:42
  • checkmate compatibility.

  • Slightly speeds up the n_dist grouping method.