
New features and improvements

@LudvigOlsen LudvigOlsen released this 24 Jun 21:50
  • New main function: balance(), used for up- and downsampling data to balance the sample size within categories and IDs.
    Thanks to @jjesusfilho for the request (#3).

  • New wrapper function: upsample() wraps balance() with size="max".

  • New wrapper function: downsample() wraps balance() with size="min".

  • Adds parameter "num_col" to fold() and partition() for balancing on a numerical column.

  • Adds parameter "id_aggregation_fn" to fold() and partition(). Used when balancing on both id_col and num_col.

  • Adds helper tool differs_from_previous(). Finds values in a vector that differ from the previous value by some threshold. Similar to find_starts().

  • Adds parameter "num_fold_cols" to fold(). Useful for creating multiple fold columns for repeated cross-validation.

  • Adds parameter "unique_fold_cols_only" to fold(). Whether to keep only unique fold columns.

  • Adds parameter "max_iters" to fold(). How many times to attempt to create unique fold columns. Note that the result may contain fewer fold columns than specified in "num_fold_cols".

  • Adds parameter "parallel" to fold(). When creating multiple unique fold columns, the column comparisons can be run in parallel. Requires a registered parallel backend.

  • Adds parameter "handle_existing_fold_cols" to fold(). When calling fold() on a data frame that already contains columns with names starting with ".folds", we can either keep them and add more, or replace them.

  • Fixed behavior in fold() when k is a percentage (between 0 and 1). It is now interpreted as the approximate size of each fold, relative to the data, and used to calculate the number of folds. E.g. k=0.2 will lead to 5 folds.
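
As a rough illustration of the additions listed above, the sketch below applies upsample(), downsample() and balance() to a small made-up data frame and then asks fold() for several fold columns. The data frames, their column names (diagnosis, score, x) and the chosen sizes are assumptions for illustration only. The argument names (cat_col, size, k, num_fold_cols) are taken from the package documentation and may differ between versions, so check ?balance and ?fold for the installed release.

```r
library(groupdata2)

set.seed(1)  # sampling is random, so fix the seed for reproducibility

# Hypothetical toy data: 3 rows in category "a", 5 rows in category "b"
df <- data.frame(
  diagnosis = factor(c(rep("a", 3), rep("b", 5))),
  score = c(10, 12, 11, 20, 22, 21, 23, 19)
)

# Upsample: every category gets as many rows as the largest one
up <- upsample(df, cat_col = "diagnosis")

# Downsample: every category is reduced to the size of the smallest one
down <- downsample(df, cat_col = "diagnosis")

# balance() with an explicit target size per category
bal <- balance(df, size = 4, cat_col = "diagnosis")

# Inspect the resulting category sizes
table(up$diagnosis)
table(down$diagnosis)
table(bal$diagnosis)

# Hypothetical data for folding: 20 rows, no category or ID columns
df2 <- data.frame(x = seq_len(20))

# Request three fold columns for repeated cross-validation.
# Fewer columns may be returned if unique ones cannot be found.
folded <- fold(df2, k = 4, num_fold_cols = 3)
names(folded)
```

Because the rows are sampled at random, the exact rows kept or duplicated depend on the seed; only the group sizes are determined by the size setting.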