Digory and his Uncle Are Both in Trouble

@njtierney released this 06 Jun 14:03
· 397 commits to master since this release

New Features

  • Added all_miss() / all_na() equivalent to all(is.na(x))

  • Added any_complete() equivalent to all(complete.cases(x))

  • Added any_miss() equivalent to anyNA(x)

  • Added common_na_numbers and finalised common_na_strings, to provide
    lists of commonly used NA values
    #168

  • Added miss_var_which, to list the variable names with missings

  • Added as_shadow_upset which gets the data into a format suitable for
    plotting as an UpSetR plot:

    library(naniar)

    airquality %>%
      as_shadow_upset() %>%
      UpSetR::upset()

  • Added some imputation functions to assist with exploring missingness
    structure and visualisation:

    • impute_below performs as for shadow_shift, but operates on all columns.
      This means that it imputes missing values 10% below the range of the
      data (powered by shadow_shift), to facilitate graphical exploration of
      the data. Closes #145
      There are also scoped variants: impute_below_at, which works on
      specific named columns, and impute_below_if, which works on columns
      that satisfy some predicate function.
    • impute_mean imputes the mean value, with scoped variants
      impute_mean_at and impute_mean_if.
  • impute_below and shadow_shift gain arguments prop_below and jitter
    to control the degree of shift, and also the extent of jitter.

  • Added complete_{case/var}_{pct/prop}, which complement
    miss_{var/case}_{pct/prop}
    #150

  • Added unbind_shadow and unbind_data as helpers to remove shadow columns
    from data, and data from shadows, respectively.

  • Added is_shadow and are_shadow to determine if something contains a
    shadow column. Similar to rlang::is_na and rlang::are_na, is_shadow
    returns a logical vector of length 1, and are_shadow returns a logical
    vector with one element per name in a data.frame. This might be
    revisited at a later point (see any_shade in add_label_shadow).

  • Aesthetics now map as expected in geom_miss_point(). This means you can write
    things like geom_miss_point(aes(colour = Month)) and it works appropriately.
    Fixed by Luke Smith in Pull request
    #144, fixing
    #137.
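
The "10% below the range" behaviour described for impute_below in the list above can be sketched in base R. This is only an illustration under stated assumptions: shift_below is a hypothetical helper, not naniar's implementation, it ignores the jitter argument, and the exact formula used by shadow_shift may differ.

```r
# Hypothetical helper (not part of naniar): replace missing values with a
# single value placed `prop_below` of the observed range below the minimum.
shift_below <- function(x, prop_below = 0.1) {
  rng <- range(x, na.rm = TRUE)                     # observed min and max
  shifted_value <- rng[1] - prop_below * diff(rng)  # sits below the minimum
  x[is.na(x)] <- shifted_value
  x
}

# Usage on a built-in dataset: the missing Ozone values now sit below
# every observed value, so they stay visible in a scatterplot.
shifted <- shift_below(airquality$Ozone)
```

Placing the imputed values outside the observed range keeps them visually separate from real data, which is the stated purpose of these functions: exploring missingness, not estimating the missing values.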

Minor Changes

  • miss_var_summary and miss_case_summary now use order = TRUE by
    default, so cases and variables with the most missings are presented in
    descending order. Fixes #163

  • Changes for Visualisation:

    • Changed the default colours used in gg_miss_case and gg_miss_var to
      lorikeet purple (from ochRe package: https://github.com/ropenscilabs/ochRe)
    • gg_miss_case
      • The y axis label is now ...
      • Default presentation is with order_cases = TRUE.
      • Gains a show_pct option to be consistent with gg_miss_var
        #153
    • gg_miss_which is rotated 90 degrees so it is easier to read variable names
    • gg_miss_fct uses a minimal theme and tilts the axis labels
      #118.
  • Imported is_na and are_na from rlang.

  • Added common_na_strings, a list of common NA values
    #168.

  • Added some detail on alternative methods for replacing with NA in the
    vignette "replacing values with NA".
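
The descending order that miss_var_summary now uses by default can be approximated in base R. This is a rough sketch, not naniar's code: the column names variable, n_miss and pct_miss are assumptions chosen for illustration and may not match the package's output.

```r
# Count missing values per variable in a built-in dataset
n_missing <- colSums(is.na(airquality))

# Sort so that variables with the most missings come first
ordered <- sort(n_missing, decreasing = TRUE)

# Assemble a small summary table (column names are illustrative only)
summary_tbl <- data.frame(
  variable = names(ordered),
  n_miss   = as.integer(ordered),
  pct_miss = 100 * as.numeric(ordered) / nrow(airquality)
)

print(summary_tbl)
```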