Merge pull request #158 from atorus-research/gh_issue_152
Fix `Inf` and `-Inf` results from `min` and `max` in descriptive statistic layers.
asbates authored Dec 14, 2023
2 parents 8dfda81 + 8a9546d commit 8aa428f
Showing 3 changed files with 29 additions and 3 deletions.
7 changes: 5 additions & 2 deletions R/desc.R
@@ -188,6 +188,9 @@ process_formatting.desc_layer <- function(x, ...) {
env_get(x, "formatted_data")
}

+# Small helper function to help with builtins
+inf_to_na <- function(x) if_else(is.infinite(x), NA, x)

#' Get the summaries to be passed forward into \code{dplyr::summarize()}
#'
#' @param e the environment summaries are stored in.
@@ -203,8 +206,8 @@ get_summaries <- function(e = caller_env()) {
sd = sd(.var, na.rm=TRUE),
median = median(.var, na.rm=TRUE),
var = var(.var, na.rm=TRUE),
-min = min(.var, na.rm=TRUE),
-max = max(.var, na.rm=TRUE),
+min = inf_to_na(min(.var, na.rm=TRUE)),
+max = inf_to_na(max(.var, na.rm=TRUE)),
iqr = IQR(.var, na.rm=TRUE, type=getOption('tplyr.quantile_type')),
q1 = quantile(.var, na.rm=TRUE, type=getOption('tplyr.quantile_type'))[[2]],
q3 = quantile(.var, na.rm=TRUE, type=getOption('tplyr.quantile_type'))[[4]],
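
The change above rests on a base R behavior that can be checked in isolation. The following is a minimal sketch using base R only; `inf_to_na_sketch` is a hypothetical stand-in for the `inf_to_na` helper in the diff, written with `ifelse()` so it runs without dplyr, and it is not the package's own code.

```r
# Hypothetical base-R stand-in for the helper added in this commit.
inf_to_na_sketch <- function(x) ifelse(is.infinite(x), NA_real_, x)

all_missing <- c(NA_real_, NA_real_)

# With na.rm = TRUE nothing is left to summarise, so base R warns and
# returns Inf from min() and -Inf from max().
raw_min <- suppressWarnings(min(all_missing, na.rm = TRUE))
raw_max <- suppressWarnings(max(all_missing, na.rm = TRUE))

# Wrapping the result turns either infinity into NA.
clean_min <- inf_to_na_sketch(raw_min)
clean_max <- inf_to_na_sketch(raw_max)

# Finite results pass through unchanged.
finite_min <- inf_to_na_sketch(min(c(1, 2, NA), na.rm = TRUE))
```

One consequence of testing with `is.infinite()` is that a genuinely infinite value in the data would also come back as `NA`, since the wrapper cannot tell where the infinity came from.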
21 changes: 21 additions & 0 deletions tests/testthat/test-desc.R
@@ -141,3 +141,24 @@ test_that("Stats as columns properly transposes the built data", {
expect_snapshot(as.data.frame(d2))

})

+test_that("Infinites aren't produced from min/max", {
+  dat <- tibble::tribble(
+    ~x1, ~x2,
+    'a', 1,
+    'a', 2,
+    'b', NA,
+  )
+
+  t <- tplyr_table(dat, x1) %>%
+    add_layer(
+      group_desc(x2) %>%
+        set_format_strings(
+          "Min, Max" = f_str("xx, xx", min, max)
+        )
+    )
+
+  x <- suppressWarnings(build(t))
+
+  expect_equal(x$var1_b, "")
+})
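
To see why group `b` in this fixture is the interesting case, the grouped summaries can be reproduced without Tplyr. This is only a sketch under stated assumptions: it mirrors the test data in a plain data frame, uses base R `split()` and `vapply()` in place of the package's summarising machinery, and `inf_to_na_sketch` is again a hypothetical stand-in for the helper.

```r
# Same values as the tribble in the test, as a plain data frame.
dat <- data.frame(x1 = c("a", "a", "b"), x2 = c(1, 2, NA))

# Hypothetical base-R stand-in for the helper added in this commit.
inf_to_na_sketch <- function(x) ifelse(is.infinite(x), NA_real_, x)

# One numeric vector per level of x1.
by_group <- split(dat$x2, dat$x1)

# Group "b" holds only NA, so min()/max() would give Inf/-Inf without the wrapper.
group_min <- vapply(
  by_group,
  function(v) inf_to_na_sketch(suppressWarnings(min(v, na.rm = TRUE))),
  numeric(1)
)
group_max <- vapply(
  by_group,
  function(v) inf_to_na_sketch(suppressWarnings(max(v, na.rm = TRUE))),
  numeric(1)
)
```

Both summaries for group `b` are `NA` here, which is the state the test expects to be rendered as an empty string in `var1_b`.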
4 changes: 3 additions & 1 deletion vignettes/desc.Rmd
@@ -77,7 +77,9 @@ x %>%

### Notes About Built-in's

-Note that the only non-default option being used in any of the function calls above is `na.rm=TRUE`. For most of the functions, this is likely fine - but with IQR, Q1, and Q3 note that there are several different quantile algorithms available in R. The default we chose to use is the R default of Type 7:
+Note that the only non-default option used in any of the function calls above is `na.rm=TRUE`. For `min` and `max`, when `na.rm=TRUE` is used with a vector that is all `NA`, these functions return `Inf` and `-Inf` respectively. When formatting the numbers, this is unexpected and also inconsistent with other descriptive statistic functions, which return `NA`. Therefore, just for `min` and `max`, infinite values are converted to `NA` so that they align with the behavior of the `empty` parameter in `f_str()`.
+
+Using the default settings of most descriptive statistic functions is typically fine, but with IQR, Q1, and Q3 note that there are several different quantile algorithms available in R. The default we chose to use is the R default of Type 7:

$$
m = 1-p. p[k] = (k - 1) / (n - 1). \textrm{In this case, } p[k] = mode[F(x[k])]. \textrm{This is used by S.}