
collapse version 2.0.4

@SebKrantz released this 31 Oct 13:01
  • In fnth()/fquantile(), there has been a slight change to the weighted quantile algorithm. As outlined in the documentation, this algorithm provides weighted versions of all continuous quantile methods (types 7-9) in R by replacing sample quantities with their weighted counterparts. For example, for the default quantile type 7, the continuous (lower) target element is (n - 1) * p. In the weighted algorithm, this target was (sum(w) - mean(w)) * p, which was compared to the cumulative sum of the weights ordered by x, so that the weighted and unweighted algorithms coincide when all weights are equal. On second thought, however, the use of mean(w) does not really reflect the standard interpretation of weights as frequencies. I have reasoned that min(w) reflects such an interpretation better than mean(w), as the minimum (non-zero) weight reflects the size of the smallest sampled unit. The weighted quantile type 7 target is therefore now (sum(w) - min(w)) * p, and the other methods have been adjusted accordingly (note that zero-weight observations are ignored in the algorithm). Since min(w) equals mean(w) when all weights are equal, the weighted and unweighted results still coincide in that case.

  • This is more a note than a change to the package: users combining collapse with the tidyverse (especially ggplot2) can run into an issue with vctrs. collapse internally optimizes computations on factors by giving them an additional "na.included" class if they are known not to contain any missing values. For example, pivot(mtcars) gives a "variable" factor with class c("factor", "na.included"), so that grouping on "variable" in subsequent operations is faster. Unfortunately, pivot(mtcars) |> ggplot(aes(y = value)) + geom_histogram() + facet_wrap( ~ variable) currently gives an error produced by vctrs, because vctrs does not use standard S3 method dispatch and thus does not ignore the "na.included" class. The only way for me to deal with this would be to swap the order of the classes, i.e. c("na.included", "factor"), import vctrs, and implement vec_ptype2 and vec_cast methods for "na.included" objects. This will never happen, as collapse is and will remain independent of the tidyverse. There are two ways you can deal with this: the first is to remove the "na.included" class for ggplot2; for example, both facet_wrap( ~ set_class(variable, "factor")) and facet_wrap( ~ factor(variable)) work. The second is to define a function vec_ptype2.factor.factor <- function(x, y, ...) x in your global environment, which avoids vctrs performing extra checks on factor objects.
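To make the new weighted type 7 target from the fnth()/fquantile() note concrete, here is a minimal sketch in plain R of one way to read it under the frequency interpretation: each ordered value x[i] occupies a stretch of the cumulative weight scale, min(w) is treated as the size of one sampled unit, and linear interpolation only happens in the unit-wide gap before the next ordered value. The function name wq7_sketch is hypothetical; this is not collapse's C implementation and may differ from fquantile() in details. It assumes no missing values in x, positive weights after dropping zeros, and a single probability p in [0, 1].

```r
# Hypothetical sketch of a weighted type-7 quantile using the
# (sum(w) - min(w)) * p target; NOT collapse's actual implementation.
wq7_sketch <- function(x, w, p) {
  keep <- w > 0                     # zero-weight observations are ignored
  x <- x[keep]; w <- w[keep]
  o <- order(x)                     # order observations (and weights) by x
  x <- x[o]; w <- w[o]
  u <- min(w)                       # size of the smallest sampled unit
  h <- (sum(w) - u) * p             # weighted type-7 target
  start <- cumsum(w) - w            # cumulative weight before each element
  last <- start + w - u             # last unit position occupied by x[i]
  i <- findInterval(h, start)       # largest i with start[i] <= h
  if (h <= last[i] || i == length(x)) return(x[i])
  # h lies in the gap of width u between x[i] and x[i + 1]: interpolate
  x[i] + (h - last[i]) / u * (x[i + 1] - x[i])
}

x <- c(3, 1, 2)
w <- c(2, 1, 3)
wq7_sketch(x, w, 0.7)
quantile(rep(x, w), 0.7, type = 7)  # type 7 on the frequency-expanded sample
```

With equal weights the sketch reproduces quantile(x, p, type = 7), and with integer weights whose minimum is 1 it matches type 7 on rep(x, w). For w = c(2, 1, 3), the old target (sum(w) - mean(w)) * p is 4 * p, whereas the new target and the expanded sample of length 6 both give 5 * p.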
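The class mechanics behind the two workarounds in the vctrs note can be checked in base R alone. The sketch below does not load collapse, ggplot2 or vctrs: it attaches the "na.included" class to a factor by hand (an assumption standing in for the "variable" column that pivot(mtcars) returns), shows that ordinary S3 dispatch still treats it as a factor, and strips the extra class. Here oldClass<- is used as a base-R stand-in for collapse's set_class(); the variable names are illustrative only.

```r
# Stand-in (built by hand, not by collapse) for a factor tagged "na.included".
v <- factor(c("mpg", "cyl", "mpg", "disp"))
class(v) <- c("factor", "na.included")

# Standard S3 dispatch finds the "factor" class first, so base R is unaffected.
is.factor(v)
levels(v)

# Option 1: strip the extra class before handing the column to ggplot2.
v1 <- factor(v)                    # factor() rebuilds a plain factor
v2 <- v
oldClass(v2) <- "factor"           # base-R analogue of set_class(v, "factor")
class(v1)
class(v2)

# Option 2: the method suggested in the release note, defined in the global
# environment; it simply returns the first prototype. vctrs is not loaded
# here, so this only shows the definition itself.
vec_ptype2.factor.factor <- function(x, y, ...) x
```

In a real session, the plain factor is what facet_wrap() receives, e.g. facet_wrap( ~ factor(variable)) as in the note.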