collapse version 2.0.4
- In `fnth()`/`fquantile()`, there has been a slight change to the weighted quantile algorithm. As outlined in the documentation, this algorithm gives weighted versions of all continuous quantile methods (types 7-9) in R by replacing sample quantities with their weighted counterparts. E.g., for the default quantile type 7, the continuous (lower) target element is `(n - 1) * p`. In the weighted algorithm, this became `(sum(w) - mean(w)) * p` and was compared to the cumulative sum of ordered (by `x`) weights, to preserve equivalence of the algorithms in cases where the weights are all equal. However, on second thought, the use of `mean(w)` does not really reflect a standard interpretation of the weights as frequencies. I have reasoned that using `min(w)` instead of `mean(w)` better reflects such an interpretation, as the minimum (non-zero) weight reflects the size of the smallest sampled unit. So the weighted quantile type 7 target is now `(sum(w) - min(w)) * p`, and the other methods have been adjusted accordingly (note that zero-weight observations are ignored in the algorithm).
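The difference between the two type 7 targets can be illustrated with a few lines of base R. This is only a sketch of the target values named above, not the package's implementation; the weights and the variable names (`w`, `w_eq`, `target_old`, `target_new`) are hypothetical, and all weights are kept strictly positive so the zero-weight rule does not come into play.

```r
# Illustration of the type 7 target values only (not the full quantile algorithm)
p <- 0.5
w <- c(2, 1, 3, 4)                    # hypothetical frequency weights, all > 0

target_old <- (sum(w) - mean(w)) * p  # target used before 2.0.4
target_new <- (sum(w) - min(w)) * p   # target used from 2.0.4 onwards

# With unit weights, both targets reduce to the unweighted (n - 1) * p
w_eq <- rep(1, 5)
n <- length(w_eq)
target_eq_old <- (sum(w_eq) - mean(w_eq)) * p
target_eq_new <- (sum(w_eq) - min(w_eq)) * p
target_unweighted <- (n - 1) * p
```

The two targets only differ when the weights are unequal, which is why results for unweighted or equally weighted data are unaffected.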
- This is more a note than a change to the package: there is an issue with vctrs that users can encounter when using collapse together with the tidyverse (especially ggplot2). collapse internally optimizes computations on factors by giving them an additional `"na.included"` class if they are known not to contain any missing values. For example, `pivot(mtcars)` gives a `"variable"` factor which has class `c("factor", "na.included")`, such that grouping on `"variable"` in subsequent operations is faster. Unfortunately, `pivot(mtcars) |> ggplot(aes(y = value)) + geom_histogram() + facet_wrap( ~ variable)` currently gives an error produced by vctrs, because vctrs does not implement standard S3 method dispatch and thus does not ignore the `"na.included"` class. It turns out that the only way for me to deal with this would be to swap the order of classes, i.e. `c("na.included", "factor")`, import vctrs, and implement `vec_ptype2` and `vec_cast` methods for `"na.included"` objects. This will never happen, as collapse is and will remain independent of the tidyverse. There are two ways you can deal with this. The first is to remove the `"na.included"` class for ggplot2, e.g. `facet_wrap( ~ set_class(variable, "factor"))` or `facet_wrap( ~ factor(variable))` will both work. The second is to define a function `vec_ptype2.factor.factor <- function(x, y, ...) x` in your global environment, which avoids vctrs performing extra checks on factor objects.
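The class structure and both workarounds can be sketched in base R without loading collapse or ggplot2. The factor below is built by hand to mimic the class vector described above (its values are made up for illustration); in practice the object would come from `pivot()`.

```r
# Hand-built stand-in for a factor carrying collapse's extra class (illustrative values)
variable <- structure(factor(c("mpg", "cyl", "mpg")),
                      class = c("factor", "na.included"))

# Standard S3 inheritance still treats the object as a factor
is_factor <- inherits(variable, "factor")

# Workaround 1: rebuild a plain factor, which drops the "na.included" class
plain <- factor(variable)

# Workaround 2: the definition quoted above, placed in the global environment
vec_ptype2.factor.factor <- function(x, y, ...) x
```

Rebuilding with `factor()` keeps the levels and values intact, so facets and groupings come out the same; only the extra class is removed.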