Discarding boxplot outliers (#5379)
* Add `outliers` param to boxplot

* Add test

* Redocument

* Add news bullet
teunbrand committed Aug 7, 2023
1 parent 466344a commit df1b901
Showing 4 changed files with 38 additions and 14 deletions.
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,5 +1,10 @@
# ggplot2 (development version)

+* `geom_boxplot()` gains an `outliers` argument to switch outliers on or off,
+in a manner that does affect the scale range. To hide outliers without
+affecting the scale range, you can continue to use `outlier.shape = NA`
+(@teunbrand, #4892).
+
* Binned scales now treat `NA`s in limits the same way continuous scales do
(#5355).

@@ -9,6 +14,7 @@
deprecated. The `hjust` setting of the `legend.text` and `legend.title`
elements continues to fulfil the role of text alignment (@teunbrand, #5347).

+
* Integers are once again valid input to theme arguments that expect numeric
input (@teunbrand, #5369)
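A minimal sketch of the two behaviours the entry contrasts, assuming a ggplot2 release that includes this commit (3.5.0 or later, as far as I know) and using the `mpg` dataset bundled with ggplot2. The `y_range()` helper is hypothetical; it reads the trained y range through the same `ggplot_build()` path the new test in `test-geom-boxplot.R` uses.

```r
library(ggplot2)

# `mpg` ships with ggplot2; several vehicle classes have outlying `hwy` values.
base <- ggplot(mpg, aes(class, hwy))

p_discard <- base + geom_boxplot(outliers = FALSE)    # outliers dropped from the layer
p_hide    <- base + geom_boxplot(outlier.shape = NA)  # outliers kept, just not drawn

# Hypothetical helper: the trained (expanded) y range of the first panel.
y_range <- function(p) ggplot_build(p)$layout$panel_params[[1]]$y.range

y_range(p_discard)  # covers boxes and whiskers only
y_range(p_hide)     # still covers every `hwy` value, including hidden outliers
```

Because discarded outliers never reach scale training, `p_discard` can zoom in on the boxes, whereas `p_hide` reserves space for points it does not draw.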

21 changes: 14 additions & 7 deletions R/geom-boxplot.R
@@ -33,19 +33,19 @@
#' @inheritParams geom_bar
#' @param geom,stat Use to override the default connection between
#' `geom_boxplot()` and `stat_boxplot()`.
+#' @param outliers Whether to display (`TRUE`) or discard (`FALSE`) outliers
+#' from the plot. Hiding or discarding outliers can be useful when, for
+#' example, raw data points need to be displayed on top of the boxplot.
+#' By discarding outliers, the axis limits will adapt to the box and whiskers
+#' only, not the full data range. If outliers need to be hidden and the axes
+#' need to show the full data range, please use `outlier.shape = NA` instead.
#' @param outlier.colour,outlier.color,outlier.fill,outlier.shape,outlier.size,outlier.stroke,outlier.alpha
#' Default aesthetics for outliers. Set to `NULL` to inherit from the
#' aesthetics used for the box.
#'
#' In the unlikely event you specify both US and UK spellings of colour, the
#' US spelling will take precedence.
#'
-#' Sometimes it can be useful to hide the outliers, for example when overlaying
-#' the raw data points on top of the boxplot. Hiding the outliers can be achieved
-#' by setting `outlier.shape = NA`. Importantly, this does not remove the outliers,
-#' it only hides them, so the range calculated for the y-axis will be the
-#' same with outliers shown and outliers hidden.
-#'
#' @param notch If `FALSE` (default) make a standard box plot. If
#' `TRUE`, make a notched box plot. Notches are used to compare groups;
#' if the notches of two boxes do not overlap, this suggests that the medians
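The new `outliers` documentation mentions overlaying raw data points. A possible sketch of that use, assuming ggplot2 3.5.0 or later: the boxplot layer discards its own outlier points so they are not drawn twice, while the jittered point layer, which plots every observation, still makes the y scale cover the full data range. The jitter settings are arbitrary illustrative choices.

```r
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy)) +
  # Box and whiskers only; outliers are dropped from this layer entirely
  geom_boxplot(outliers = FALSE) +
  # Every observation, jittered horizontally only (height = 0 keeps `hwy` exact)
  geom_jitter(width = 0.2, height = 0, alpha = 0.4)

print(p)
```

Here `outliers = FALSE` and `outlier.shape = NA` would give the same axis, because the point layer trains the scale on all values anyway; discarding simply avoids carrying an unused outlier column through the boxplot layer.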
@@ -109,6 +109,7 @@
geom_boxplot <- function(mapping = NULL, data = NULL,
stat = "boxplot", position = "dodge2",
...,
+outliers = TRUE,
outlier.colour = NULL,
outlier.color = NULL,
outlier.fill = NULL,
@@ -133,6 +134,7 @@ geom_boxplot <- function(mapping = NULL, data = NULL,
position$preserve <- "single"
}
}
+check_bool(outliers)

layer(
data = data,
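The `check_bool(outliers)` call rejects anything other than a single flag before the layer is built. Below is a base-R approximation of that contract; `check_flag()` is a hypothetical name, and the assumption that `NA` and length-two vectors are rejected reflects `check_bool()`'s usual defaults rather than anything shown in this diff.

```r
# Hypothetical stand-in for the diff's check_bool(): accept exactly one
# non-missing TRUE or FALSE and signal an error for anything else.
check_flag <- function(x, arg = deparse(substitute(x))) {
  # isTRUE()/isFALSE() are only TRUE for a length-one, non-NA logical
  if (!(isTRUE(x) || isFALSE(x))) {
    stop(sprintf("`%s` must be `TRUE` or `FALSE`.", arg), call. = FALSE)
  }
  invisible(x)
}

outliers <- FALSE
check_flag(outliers)             # passes silently
try(check_flag(NA))              # error: missing value
try(check_flag(c(TRUE, FALSE)))  # error: more than one value
```

Validating up front means a bad value fails at `geom_boxplot()` rather than deep inside `setup_data()`, where `isFALSE(params$outliers)` would silently treat `NA` as "keep outliers".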
@@ -143,6 +145,7 @@ geom_boxplot <- function(mapping = NULL, data = NULL,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list2(
+outliers = outliers,
outlier.colour = outlier.color %||% outlier.colour,
outlier.fill = outlier.fill,
outlier.shape = outlier.shape,
@@ -167,7 +170,7 @@ GeomBoxplot <- ggproto("GeomBoxplot", Geom,

# need to declare `width` here in case this geom is used with a stat that
# doesn't have a `width` parameter (e.g., `stat_identity`).
-extra_params = c("na.rm", "width", "orientation"),
+extra_params = c("na.rm", "width", "orientation", "outliers"),

setup_params = function(data, params) {
params$flipped_aes <- has_flipped_aes(data, params)
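Adding `"outliers"` to `extra_params` is what lets the flag travel from `geom_boxplot()` to `setup_data()`. Roughly, a layer hands a geom only the parameters that geom declares (its drawing-function arguments plus `extra_params`) and warns about parameters that no component declares. The sketch below is a simplified, hypothetical model of that routing, not ggplot2's code; `route_params()` and the parameter lists are illustrative.

```r
# Hypothetical model of parameter routing: the geom receives only the
# parameters it declares; names nobody declares trigger a warning.
route_params <- function(params, geom_declared, stat_declared = character()) {
  unknown <- setdiff(names(params), c(geom_declared, stat_declared))
  if (length(unknown) > 0) {
    warning("Ignoring unknown parameters: ", paste(unknown, collapse = ", "),
            call. = FALSE)
  }
  params[intersect(names(params), geom_declared)]
}

user_params <- list(outliers = FALSE, na.rm = TRUE)
old_extra <- c("na.rm", "width", "orientation")
new_extra <- c("na.rm", "width", "orientation", "outliers")

# Old declaration list: a warning, and `outliers` never reaches the geom
names(suppressWarnings(route_params(user_params, old_extra)))  # "na.rm"
# New declaration list: `outliers` is passed through
names(route_params(user_params, new_extra))                    # "outliers" "na.rm"
```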
@@ -180,6 +183,10 @@ GeomBoxplot <- ggproto("GeomBoxplot", Geom,
data$width <- data$width %||%
params$width %||% (resolution(data$x, FALSE) * 0.9)

+if (isFALSE(params$outliers)) {
+data$outliers <- NULL
+}
+
if (!is.null(data$outliers)) {
suppressWarnings({
out_min <- vapply(data$outliers, min, numeric(1))
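The `setup_data()` hunk is truncated, but its visible lines suggest the mechanism. When `outliers` is `FALSE`, the `outliers` list-column is dropped, so the block that widens each box's vertical extent to cover its outliers never runs. The base-R sketch below imitates that flow under stated assumptions:

- `summarise_boxes()`, `setup_box_range()` and `y_extent()` are hypothetical.
- The `ymin_final`/`ymax_final` column names are assumed.
- `grDevices::boxplot.stats()` (Tukey's hinges) stands in for `stat_boxplot()`, whose quartiles can differ slightly.

```r
# Hypothetical stand-in for the stat: one row per group with whisker ends
# (ymin, ymax) and a list-column of that group's outliers.
summarise_boxes <- function(y, group) {
  stats <- lapply(split(y, group), grDevices::boxplot.stats)
  data <- data.frame(
    ymin = vapply(stats, function(s) s$stats[1], numeric(1)),
    ymax = vapply(stats, function(s) s$stats[5], numeric(1))
  )
  data$outliers <- lapply(stats, function(s) s$out)
  data
}

# Imitates the diff: drop the outliers when asked, otherwise widen each
# box's vertical extent so it also covers that group's outliers.
setup_box_range <- function(data, outliers = TRUE) {
  if (isFALSE(outliers)) {
    data$outliers <- NULL
  }
  if (!is.null(data$outliers)) {
    # min()/max() of a group with no outliers warn and return Inf/-Inf,
    # which pmin()/pmax() then ignore in favour of the whisker ends
    suppressWarnings({
      out_min <- vapply(data$outliers, min, numeric(1))
      out_max <- vapply(data$outliers, max, numeric(1))
    })
    data$ymin_final <- pmin(out_min, data$ymin)
    data$ymax_final <- pmax(out_max, data$ymax)
  }
  data
}

# Vertical extent a scale would be trained on (missing columns are NULL
# and contribute nothing to range())
y_extent <- function(data) {
  range(data$ymin, data$ymax, data$ymin_final, data$ymax_final)
}

y     <- c(1:10, 50, 1:10)           # group "a" has one outlier at 50
group <- rep(c("a", "b"), c(11, 10))
boxes <- summarise_boxes(y, group)

y_extent(setup_box_range(boxes, outliers = TRUE))   # 1 50
y_extent(setup_box_range(boxes, outliers = FALSE))  # 1 10
```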
16 changes: 9 additions & 7 deletions man/geom_boxplot.Rd

9 changes: 9 additions & 0 deletions tests/testthat/test-geom-boxplot.R
@@ -8,6 +8,15 @@ test_that("geom_boxplot range includes all outliers", {

expect_true(miny <= min(dat$y))
expect_true(maxy >= max(dat$y))
+
+# Unless specifically directed not to
+p <- ggplot_build(ggplot(dat, aes(x, y)) + geom_boxplot(outliers = FALSE))
+
+miny <- p$layout$panel_params[[1]]$y.range[1]
+maxy <- p$layout$panel_params[[1]]$y.range[2]
+
+expect_lte(maxy, max(dat$y))
+expect_gte(miny, min(dat$y))
})

test_that("geom_boxplot works in both directions", {
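The new test checks only the trained y range. A possible complementary test, which is not part of this commit, could also confirm two things: discarding outliers leaves the box statistics untouched, and it removes the `outliers` column from the built layer data. It is a sketch in the file's testthat style. It assumes a ggplot2 build that includes this change, and the use of `mpg` and the compared columns are illustrative choices.

```r
library(testthat)
library(ggplot2)

test_that("discarding outliers keeps the box statistics", {
  keep    <- layer_data(ggplot(mpg, aes(class, hwy)) + geom_boxplot())
  discard <- layer_data(ggplot(mpg, aes(class, hwy)) + geom_boxplot(outliers = FALSE))

  # The outlier list-column is removed from the layer data
  expect_true("outliers" %in% names(keep))
  expect_false("outliers" %in% names(discard))

  # Quartiles and whisker ends are computed the same way either way
  cols <- c("lower", "middle", "upper", "ymin", "ymax")
  expect_equal(discard[cols], keep[cols])
})
```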
