NA drops out of the legend where a named pal is added to scale_colour_manual #5214

Closed
davidhodge931 opened this issue Mar 2, 2023 · 6 comments · Fixed by #5288

Comments

@davidhodge931

library(tidyverse)
library(palmerpenguins)

penguins |>
  ggplot() +
  geom_point(aes(
    x = bill_depth_mm, 
    y = body_mass_g, 
    col = sex
  )) +
  scale_color_manual(values = c("red", "blue"))
#> Warning: Removed 2 rows containing missing values (`geom_point()`).

penguins |>
  ggplot() +
  geom_point(aes(
    x = bill_depth_mm, 
    y = body_mass_g, 
    col = sex
  )) +
  scale_color_manual(values = c("male" = "red", "female" = "blue"))
#> Warning: Removed 2 rows containing missing values (`geom_point()`).

Created on 2023-03-03 with reprex v2.0.2

@linzi-sg

linzi-sg commented Apr 2, 2023

I thought this was expected behaviour. Since no explicit breaks are specified in either case, the scale limits are used to infer default breaks, and `ggplot2:::manual_scale`, which `scale_colour_manual` calls under the hood, defines limits differently when values are named:

> ggplot2:::manual_scale
function (aesthetic, values = NULL, breaks = waiver(), ..., limits = NULL) 
{
   # omitted for brevity
    if (is.null(limits) && !is.null(names(values))) {
        limits <- function(x) intersect(x, names(values)) %||%     
            character()
    }
   # omitted for brevity
    discrete_scale(aesthetic, "manual", pal, breaks = breaks, 
        limits = limits, ...)
}

If the values vector isn't named, limits (and hence breaks) are inferred from the full range of mapped values, in this case c("female", "male", NA). When the vector is named, only the data values that also appear among its names are retained, i.e. c("female", "male").
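
To make the difference concrete, here is a minimal base-R sketch of the inference described above. It does not call ggplot2; `infer_limits` is a hypothetical helper written for illustration, and returning the data levels unchanged for unnamed values is a simplification of what the scale does when `limits` stays `NULL`.

```r
# Fallback operator as used in the quoted source; defined here so the
# sketch is self-contained.
`%||%` <- function(a, b) if (is.null(a)) b else a

# Levels of the mapped variable, including a missing value.
data_levels <- c("female", "male", NA)

named_values   <- c(male = "red", female = "blue")
unnamed_values <- c("red", "blue")

# Hypothetical helper (not part of ggplot2) mimicking the quoted logic.
infer_limits <- function(x, values) {
  # Unnamed values: assumed here to keep the data levels as-is.
  if (is.null(names(values))) return(x)
  # Named values: keep only levels that appear among the names.
  intersect(x, names(values)) %||% character()
}

print(infer_limits(data_levels, unnamed_values))  # NA is kept
print(infer_limits(data_levels, named_values))    # NA is dropped
```

With named values the NA level never reaches the limits, which is why it has no legend entry even though the points are still drawn.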

@teunbrand
Collaborator

Indeed, named values imply populating the limits argument when it hasn't been set to something. So I think everything is working as intended.

@davidhodge931
Author

But surely the above plot is not desirable where you end up with grey values in the plot, but nothing in the legend to tell you what those grey values represent?

@teunbrand
Collaborator

Well, no, that plot is not very desirable, but neither would be a plot where you set limits = c("male", "female"). If you want to allow for NA in the legend, you can set limits = identity in your second example to take the limits as-is instead of taking them from the values vector. The heuristic excludes levels that may be in the data but are not present as names in values because of issues like #4511 and #4534.
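
A minimal sketch of that suggestion, assuming ggplot2 is installed. The data frame `df` is a hypothetical stand-in for the penguins data so the example has no further dependencies; only `limits = identity` comes from the comment above.

```r
library(ggplot2)

# Hypothetical toy data: one row has a missing `sex`.
df <- data.frame(
  x   = c(1, 2, 3),
  y   = c(3, 1, 2),
  sex = c("male", "female", NA)
)

p <- ggplot(df) +
  geom_point(aes(x = x, y = y, colour = sex)) +
  scale_colour_manual(
    values = c("male" = "red", "female" = "blue"),
    # Take the limits from the data as-is rather than from names(values).
    limits = identity
  )

print(p)
```

Per the comment above, this should let the NA level appear in the legend; its colour comes from the scale's `na.value` argument, which can be set explicitly if the default grey is not wanted.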

@jeraldnoble

I thought this was expected behaviour. Since no explicit breaks are specified in either case, the scale limits are used to infer default breaks, and `ggplot2:::manual_scale`, which `scale_colour_manual` calls under the hood, defines limits differently when values are named:

> ggplot2:::manual_scale
function (aesthetic, values = NULL, breaks = waiver(), ..., limits = NULL) 
{
   # omitted for brevity
    if (is.null(limits) && !is.null(names(values))) {
        limits <- function(x) intersect(x, names(values)) %||%     
            character()
    }
   # omitted for brevity
    discrete_scale(aesthetic, "manual", pal, breaks = breaks, 
        limits = limits, ...)
}

If the values vector isn't named, limits (and hence breaks) are inferred from the full range of mapped values, in this case c("female", "male", NA). When the vector is named, only the data values that also appear among its names are retained, i.e. c("female", "male").

This behavior seems to have changed in ggplot2 3.5.1:

if (is.null(limits) && !is.null(names(values))) {
    force(aesthetic)
    limits <- function(x) {
        x <- intersect(x, c(names(values), NA)) %||% character()
        if (length(x) < 1) {
            cli::cli_warn(paste0("No shared levels found between {.code names(values)} of the manual ",
              "scale and the data's {.field {aesthetic}} values."))
        }
        x
    }
}

I now get NA in my legend for points that are uncolored in my plot. I need these points in my plot.
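
The effect of the two quoted versions can be compared in plain R. This is a simplified sketch, not ggplot2 code: `limits_old` and `limits_new` are hypothetical names, `cli::cli_warn` is replaced by base `warning()`, and the `%||% character()` fallback is omitted.

```r
values      <- c(male = "red", female = "blue")
data_levels <- c("female", "male", NA)

# Earlier logic: keep only data levels found among names(values).
limits_old <- function(x) intersect(x, names(values))

# Later logic: NA is appended to the allowed set, so a missing level
# in the data survives the intersection.
limits_new <- function(x) {
  x <- intersect(x, c(names(values), NA))
  if (length(x) < 1) {
    warning("No shared levels found between names(values) and the data.")
  }
  x
}

print(limits_old(data_levels))  # "female" "male"
print(limits_new(data_levels))  # "female" "male" NA
```

Because NA is now part of the limits, it also becomes a default break, which matches the NA legend entry described above.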

@teunbrand
Collaborator

@jeraldnoble if you think that is a bug, could you open a new issue with an example/reprex?
