Prevent ggplot2 from reversing the order of categories in horizontal legends. #5368

Closed
ELICHOS opened this issue Jul 27, 2023 · 11 comments

@ELICHOS

ELICHOS commented Jul 27, 2023

Hello,
I have noticed that ggplot2 reverses the order of categories when the legend is horizontal (e.g., when legend.position = "top", see the reprex below). This was raised in a previous issue (#4715), but I find it quite bothersome in practice, as I frequently need to modify the order of categories in the legend using guide_legend().

Moreover, I believe that the order of categories in the legend should follow a coherent pattern with the order of categories in the graph, just as it does for vertical legends. This would make more sense.

What do you think about this?
I feel that this should not cause too significant a disruption for ggplot2 users.

# reprex:

library(ggplot2)
library(dplyr)

iris %>%
  group_by(Species) %>%
  count() %>%
  ggplot(aes(x = n, y = "y", fill = Species)) +
  geom_col(position = position_stack()) +
  theme(legend.position = "top")

image

@smouksassi

You can reverse the legend with guides(fill = guide_legend(reverse = TRUE)).
ggplot2 cannot predict the order you want unless the variable is a factor whose levels are already in that order.

library(ggplot2)
library(dplyr)
library(forcats)
library(patchwork)

a <- iris%>%
       group_by(Species)%>%
       count()%>%
  ggplot(aes(x = Species, y = n, fill = Species))+
  geom_col()

b <- iris%>%
  group_by(Species)%>%
  count()%>%
  ggplot(aes(x = Species, y = n, fill = Species))+
  geom_col()+
  theme(legend.position = "top") 


c <- iris%>%
  group_by(Species)%>%
  count()%>%
  ggplot(aes(x = Species, y = n, fill = Species))+
  geom_col()+
  theme(legend.position = "bottom") +
  guides(fill = guide_legend(reverse = TRUE))


a / b / c

Created on 2023-07-27 with reprex v2.0.2
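
The point about factor order can be sketched as follows. This is a minimal illustration, not code from the thread: the `wanted` order and the names `iris_ordered` and `p_ordered` are hypothetical. It assumes that, with default scales, the legend keys follow the factor levels.

```r
library(ggplot2)

# Hypothetical order, chosen only for illustration.
wanted <- c("virginica", "setosa", "versicolor")

# Re-level the factor so the desired order is explicit in the data.
iris_ordered <- iris
iris_ordered$Species <- factor(iris_ordered$Species, levels = wanted)

# Both the x axis and the legend keys are built from the factor levels.
p_ordered <- ggplot(iris_ordered, aes(x = Species, fill = Species)) +
  geom_bar() +
  theme(legend.position = "top")

# print(p_ordered) draws the plot.
```

Re-levelling the data changes the order everywhere the variable is used, whereas guide_legend(reverse = TRUE) only affects the legend.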

@ELICHOS
Author

ELICHOS commented Jul 27, 2023

Thank you. That is precisely the issue: manual adjustment (e.g., using guide_legend()) is required for horizontal legends, but not for vertical legends.

With a vertical legend, ggplot2 does infer an order of categories, and that order is the same in the graph and in the legend. This happens automatically, with no manual action. There is a default order of categories, shared by the plot and the legend, even for character variables.

iris%>%
  mutate(GROUPE = sample(c("a", "b"), size = nrow(iris), replace = TRUE))%>%
  group_by(GROUPE, Species)%>%
  count()%>%
  ggplot(aes(x = GROUPE, y = n, fill = Species))+
  geom_col(position = position_stack())+
  theme(legend.position = "right")

image

However, when the legend is horizontal, the order of categories is no longer consistent between the legend and the graph. This holds even for a vertical plot with a horizontal legend:

iris%>%
  mutate(GROUPE = sample(c("a", "b"), size = nrow(iris), replace = TRUE))%>%
  group_by(GROUPE, Species)%>%
  count()%>%
  ggplot(aes(x = GROUPE, y = n, fill = Species))+
  geom_col(position = position_stack())+
  theme(legend.position = "top")
  

image

@smouksassi

I would argue that reading order is a matter of human preference: top to bottom, left to right, and so on.
I don't see how the software can know which order you want. Rather, it lets you specify the order that you see as appropriate for your plot and application.

For your last plot, I read the legend left to right and the bar top to bottom, and they match.

@ELICHOS
Author

ELICHOS commented Jul 27, 2023

In the last plot, I read the bar bottom to top, following the y-axis.

The issue is not that ggplot2 guesses the order of categories, but rather that it should provide the same order between the legend and the graph. Regardless of the specific order, they should match, like for vertical legends.

@teunbrand
Collaborator

I sympathise with the idea, but to me it is not immediately obvious how this would interact with layers:

  • Without any position adjustments
  • Without any inherent order along the x or y axis
  • With groupings that don't perfectly match the aesthetic to be displayed
  • With other layers in the same plot

To me it seems clear that the legend should follow the order as determined by the scale, so if you want a particular order, you'd set the breaks or limits accordingly.
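
Setting the order on the scale could look like the sketch below. The `legend_order` vector and the names `fill_scale` and `p_limits` are hypothetical, introduced only for illustration. As far as I understand discrete scales, `limits` also changes which colour each category receives, while `breaks` only changes which keys are shown and in what order, so treat that distinction as an assumption to check against the scale documentation.

```r
library(ggplot2)

# Hypothetical order, chosen only for illustration.
legend_order <- c("virginica", "versicolor", "setosa")

# The scale, not the legend position, carries the category order.
fill_scale <- scale_fill_discrete(limits = legend_order)

p_limits <- ggplot(iris, aes(x = "x", fill = Species)) +
  geom_bar(position = "stack") +
  fill_scale +
  theme(legend.position = "top")

# print(p_limits) draws the plot.
```

Note that this reorders the legend only; the stacking order of the bars is still decided by the position adjustment.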

@ELICHOS
Author

ELICHOS commented Jul 28, 2023

Okay, I'm not entirely sure about all the interactions and implications, sorry, but I am 100% in agreement with this, and it's the crucial point of this discussion: the legend should follow the order determined by the scale.

Currently, this is the case for vertical legends; they follow the order of categories along the y-axis. I believe that horizontal legends should adhere to the same rule: follow the order of categories. It would be simpler and more logical.

In the case of the first example provided in my initial message, the inversion is evident.

I want to emphasize that it's not about asking ggplot2 to guess an implicit order that would be the correct order; it's about having a consistent order between the graph and the legend.

@Ductmonkey

Ductmonkey commented Aug 10, 2023

I don't think the root of your complaint is related to legend position. When vertical, the categories are arranged top to bottom (explicitly if already a factor or alphabetically if coerced into a factor by ggplot2), and when horizontal they are arranged left to right.

However, with a simplified version of your example, I'm also confused about why, given a factor, the first level ends up furthest from the origin (e.g. y = 0). I would have expected the opposite. Here "setosa" is the first level of the factor but is positioned furthest from the axis. The figure and the legend are aligned in this orientation because the axis has 0 at the bottom and the legend has the first level at the top. However, when the bar and the legend are both flipped to horizontal, the legend runs left to right from the first level while the bar runs left to right from the last level, so the two become antiparallel. That is the symptom.

library(ggplot2)

levels(iris$Species)
#> [1] "setosa"     "versicolor" "virginica"

iris |> 
  ggplot(aes("x", fill = Species)) +
  geom_bar(position = "stack")

Created on 2023-08-10 with reprex v2.0.2

@teunbrand
Collaborator

So the key piece of documentation for this issue is this passage from ?position_stack:

position_fill() and position_stack() automatically stack values in reverse order of the group aesthetic, which for bar charts is usually defined by the fill aesthetic (the default group aesthetic is formed by the combination of all discrete aesthetics except for x and y). This default ensures that bar colours align with the default legend.

One can argue that the stacking should occur in the opposite direction when the aesthetic is flipped, but this brings about a larger issue where the visualisation isn't equivalent anymore when flipped, so I'm hesitant to change anything for this reason.

But these are all defaults that you can tweak. The defaults are sensible, but aren't perfect in every situation.
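
The reverse stacking described in that documentation can be checked by inspecting the computed layer data. This is a sketch under the assumption that group 1 corresponds to the first factor level ("setosa"); `stack_range()` is a hypothetical helper written for this illustration, not part of ggplot2.

```r
library(ggplot2)

p_default <- ggplot(iris, aes(x = "x", fill = Species)) +
  geom_bar(position = position_stack())

p_reversed <- ggplot(iris, aes(x = "x", fill = Species)) +
  geom_bar(position = position_stack(reverse = TRUE))

# Hypothetical helper: return each group's stacked interval, ordered by group.
stack_range <- function(plot) {
  d <- layer_data(plot)
  d <- d[order(d$group), c("group", "ymin", "ymax")]
  rownames(d) <- NULL
  d
}

# By default the first group sits furthest from y = 0;
# with reverse = TRUE it sits closest to y = 0.
default_range <- stack_range(p_default)
reversed_range <- stack_range(p_reversed)
```

Comparing `default_range` and `reversed_range` makes the effect of the default visible without relying on reading the plot.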

@ELICHOS
Author

ELICHOS commented Aug 22, 2023

I agree with @Ductmonkey: the "setosa" category should be at the bottom, both in the legend and in the graph. We don't read a graph from top to bottom, but always from bottom to top, following the y-axis.

To me, the inversion of categories mentioned in ?position_stack is counterintuitive. I don't understand why it would be necessary to reverse the order of categories for it to be consistent with the default legend order. On the contrary, why shouldn't the legend order be determined by the order of categories?

I don't have a deep understanding of how ggplot functions work internally, but I wonder why, when we create a graph, there isn't a default ordered vector of categories defined from the start and then reused by all subsequent layers?

In ?position_stack, I see that it's suggested to use position_stack(reverse = TRUE) to produce a horizontal plot. Indeed, that's convenient and quicker than modifying the legend order. I will use this option from now on.
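
Applied to the horizontal bar from the first message, that option might look like this. It is a sketch rather than the exact code from the thread: `counts` is built with base R instead of dplyr, and the names `counts`, `p_horizontal` and `horizontal_data` are hypothetical.

```r
library(ggplot2)

# One row per species with its count (the column is named Freq by table()).
counts <- as.data.frame(table(Species = iris$Species))

# reverse = TRUE stacks the first level nearest to x = 0, so the bar
# reads left to right in the same order as the horizontal legend.
p_horizontal <- ggplot(counts, aes(x = Freq, y = "y", fill = Species)) +
  geom_col(position = position_stack(reverse = TRUE)) +
  theme(legend.position = "top")

# Computed positions, ordered by group, to confirm the stacking order.
horizontal_data <- layer_data(p_horizontal)
horizontal_data <- horizontal_data[order(horizontal_data$group), ]

# print(p_horizontal) draws the plot.
```

The alternative from earlier in the thread, guides(fill = guide_legend(reverse = TRUE)), leaves the bar untouched and flips the legend instead; which one to prefer depends on whether the bar order or the legend order should be treated as fixed.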

@Ductmonkey

Thanks @teunbrand for highlighting that. Obviously a choice has to be made on how to display the order of the group aesthetic so it's just a question of what is more intuitive. FWIW - I've used ggplot2 a lot and never even noticed it so I think that's a very intuitive and sensible default. Given that most people are plotting vertical rather than horizontal and it's so easy to set reverse = TRUE, I'd argue that this doesn't merit a change to ggplot2.

@teunbrand
Collaborator

I'm going to close this issue because the current default does a 'good enough' job and can be changed, whereas a change in the default might cause visual changes in people's plots, so I'll err on the conservative side of this issue.

@teunbrand closed this as not planned Oct 13, 2023
4 participants