Aligning plots: Implementation of drop = FALSE for position_dodge #3988

Closed
char4816 opened this issue May 6, 2020 · 8 comments
Labels
feature a feature request or enhancement positions 🥇

Comments

@char4816

char4816 commented May 6, 2020


As others have pointed out (#688), having a position_dodge option such as drop = FALSE would be really helpful for plots like this:

image

I know Hadley has given some helpful workaround suggestions (e.g. faceting) and has mentioned that this isn't within the scope of position_dodge but I am hoping that one more person raising the issue might help to make the change.

For some of my plots' x-categories there are only 2 data points (e.g. cases in Homozygous Alt in my plot), which is enough to generate a boxplot but not a violin plot:

geom_violin(alpha=0.4, position = position_dodge(width=0.9))

Because the 3rd green violin plot doesn't have a counterpart to "dodge", the violin plot is centered and therefore wider and not aligned with its corresponding boxplot which is quite sad to look at.

While faceting works to an extent:
image

It would look much cleaner for publication if I could do position = position_dodge(width = 0.9, drop = FALSE).

I hope this suggestion is taken seriously and I really appreciate your help & time.

Chris

@thomasp85
Member

@karawoo can you comment on the amount of work this would require, seeing that you were the last to dive head-first into dodging 🙂

@thomasp85 thomasp85 added feature a feature request or enhancement positions 🥇 labels Aug 31, 2020
@karawoo
Member

karawoo commented Aug 31, 2020

In the dev version of ggplot2 I can't reproduce this specific issue; having only two observations still generates a violin:

library("ggplot2")

set.seed(5)
dat <- data.frame(x = rep(1, times = 5), y = rnorm(5, 1), z = c("A", "A", "A", "B", "B"))

ggplot(dat, aes(x = x, y = y, fill = z)) +
  geom_boxplot(position = position_dodge(width = 0.9)) +
  geom_violin(alpha = 0.5)

Created on 2020-08-31 by the reprex package (v0.3.0.9001)

That said, this is a similar question to others that have come up before (#3022 (comment), #2076, #2813). The short answer is that having more control over placement of a geom within a group is not very easy. It may be possible, and it would be nice if we could improve the situation, but right now it's not straightforward.

@char4816
Author

Thank you all for looking into this.

I was doing this on R version 3.6.2 using ggplot2 3.3.2. On these versions, your example code produces the following plot:

image

@clauswilke
Member

I still think the fix in PR #2813 should be applied, by the way. The request here for a drop = FALSE is the same as the current option preserve = "single", I believe. And preserve = "single" can be made to work (somewhat) with violins using the code in #2813.
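
A minimal sketch of what preserve = "single" changes compared with the default preserve = "total", using bars rather than violins to keep it short. The toy data frame and the names toy, p_total, p_single, w_total and w_single are illustrative assumptions and do not come from the issue.

```r
library(ggplot2)

# Hypothetical toy data: category "B" has no "x" level, so its bar has
# nothing to dodge against.
toy <- data.frame(
  cat = c("A", "A", "B"),
  ind = c("x", "y", "y"),
  n   = c(3, 5, 4)
)

base <- ggplot(toy, aes(x = cat, y = n, fill = ind))

# Default behaviour: the lone bar in "B" expands to the full dodge width.
p_total <- base +
  geom_col(position = position_dodge(width = 0.9, preserve = "total"))

# preserve = "single": every bar keeps the width of a single element.
p_single <- base +
  geom_col(position = position_dodge(width = 0.9, preserve = "single"))

# Compare the drawn bar widths in the built layer data.
w_total  <- with(layer_data(p_total),  xmax - xmin)
w_single <- with(layer_data(p_single), xmax - xmin)
print(w_total)
print(w_single)
```

Printing p_total and p_single side by side should show the same difference visually: the single bar in "B" is full width in the first plot and the same width as the bars in "A" in the second.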

@karawoo
Member

karawoo commented Aug 31, 2020

@char4816 can you see if you can reproduce my results with the development version of ggplot2? You can install it with remotes::install_github("tidyverse/ggplot2").

@clauswilke I'm not against merging #2813 since it would make the preserve = "single" behavior for violins consistent with other geoms. That still leaves the problem of everything getting moved to the left though, and I expect we'll keep getting variations on this issue until we can do something about that.
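
To illustrate the leftward shift mentioned above, here is a small sketch that compares where a lone bar ends up under position_dodge(preserve = "single") and under position_dodge2(preserve = "single"), which centres the remaining elements instead. This is not the per-group placement control the thread asks for, and the toy data and the helper centre_of_last_bar() are hypothetical names introduced only for this example.

```r
library(ggplot2)

# Hypothetical toy data: category "B" only has the "y" level.
toy <- data.frame(
  cat = c("A", "A", "B"),
  ind = c("x", "y", "y"),
  n   = c(3, 5, 4)
)

base <- ggplot(toy, aes(x = cat, y = n, fill = ind))

# position_dodge() keeps the single width but places the bar in the left slot.
p_dodge <- base +
  geom_col(position = position_dodge(width = 0.9, preserve = "single"))

# position_dodge2() keeps the single width and centres the bar on its x value.
p_dodge2 <- base +
  geom_col(position = position_dodge2(preserve = "single"))

# Illustrative helper: horizontal centre of the right-most bar (the "B" bar).
centre_of_last_bar <- function(p) {
  d <- layer_data(p)
  i <- which.max(d$xmax)
  (d$xmin[i] + d$xmax[i]) / 2
}

print(centre_of_last_bar(p_dodge))
print(centre_of_last_bar(p_dodge2))
```

Neither option puts the lone "y" bar in the right-hand slot next to where the missing "x" bar would have been, which is the placement problem described in this comment.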

@clauswilke
Member

@karawoo Agreed. I suspect a proper fix for position_dodge() would require some sort of aesthetic mapping and a scale, and I haven't been able to come up with a reasonable way to implement this.

@char4816
Author

Using the development version of ggplot2, that example works great now. However, the overarching issue is bigger than just violin plots (and whether or not two points is enough to create the violin plot). For example, if we wanted to plot the following dataframe with boxplots,

  cat ind values
1   A   x      4
2   A   x      2
3   B   x     NA
4   B   x     NA
5   A   y      4
6   A   y      5
7   B   y      6
8   B   y      9

the two missing values will make the dodging look bad. Here is some reproducible code:

data <- data.frame(
  cat=c('A','A','B','B','A','A','B','B'), 
  ind=c('x','x','x','x','y','y','y','y'),
  values=c(4,2,NA,NA,4,5,6,9)
)

p  <- ggplot() +
  scale_colour_hue(guide='none') +
  geom_boxplot(aes(x=as.factor(cat), y=values, fill=ind),
               position=position_dodge(width=.90), 
               data=data,
               outlier.size = 1.2,
               na.rm = TRUE)

print(p) #produces undesired result where the second blue boxplot spans the full width

image

The best solution I have seen so far for this problem is to make one of the NAs equal to zero so a box plot is generated, and then to cover up that boxplot... (not a very pretty solution).

data <- data.frame(
  cat=c('A','A','B','B','A','A','B','B'), 
  ind=c('x','x','x','x','y','y','y','y'),
  values=c(4,2,NA,0,4,5,6,9) #notice I changed the second NA to now be zero
  )

p  <- ggplot() +
  scale_colour_hue(guide='none') +
  geom_boxplot(aes(x=as.factor(cat), y=values, fill=ind),
               position=position_dodge(width=.90), 
               data=data,
               outlier.size = 1.2,
               na.rm = TRUE) +
  #Someone suggested this workaround where you use geom_line to cover up the unwanted boxplot with a thick white line
  geom_line(aes(x=x, y=y), 
            data=data.frame(x=c(0,3),y=c(0,0)), 
            size = 1.5, 
            col='white')
print(p) #this looks more like what we wanted

image

@teunbrand
Collaborator

I'm closing this issue for the following reasons.
The example in #3988 (comment) can be fixed by setting preserve = 'single'. Granted, the box is not on the right-hand side of the B-gridline, but that is unrelated to the issue of the box's width.

library(ggplot2)

data <- data.frame(
  cat=c('A','A','B','B','A','A','B','B'), 
  ind=c('x','x','x','x','y','y','y','y'),
  values=c(4,2,NA,NA,4,5,6,9)
)

ggplot(data) +
  geom_boxplot(
    aes(x=as.factor(cat), y=values, fill=ind),
    position=position_dodge(width=.90, preserve = 'single'),
    na.rm = TRUE
  )

The original issue of misalignment between boxplots and violin plots with fewer than two observations can be mitigated using drop = FALSE in geom_violin().

set.seed(5)
dat <- data.frame(x = rep(1, times = 4), y = rnorm(4, 1), z = c("A", "A", "A", "B"))

ggplot(dat, aes(x = x, y = y, fill = z)) +
  geom_boxplot(position = position_dodge(width = 0.9)) +
  geom_violin(alpha = 0.5, drop = FALSE)
#> Warning: Cannot compute density for groups with fewer than two datapoints.

Created on 2024-07-24 with reprex v2.1.1
