geom_boxplot with geom_jitter produces double dots #5388

Closed
frezza-metabolomics opened this issue Aug 11, 2023 · 3 comments

Comments

@frezza-metabolomics

I found a problem with ggplot2::geom_boxplot(): it creates a second, identical dot of a different size for some points. Not all values are duplicated.

Here is the code to reproduce the bug. This is my data:

 data_box_sel
         1-methylnicotinamide Conditions
CM04_026              4097964   CM4N_DKO
CM04_027              5484845   CM4N_DKO
CM04_028              4131174   CM4N_DKO
CM04_029              6466447    CM4N_WT
CM04_030              5580998    CM4N_WT
CM04_031              4524057    CM4N_WT
CM04_032              6371849   CM4N_DKO
CM04_033              5440818    CM4N_WT
CM04_034              6050115   CM4N_DKO
CM04_035              5349607    CM4N_WT
CM04_036              5361647    CM4N_WT
CM04_037              6405902   CM4N_SKO
CM04_038             13378067   CM4N_SKO
CM04_039              7068394   CM4N_SKO
CM04_040              6495043   CM4N_SKO
CM04_041              6965436    CM4N_WT
CM04_042              4719839    CM4N_WT
CM04_043              3743428    CM4N_WT
CM04_044              3424181   CM4N_SKO
CM04_045              4871067   CM4N_TKO
CM04_046              4175500   CM4N_TKO
CM04_047              4153753   CM4N_TKO
CM04_048              3797616   CM4N_TKO
CM04_049              6204417   CM4N_TKO
CM04_050              7399935   CM4N_SKO

This is the code:

library(ggplot2)

mdatBOX2 <- reshape2::melt(data_box_sel)  ## convert to long format
mdatBOX2$value <- log10(mdatBOX2$value)   ## log10-transform the intensities
mdatBOX2 <- mdatBOX2[order(mdatBOX2$Conditions), ]
# the plot
ggplot(mdatBOX2, aes(x = factor(Conditions), y = value, fill = Conditions)) +
  geom_boxplot() +
  #viridis::scale_fill_viridis(discrete = TRUE, alpha = 0.6) +
  geom_jitter(shape = 5, alpha = 0.7)

Notice the black circular points: those should not be there.
[image: box]
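The black circles are the outlier points that geom_boxplot() draws by default (filled circles, shape 19 in ggplot2 3.4.x), and geom_jitter() then draws every observation again, here as a diamond (shape 5). Any value the boxplot classes as an outlier therefore appears twice. A minimal sketch, assuming ggplot2's default rule (points more than 1.5 × IQR beyond the quartiles, with quartiles from `quantile()`), checks which CM4N_SKO values fall outside the whiskers. `flag_outliers()` is a hypothetical helper, not a ggplot2 function.

```r
# Hypothetical helper (not part of ggplot2): flags the values that ggplot2's
# default boxplot statistics treat as outliers, i.e. points more than
# coef * IQR below the lower hinge or above the upper hinge. The hinges are
# taken from stats::quantile() with its default type 7.
flag_outliers <- function(y, coef = 1.5) {
  hinges <- stats::quantile(y, probs = c(0.25, 0.75), names = FALSE)
  iqr <- hinges[2] - hinges[1]
  y < hinges[1] - coef * iqr | y > hinges[2] + coef * iqr
}

# The six CM4N_SKO measurements from the table above (raw scale)
sko_raw <- c(6405902, 13378067, 7068394, 6495043, 3424181, 7399935)

# The plot shows log10 values, so the rule is applied on that scale
is_out <- flag_outliers(log10(sko_raw))

# These values are drawn twice: once as a boxplot outlier (filled circle)
# and once more by geom_jitter() (diamond)
sko_raw[is_out]  # 13378067 and 3424181
```

The other conditions can be checked the same way by passing their log10 values to the helper.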

The same happens if I add a violin layer:

library(ggplot2)

mdatBOX2 <- reshape2::melt(data_box_sel)  ## convert to long format
mdatBOX2$value <- log10(mdatBOX2$value)   ## log10-transform the intensities
mdatBOX2 <- mdatBOX2[order(mdatBOX2$Conditions), ]
# the plot
ggplot(mdatBOX2, aes(x = factor(Conditions), y = value, fill = Conditions)) +
  ggplot2::geom_violin(linewidth = 0.1) +  # `size` for lines is deprecated since ggplot2 3.4.0
  geom_boxplot() +
  #viridis::scale_fill_viridis(discrete = TRUE, alpha = 0.6) +
  geom_jitter(shape = 5, alpha = 0.7)

[image: boxAndVIOLOIN]

And my session info:

sessionInfo()    
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggfortify_0.4.16    multtest_2.54.0     Biobase_2.58.0      BiocGenerics_0.44.0
 [5] readxl_1.4.2        lubridate_1.9.2     forcats_1.0.0       stringr_1.5.0      
 [9] dplyr_1.1.2         purrr_1.0.1         readr_2.1.4         tidyr_1.3.0        
[13] tibble_3.2.1        ggplot2_3.4.2       tidyverse_2.0.0    

loaded via a namespace (and not attached):
 [1] ggrepel_0.9.3          Rcpp_1.0.10            svglite_2.1.1          lattice_0.21-8        
 [5] gtools_3.9.4           utf8_1.2.3             R6_2.5.1               cellranger_1.1.0      
 [9] plyr_1.8.8             backports_1.4.1        stats4_4.2.2           pillar_1.9.0          
[13] rlang_1.1.0            qcc_2.7                rstudioapi_0.14        car_3.1-1             
[17] EnhancedVolcano_1.16.0 hash_2.2.6.2           Matrix_1.5-3           textshaping_0.3.6     
[21] labeling_0.4.2         splines_4.2.2          munsell_0.5.0          broom_1.0.4           
[25] compiler_4.2.2         systemfonts_1.0.4      pkgconfig_2.0.3        tidyselect_1.2.0      
[29] gridExtra_2.3          viridisLite_0.4.1      fansi_1.0.4            crayon_1.5.2          
[33] tzdb_0.3.0             withr_2.5.0            ggpubr_0.6.0           MASS_7.3-58.3         
[37] grid_4.2.2             gtable_0.3.3           lifecycle_1.0.3        factoextra_1.0.7      
[41] magrittr_2.0.3         scales_1.2.1           writexl_1.4.2          cli_3.6.0             
[45] stringi_1.7.12         carData_3.0-5          inflection_1.3.6       farver_2.1.1          
[49] ggsignif_0.6.4         reshape2_1.4.4         viridis_0.6.2          ragg_1.2.5            
[53] generics_0.1.3         vctrs_0.6.0            cowplot_1.1.1          tools_4.2.2           
[57] glue_1.6.2             hms_1.1.3              abind_1.4-5            parallel_4.2.2        
[61] survival_3.4-0         timechange_0.2.0       colorspace_2.1-0       ggConvexHull_0.1.0    
[65] rstatix_0.7.2      
@teunbrand
Collaborator

Does it work as intended when setting geom_boxplot(outlier.shape = NA)?

@frezza-metabolomics
Author

frezza-metabolomics commented Aug 14, 2023

> Does it work as intended when setting geom_boxplot(outlier.shape = NA)?

Thank you for your recommendation. This solves the issue. So, as I understand it, geom_boxplot() will not plot extreme outliers, and to plot them properly I will need to manually extend the y-axis to that point.

But shouldn't this be the default, so that no artefacts are produced? Or should there at least be a warning?
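A minimal sketch of the fix confirmed in this exchange, using the data rebuilt by hand from the table in the issue (`raw` and `cond` are names introduced here for illustration; the original code builds `mdatBOX2` with `reshape2::melt()`). With `outlier.shape = NA` the boxplot no longer draws its own outlier points, so each observation is drawn exactly once, by geom_jitter(). The violin variant uses `linewidth` instead of `size`, assuming ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Values and conditions copied from the table in the issue (CM04_026 ... CM04_050)
raw <- c(4097964, 5484845, 4131174, 6466447, 5580998, 4524057, 6371849,
         5440818, 6050115, 5349607, 5361647, 6405902, 13378067, 7068394,
         6495043, 6965436, 4719839, 3743428, 3424181, 4871067, 4175500,
         4153753, 3797616, 6204417, 7399935)
cond <- paste0("CM4N_", c("DKO", "DKO", "DKO", "WT", "WT", "WT", "DKO", "WT",
  "DKO", "WT", "WT", "SKO", "SKO", "SKO", "SKO", "WT", "WT", "WT", "SKO",
  "TKO", "TKO", "TKO", "TKO", "TKO", "SKO"))

# Long format with log10 values, equivalent to the melt() + log10() steps
mdatBOX2 <- data.frame(Conditions = cond, value = log10(raw))

p <- ggplot(mdatBOX2, aes(x = factor(Conditions), y = value, fill = Conditions)) +
  geom_boxplot(outlier.shape = NA) +   # do not draw the boxplot's outlier points
  geom_jitter(shape = 5, alpha = 0.7)  # every observation is drawn here, once

# The same fix with the violin layer underneath
p_violin <- ggplot(mdatBOX2, aes(x = factor(Conditions), y = value, fill = Conditions)) +
  geom_violin(linewidth = 0.1) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(shape = 5, alpha = 0.7)
```

By default, position_jitter() also shifts points vertically by up to 40% of the data resolution in either direction; passing `height = 0` to geom_jitter() keeps the plotted y-values exact.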

frezza-metabolomics changed the title from "geom_boxplot with geom_jitter produces double lines" to "geom_boxplot with geom_jitter produces double dots" on Aug 14, 2023
@teunbrand
Collaborator

The boxplot draws outliers by default, unless directed otherwise. If you use outlier.shape = NA, the y-axis still includes the full range of the data by default, so you don't have to extend it yourself. In the development version you can use outliers = FALSE to have the default axis range fit the summarised data instead of the full data.

In any case, I think everything is working as intended here, so I'm going to close this issue.
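The `outliers` argument mentioned above was released in ggplot2 3.5.0. A minimal sketch, assuming ggplot2 3.5.0 or later, compares the two options on the CM4N_SKO and CM4N_TKO rows from the issue: `outlier.shape = NA` only hides the outlier points, while `outliers = FALSE` drops them, so the y scale is trained on the boxes and whiskers alone. The object names are illustrative; `layer_scales()` is used to read back the trained y range.

```r
library(ggplot2)

# Assumes ggplot2 >= 3.5.0, where geom_boxplot() has the `outliers` argument.
# Subset of the issue's data: the CM4N_SKO and CM4N_TKO rows, log10 scale.
df <- data.frame(
  Conditions = rep(c("CM4N_SKO", "CM4N_TKO"), times = c(6, 5)),
  value = log10(c(6405902, 13378067, 7068394, 6495043, 3424181, 7399935,
                  4871067, 4175500, 4153753, 3797616, 6204417))
)

# outlier.shape = NA: outliers are not drawn, but the y scale still covers them
p_hidden <- ggplot(df, aes(x = Conditions, y = value)) +
  geom_boxplot(outlier.shape = NA)

# outliers = FALSE: outliers are dropped, so the y scale fits boxes and whiskers
p_dropped <- ggplot(df, aes(x = Conditions, y = value)) +
  geom_boxplot(outliers = FALSE)

# Trained y ranges (before the default scale expansion is added)
y_hidden <- layer_scales(p_hidden)$y$range$range
y_dropped <- layer_scales(p_dropped)$y$range$range
y_hidden   # reaches log10(13378067), the largest value in the data
y_dropped  # ends at the highest whisker
```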

teunbrand closed this as not planned on Aug 17, 2023