Error when a group is all infinite #745

Open
lazappi opened this issue Jul 22, 2024 · 2 comments

Comments

lazappi commented Jul 22, 2024

Hi

I just came across this error: skim() fails on a grouped data frame when a variable has only infinite values within a group. The error surfaces in a {glue} call, but I suspect the root cause lies in how the statistics are calculated and how the resulting message is then processed.

library(dplyr)
library(skimr)

df <- data.frame(group = c("A", "B"), a = c(1, Inf))

# Skimming the whole data frame works (with a warning)
skim(df)
#> Warning: There was 1 warning in `dplyr::summarize()`.
#> ℹ In argument: `dplyr::across(tidyselect::any_of(variable_names),
#>   mangled_skimmers$funs)`.
#> ℹ In group 0: .
#> Caused by warning:
#> ! There was 1 warning in `dplyr::summarize()`.
#> ℹ In argument: `dplyr::across(tidyselect::any_of(variable_names),
#>   mangled_skimmers$funs)`.
#> Caused by warning in `inline_hist()`:
#> ! Variable contains Inf or -Inf value(s) that were converted to NA.
Name df
Number of rows 2
Number of columns 2
_______________________
Column type frequency:
character 1
numeric 1
________________________
Group variables None

Data summary

Variable type: character

| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| group | 0 | 1 | 1 | 1 | 0 | 2 | 0 |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| a | 0 | 1 | Inf | NaN | 1 | Inf | Inf | Inf | Inf | ▁▁▇▁▁ |

# Grouping then skimming gives an error
df |> group_by(group) |> skim()
#> Error:
#> ! Failed to evaluate glue component {label}
#> Caused by error in `vapply()`:
#> ! values must be length 1,
#>  but FUN(X[[1]]) result is length 0

Created on 2024-07-22 with reprex v2.1.1
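
For context, the last line of the error is the generic message base R's vapply() raises whenever the applied function returns a zero-length value. The sketch below reproduces only that message outside of skimr. `label_for` is a hypothetical stand-in, not code from skimr, dplyr or glue, and it does not show where the empty value actually originates in the reprex above.

```r
# Hypothetical helper that returns a zero-length vector for some inputs
label_for <- function(x) x[x > 0]

# vapply() requires exactly one value per element, so the length-0
# result for the second element triggers the error; capture its message
res <- tryCatch(
  vapply(list(1, -1), label_for, numeric(1)),
  error = function(e) conditionMessage(e)
)

print(res)
```

If the failing step in skimr or its dependencies follows the same pattern, the fix would be to make sure a length-1 value (for example NA) is always returned for groups with no usable data.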

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 (2024-04-24)
#>  os       macOS Sonoma 14.5
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Berlin
#>  date     2024-07-22
#>  pandoc   3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version date (UTC) lib source
#>  P base64enc     0.1-3   2015-07-28 [?] CRAN (R 4.4.0)
#>  P cli           3.6.3   2024-06-21 [?] CRAN (R 4.4.0)
#>  P digest        0.6.36  2024-06-23 [?] CRAN (R 4.4.0)
#>  P dplyr       * 1.1.4   2023-11-17 [?] CRAN (R 4.4.0)
#>  P evaluate      0.24.0  2024-06-10 [?] CRAN (R 4.4.0)
#>  P fansi         1.0.6   2023-12-08 [?] CRAN (R 4.4.0)
#>  P fastmap       1.2.0   2024-05-15 [?] CRAN (R 4.4.0)
#>  P fs            1.6.4   2024-04-25 [?] CRAN (R 4.4.0)
#>  P generics      0.1.3   2022-07-05 [?] CRAN (R 4.4.0)
#>  P glue          1.7.0   2024-01-09 [?] CRAN (R 4.4.0)
#>  P htmltools     0.5.8.1 2024-04-04 [?] CRAN (R 4.4.0)
#>  P jsonlite      1.8.8   2023-12-04 [?] CRAN (R 4.4.0)
#>  P knitr         1.48    2024-07-07 [?] CRAN (R 4.4.0)
#>  P lifecycle     1.0.4   2023-11-07 [?] CRAN (R 4.4.0)
#>  P magrittr      2.0.3   2022-03-30 [?] CRAN (R 4.4.0)
#>  P pillar        1.9.0   2023-03-22 [?] CRAN (R 4.4.0)
#>  P pkgconfig     2.0.3   2019-09-22 [?] CRAN (R 4.4.0)
#>  P purrr         1.0.2   2023-08-10 [?] CRAN (R 4.4.0)
#>  P R6            2.5.1   2021-08-19 [?] CRAN (R 4.4.0)
#>  P repr          1.1.7   2024-03-22 [?] CRAN (R 4.4.0)
#>  P reprex        2.1.1   2024-07-06 [?] CRAN (R 4.4.0)
#>  P rlang         1.1.4   2024-06-04 [?] CRAN (R 4.4.0)
#>  P rmarkdown     2.27    2024-05-17 [?] CRAN (R 4.4.0)
#>  P rstudioapi    0.16.0  2024-03-24 [?] CRAN (R 4.4.0)
#>  P sessioninfo   1.2.2   2021-12-06 [?] CRAN (R 4.4.0)
#>  P skimr       * 2.1.5   2022-12-23 [?] CRAN (R 4.4.0)
#>  P stringi       1.8.4   2024-05-06 [?] CRAN (R 4.4.0)
#>  P stringr       1.5.1   2023-11-14 [?] CRAN (R 4.4.0)
#>  P tibble        3.2.1   2023-03-20 [?] CRAN (R 4.4.0)
#>  P tidyr         1.3.1   2024-01-24 [?] CRAN (R 4.4.0)
#>  P tidyselect    1.2.1   2024-03-11 [?] CRAN (R 4.4.0)
#>  P utf8          1.2.4   2023-10-22 [?] CRAN (R 4.4.0)
#>  P vctrs         0.6.5   2023-12-01 [?] CRAN (R 4.4.0)
#>  P withr         3.0.0   2024-01-16 [?] CRAN (R 4.4.0)
#>  P xfun          0.45    2024-06-16 [?] CRAN (R 4.4.0)
#>  P yaml          2.3.9   2024-07-05 [?] CRAN (R 4.4.0)
#> 
#>  [1] /Users/luke.zappia/Documents/Projects/20240710-NewScreenings-MyeloidLuminex/renv/library/macos/R-4.4/aarch64-apple-darwin20
#>  [2] /Users/luke.zappia/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.4/aarch64-apple-darwin20/f7156815
#>  [3] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#> 
#>  P ── Loaded and on-disk path mismatch.
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@davi-dmittelstadt

Hi

Just adding that the same error also occurs for Date variables when groups only have missing observations (NA).

library(dplyr)
library(skimr)

df <- data.frame(
  group = c("A", "B"),
  b = c(Sys.Date(), NA)
)

skim(df)
Name df
Number of rows 2
Number of columns 2
_______________________
Column type frequency:
character 1
Date 1
________________________
Group variables None

Data summary

Variable type: character

| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| group | 0 | 1 | 1 | 1 | 0 | 2 | 0 |

Variable type: Date

| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| b | 1 | 0.5 | 2024-11-07 | 2024-11-07 | 2024-11-07 | 1 |

df |> group_by(group) |> skim()
#> Error in vapply(.x, .f, .mold, ..., USE.NAMES = FALSE): values must be length 1,
#>  but FUN(X[[1]]) result is length 0

Created on 2024-11-07 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31 ucrt)
#>  os       Windows 10 x64 (build 22631)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United Kingdom.utf8
#>  ctype    English_United Kingdom.utf8
#>  tz       America/Chicago
#>  date     2024-11-07
#>  pandoc   3.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  base64enc     0.1-3   2015-07-28 [1] CRAN (R 4.2.2)
#>  cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
#>  dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.2.2)
#>  evaluate      0.20    2023-01-17 [1] CRAN (R 4.2.2)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.2)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.2)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.2)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.2)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.2)
#>  highr         0.10    2022-12-22 [1] CRAN (R 4.2.2)
#>  htmltools     0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
#>  jsonlite      1.8.4   2022-12-06 [1] CRAN (R 4.2.2)
#>  knitr         1.41    2022-11-18 [1] CRAN (R 4.2.2)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.2)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.2.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.2)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.2)
#>  repr          1.1.7   2024-03-22 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.2.2)
#>  rmarkdown     2.20    2023-01-19 [1] CRAN (R 4.2.2)
#>  rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.2.2)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.2)
#>  skimr       * 2.1.5   2022-12-23 [1] CRAN (R 4.2.2)
#>  stringi       1.7.12  2023-01-11 [1] CRAN (R 4.2.2)
#>  stringr       1.5.1   2023-11-14 [1] CRAN (R 4.2.2)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.2.2)
#>  tidyr         1.3.1   2024-01-24 [1] CRAN (R 4.2.2)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.2)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.2)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.2.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.2)
#>  xfun          0.36    2022-12-21 [1] CRAN (R 4.2.2)
#>  yaml          2.3.6   2022-10-18 [1] CRAN (R 4.2.2)
#> 
#>  [1] C:/Users/david/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.2/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

elinw (Collaborator) commented Nov 26, 2024

Sorry for the long delay. I think we should definitely be handling this, similar to how we handle some other specific situations. That is to say, we should fix the function where this is an issue. I also want to look at other situations where is.nan() returns TRUE.
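
One possible shape for such a guard, as a rough sketch only: wrap a summary function so that a group with no finite values yields NA instead of reaching the statistic at all. `safe_stat` is a hypothetical name, not an existing skimr function, and treating Inf as unusable is an assumption here; it would change what is reported for columns such as `a` above, where the mean is currently shown as Inf.

```r
# Hypothetical guard: apply a summary function `f` only to finite values
safe_stat <- function(x, f) {
  usable <- x[is.finite(x)]  # drops NA, NaN, Inf and -Inf
  if (length(usable) == 0) {
    return(NA_real_)         # nothing left to summarise in this group
  }
  f(usable)
}

all_inf <- safe_stat(c(Inf, Inf), min)    # no finite values
mixed   <- safe_stat(c(1, Inf, NA), max)  # only the value 1 is usable
plain   <- safe_stat(c(2, 4), mean)       # ordinary input is unaffected
```

Because the guard always returns a length-1 value, downstream code that expects one value per group would not see an empty result from it.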
