[write] table column names have to be characters #1071

Merged
merged 1 commit into from
Jul 2, 2024

JanMarvin
Owner

@JanMarvin JanMarvin commented Jul 2, 2024

The condition is triggered by factor variables. Our logic tries to convert them to strings, but it goes a step further and also tries to write column headers as numbers. This breaks in a table, because table headers must be character strings.

@JanMarvin JanMarvin force-pushed the fix_table_colname branch 2 times, most recently from e7b2c74 to 6ed4340 Compare July 2, 2024 21:25
@@ -524,8 +526,13 @@ void wide_to_long(
if (ref_str.compare("0") == 0)
  ref_str = col + row;

// factors can be numeric or string or both
if (vtyp == factor) string_nums = true;
Owner Author

Previously, string_nums was never reset when a factor vtyp was present.
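
To make the effect concrete, here is a minimal sketch of how a flag that is set for a factor column and never reset can leak into later column headers. This is not the actual wide_to_long() code: Column, VType, CellKind, looks_numeric() and header_kinds() are hypothetical names used only for illustration, and the loop structure is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not the actual openxlsx2 types.
enum class VType { character, numeric, factor };
enum class CellKind { number, text };

struct Column {
  std::string name;                 // column header
  VType vtyp;                       // declared column type
  std::vector<std::string> values;  // cell values, already rendered as text
};

// True if s parses completely as a number, e.g. "2011" or "3.5".
bool looks_numeric(const std::string& s) {
  if (s.empty()) return false;
  std::size_t pos = 0;
  try {
    std::stod(s, &pos);
  } catch (...) {
    return false;  // not a number at all, or out of range
  }
  return pos == s.size();  // reject partial parses such as "25 - 35 JAAR"
}

// Decide, column by column, how each header would be written.
// With reset_per_column == false the flag set by a factor column stays
// set for every column written afterwards (the leaking behaviour).
std::vector<CellKind> header_kinds(const std::vector<Column>& cols,
                                   bool reset_per_column) {
  std::vector<CellKind> kinds;
  bool string_nums = false;
  for (const Column& col : cols) {
    if (reset_per_column) string_nums = false;

    // The header is written before the column's own type is inspected.
    const bool as_number = string_nums && looks_numeric(col.name);
    kinds.push_back(as_number ? CellKind::number : CellKind::text);

    // Factors can be numeric or string or both.
    if (col.vtyp == VType::factor) string_nums = true;
  }
  return kinds;
}
```

With the columns CATEGORIE (factor), 2011 and 2012, the variant without the reset classifies the headers 2011 and 2012 as numbers, while the variant with the reset keeps every header as text, which is what a table requires.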

@JanMarvin
Owner Author

Example from this Stack Overflow post:

library(openxlsx2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# workbook for output
wb2 <- wb_workbook()$add_worksheet("PERS35")

# fl <- "https://duo.nl/open_onderwijsdata/images/02.-onderwijspersoneel-po-in-fte-2011-2023.xlsx"
fl <- "~/Downloads/02.-onderwijspersoneel-po-in-fte-2011-2023.xlsx"

# maybe this sheet?
df_fte <- wb_to_df(fl, sheet = "owtype-best-instelling-functie") %>% 
  mutate(across(where(is.character), ~ na_if(., "*")))

# some data wrangling. Throws a warning ... ???
df_fte <- df_fte %>% 
  mutate_at(vars(2:274), as.numeric)%>% 
  select(1:222) %>% 
  tidyr::pivot_longer(cols = matches("\\d{4}$"), names_to = "JAAR_", values_to = "AANTAL") %>%
  mutate(CATEGORIE = stringr::str_sub(JAAR_, end=-5)) %>% 
  mutate(JAAR = stringr::str_sub(JAAR_, start=-4)) %>% 
  group_by(ONDERWIJSTYPE, CATEGORIE, JAAR) %>% 
  summarise(AANTAL = sum(AANTAL, na.rm=TRUE))
#> Warning: There were 2 warnings in `mutate()`.
#> The first warning was:
#> ℹ In argument: `INSTELLINGSCODE = .Primitive("as.double")(INSTELLINGSCODE)`.
#> Caused by warning:
#> ! NAs introduced by coercion
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
#> `summarise()` has grouped output by 'ONDERWIJSTYPE', 'CATEGORIE'. You can
#> override using the `.groups` argument.

# some more data wrangling
df_pers35 <- 
  df_fte %>% 
  mutate(CATEGORIE=
           factor(
             case_when(
               (CATEGORIE=="FTE'S PERSONEN JONGER DAN 15 JAAR "| 
                  CATEGORIE=="FTE'S PERSONEN 15 - 25 JAAR ")           ~ "JONGER DAN 25 JAAR",
               CATEGORIE=="FTE'S PERSONEN 25 - 35 JAAR "            ~ "25 - 35 JAAR",
               CATEGORIE=="FTE'S PERSONEN 35 - 45 JAAR "            ~ "35 - 45 JAAR",
               CATEGORIE=="FTE'S PERSONEN 45 - 55 JAAR "            ~ "45 - 55 JAAR",
               CATEGORIE=="FTE'S PERSONEN 55 - 65 JAAR "            ~ "55 - 65 JAAR",
               CATEGORIE=="FTE'S PERSONEN 65 JAAR EN OUDER "        ~ "65 JAAR EN OUDER"), 
             levels=c("JONGER DAN 25 JAAR", "25 - 35 JAAR", "35 - 45 JAAR", "45 - 55 JAAR", "55 - 65 JAAR", "65 JAAR EN OUDER")
           )
  ) %>% 
  filter(CATEGORIE=="JONGER DAN 25 JAAR"|
           CATEGORIE=="25 - 35 JAAR"|
           CATEGORIE=="35 - 45 JAAR"|
           CATEGORIE=="45 - 55 JAAR"|
           CATEGORIE=="55 - 65 JAAR"|
           CATEGORIE=="65 JAAR EN OUDER") %>%
  group_by(CATEGORIE, JAAR) %>% 
  summarise(aantal=round(sum(AANTAL, na.rm=TRUE),1), .groups = 'drop') %>% 
  tidyr::spread(JAAR, aantal)

if (is.null(wb_to_df(wb2, sheet = "PERS35", dims = wb_dims(cols = "B", rows = 2)))){
  wb2 <- wb_add_data_table(
    wb = wb2,
    x = df_pers35,
    dims = "B2",
    banded_rows = TRUE,
    table_style = "TableStyleLight16"
  ) %>% 
    wb_add_fill(sheet = "PERS35", dims = "B2:B8", color = wb_color("green"))%>% 
    wb_add_fill(sheet = "PERS35", dims = "K2:O8", color = wb_color("green"))
}
#> sheet found, but contains no data

if (interactive()) wb2$open()

@JanMarvin
Owner Author

I'm merging this PR, but if you want to do some testing prior to the release, the handling of factors and the string_nums option might be something to toy around with, @olivroy. I tend to avoid factors wherever I can, which is probably why this one slipped through.

@JanMarvin JanMarvin merged commit e639220 into main Jul 2, 2024
9 checks passed
@JanMarvin JanMarvin deleted the fix_table_colname branch July 2, 2024 21:43