
Call for Tables - Lab Shell #29

Pierre-Wallet opened this issue Dec 6, 2021 · 9 comments

@Pierre-Wallet

Hello all,
sharing here a template for lab output that, in my view, gathers the most common formatting difficulties: one page per parameter, column spanners, and cell merges.

[image: example of the lab shift-table shell]

I hope it helps!
Pierre.

@gmbecker commented Dec 6, 2021

Hi @Pierre-Wallet, this is great. One point of clarification: can you tell us what the various percentages are out of?

For context, in my work on rtables I've seen percentages that are out of the column total, out of the overall total, out of alternative column totals (i.e. column "totals" that do not reflect the data subsets corresponding to the actual columns), out of the row group total, etc.
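
As a rough sketch (made-up counts, nothing from this issue), the different denominators correspond to different margin choices in base R's proportions(); "alternative column totals" need an external denominator:

# Hypothetical 2-way counts array, just to show the different denominators
tbl <- matrix(c(10, 5, 20, 15), nrow = 2,
              dimnames = list(Baseline = c("Grade 0", "Grade 1"), Arm = c("A", "B")))

proportions(tbl)               # out of the overall total (whole table sums to 1)
proportions(tbl, margin = 2)   # out of each column (arm) total
proportions(tbl, margin = 1)   # out of each row (baseline grade) total
# "alternative column totals" (e.g. randomized N rather than observed n)
# need an external denominator:
sweep(tbl, 2, c(A = 50, B = 60), "/")   # hypothetical per-arm N values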

Thanks!

@Pierre-Wallet commented Dec 6, 2021

Hi @gmbecker,
the baseline percentage is based on N.
The percentage for the worst post-baseline value is based on the baseline n.
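
A quick worked sketch with made-up numbers to spell out the two denominators:

# Hypothetical counts, only to illustrate the denominators described above
N          <- 100   # patients in the treatment arm
n_baseline <- 20    # of those, patients with Grade 1 at baseline
n_worst    <- 3     # of those, patients whose worst post-baseline value is Grade 0

sprintf("%d (%.1f)", n_baseline, 100 * n_baseline / N)     # baseline cell: "20 (20.0)", % out of N
sprintf("%d (%.1f)", n_worst, 100 * n_worst / n_baseline)  # worst-grade cell: "3 (15.0)", % out of baseline n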

@elong0527 commented Jan 6, 2022

If the challenge is only about the table format, it would be great if you could share reprex R code that prepares the data so it is ready for output.

If we could all base our examples on the same data (e.g. https://cran.r-project.org/web/packages/safetyData/index.html), that would help us understand the challenge of formatting this table with different R packages.
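
For example, a minimal sketch of a shared starting point; the dataset name adam_adlbc and the ADaM column names used here are assumptions and would need to be checked against the safetyData package index:

# Sketch only: dataset/column names are assumed, not verified against safetyData
# install.packages("safetyData")
library(dplyr)

lab <- safetyData::adam_adlbc           # assumed name of the CDISC pilot ADaM lab data

shift_input <- lab %>%
  filter(PARAMCD == "ALT") %>%          # one lab parameter, chosen arbitrarily
  select(USUBJID, TRTA, PARAM, AVISIT, BASE, AVAL)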

@hughjonesd commented Jan 12, 2022

I took a shot at this with huxtable.

library(huxtable)

set.seed(123)
dat <- data.frame(
  Treatment        = rep(c("A", "B"), 100),
  `Baseline Grade` = sample(c(0:4, NA), 200, replace = TRUE),
  PostGrade        = sample(c(0:4, NA), 200, replace = TRUE),
  check.names      =  FALSE
)

tbl <- xtabs( ~ PostGrade + `Baseline Grade` + Treatment, data = dat, addNA = TRUE)

ptbl <- proportions(tbl)
tbl <- addmargins(tbl, 1:3)
ptbl <- addmargins(ptbl, 1:3)
ptbl <- ptbl*100

joined <- tbl
joined[] <- paste0(tbl, " (", ptbl, ")")

output <- ftable(joined, col.vars = "PostGrade", row.vars = c("Treatment", "Baseline Grade"))

output <- as_hux(output)

# put 'Sum' column early, remove one empty column
output <- output[c(1:2, 10, 4:9)]
# tweak some headers
output[2, 1] <- ""
output[2, 2] <- ""
output[1, 3] <- ""

output |> 
      insert_row("", "Baseline", "", "Worst post-baseline value", fill = "") |> 
      set_colspan(1, 4, 6) |> 
      set_bottom_border(1, 4:9) |> 
      set_colspan(1, 2, 2) |> 
      set_align(1, everywhere, "center") |> 
      set_bold(1, everywhere) |> 
      insert_row("Treatment", "", fill = "n (%)", after = 2) |> 
      set_bottom_border(3, everywhere) |> 
      set_number_format(NA) |>
      map_contents(by_regex("NA" = "Missing")) 

which gives

                                Baseline                          Worst post-baseline value                                
                                              ─────────────────────────────────────────────────────────────────────────
                                                0           1           2           3           4           Missing    
          Treatment                 n (%)       n (%)       n (%)       n (%)       n (%)       n (%)       n (%)      
        ───────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                                                                       
          A           0             17 (8.5)    3 (1.5)     3 (1.5)     2 (1)       3 (1.5)     4 (2)       2 (1)      
                      1             16 (8)      3 (1.5)     3 (1.5)     5 (2.5)     2 (1)       3 (1.5)     0 (0)      
                      2             14 (7)      1 (0.5)     3 (1.5)     2 (1)       2 (1)       5 (2.5)     1 (0.5)    
                      3             16 (8)      3 (1.5)     0 (0)       3 (1.5)     5 (2.5)     3 (1.5)     2 (1)      
                      4             19 (9.5)    5 (2.5)     3 (1.5)     7 (3.5)     2 (1)       0 (0)       2 (1)      
                      Missing       18 (9)      3 (1.5)     2 (1)       1 (0.5)     6 (3)       4 (2)       2 (1)      
                      Sum           100 (50)    18 (9)      14 (7)      20 (10)     20 (10)     19 (9.5)    9 (4.5)    
          B           0             16 (8)      4 (2)       2 (1)       1 (0.5)     0 (0)       3 (1.5)     6 (3)      
                      1             15 (7.5)    5 (2.5)     3 (1.5)     2 (1)       0 (0)       3 (1.5)     2 (1)      
                      2             18 (9)      3 (1.5)     5 (2.5)     3 (1.5)     1 (0.5)     5 (2.5)     1 (0.5)    
                      3             12 (6)      6 (3)       3 (1.5)     0 (0)       1 (0.5)     1 (0.5)     1 (0.5)    
                      4             15 (7.5)    1 (0.5)     8 (4)       1 (0.5)     2 (1)       1 (0.5)     2 (1)      
                      Missing       24 (12)     3 (1.5)     6 (3)       1 (0.5)     5 (2.5)     6 (3)       3 (1.5)    
                      Sum           100 (50)    22 (11)     27 (13.5)   8 (4)       9 (4.5)     19 (9.5)    15 (7.5)   
          Sum         0             33 (16.5)   7 (3.5)     5 (2.5)     3 (1.5)     3 (1.5)     7 (3.5)     8 (4)      
                      1             31 (15.5)   8 (4)       6 (3)       7 (3.5)     2 (1)       6 (3)       2 (1)      
                      2             32 (16)     4 (2)       8 (4)       5 (2.5)     3 (1.5)     10 (5)      2 (1)      
                      3             28 (14)     9 (4.5)     3 (1.5)     3 (1.5)     6 (3)       4 (2)       3 (1.5)    
                      4             34 (17)     6 (3)       11 (5.5)    8 (4)       4 (2)       1 (0.5)     4 (2)      
                      Missing       42 (21)     6 (3)       8 (4)       2 (1)       11 (5.5)    10 (5)      5 (2.5)    
                      Sum           200 (100)   40 (20)     41 (20.5)   28 (14)     29 (14.5)   38 (19)     24 (12)    

Column names: V1, V2, V10, V4, V5, V6, V7, V8, V9   

Indeed, not easy.

  • Getting proportions and addmargins right was very hard. These two commands seem not to know about each other, and it's easy to end up with a "grand total" of e.g. 600%. A single command to handle both might be useful (see the sketch after this list).
  • The trick with setting the contents of joined[] is quite advanced R and most users shouldn't have to know about it. But without it there's no quick way to get a table of the right structure.
  • And huxtable is finicky... my bad. I wonder if tableone or similar might be useful here.
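
To make the first point concrete, a small sketch with made-up counts showing how the order of proportions() and addmargins() changes the "grand total":

# Hypothetical counts: 3 baseline grades x 2 arms
cnt <- matrix(c(10, 20, 30, 5, 15, 20), nrow = 3,
              dimnames = list(Grade = c("G0", "G1", "G2"), Arm = c("A", "B")))

# Pitfall: take per-column proportions first, then add margins over both
# dimensions -- the added "Sum" cells add up values that already total 100%
# per column, so the corner cell comes out at 200% here (600% with six columns):
addmargins(proportions(cnt, margin = 2) * 100)

# What the code above does instead: compute proportions from the unmargined
# counts, then add margins to the counts and the proportions separately
counts <- addmargins(cnt)
pcts   <- 100 * addmargins(proportions(cnt))   # grand total is 100%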

@elong0527

Here is an example using r2rtf that allows you to display one parameter per page. (I simply copied the same numbers to the different parameters for illustration purposes.)

  • I just use the default border type and font type in r2rtf; they can be changed if necessary (see the sketch after this list).
  • GitHub does not allow me to attach an RTF file, so I converted it to PDF.
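
A rough sketch of the kind of override meant in the first bullet (argument values here are illustrative, not what is used in the full example below):

library(dplyr)
library(r2rtf)

# Illustrative only: override the default border type and font size,
# then continue with rtf_encode() and write_rtf() as in the full example below
data.frame(x = 1:2, y = c("a", "b")) %>%
  rtf_body(
    border_left    = "double",   # border type override
    text_font_size = 8           # font size override
  )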

output file: tmp.pdf

library(dplyr)
library(tidyr)
library(r2rtf)

set.seed(123)
dat <- data.frame(
  Treatment        = rep(c("A", "B"), 100),
  `Baseline Grade` = sample(c(paste("Grade", 0:4), "Missing"), 200, replace = TRUE),
  PostGrade        = sample(c(paste("Grade", 0:4), "Missing"), 200, replace = TRUE),
  check.names      =  FALSE
)

tbl <- xtabs( ~ PostGrade + `Baseline Grade` + Treatment, data = dat, addNA = TRUE)
ptbl <- proportions(tbl)
tbl <- addmargins(tbl, 1:3)
ptbl <- addmargins(ptbl, 1:3)
ptbl <- ptbl*100 


t1 <- as.data.frame(tbl) %>%
  pivot_wider(id_cols = c("Treatment", "Baseline.Grade"), 
              names_from = "PostGrade", 
              values_from = "Freq")

t2 <- as.data.frame(ptbl) %>%
  mutate(Freq = paste0("(", formatC(Freq, digits = 1, width = 5, format = "f"), ")")) %>%
  pivot_wider(id_cols = c("Treatment", "Baseline.Grade"), 
              names_prefix = "pct_", 
              names_from = "PostGrade", 
              values_from = "Freq")

tbl0 <- merge(t1, t2) %>%
  rename(baseline = `Baseline.Grade`) %>%
  select(Treatment, baseline, ends_with("Sum"), ends_with(as.character(c(0:4))), ends_with("Missing"))

# Create multiple parameters
tbl1 <- tbl0   # use the merged data frame, not the xtabs array
tbl1$parameter <- "Parameter A"

tbl2 <- tbl0
tbl2$parameter <- "Parameter B"

tbl <- rbind(tbl1, tbl2)



tbl %>% 
  rtf_page(orientation = "landscape", 
           col_width = 10,
           nrow = 30) %>%
  rtf_title("Shift Table Example") %>%
  rtf_colheader(c("| | Baseline | Worst post-baseline value"), 
                col_rel_width = c(4, 2, 2, 10)) %>%
  rtf_colheader("| Total | Grade 0 | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Missing", 
                border_top = c("", "",rep("single", 6)), 
                col_rel_width = c(4, rep(2, 7))) %>% 
  rtf_colheader("Treatment | Grade | n | (%) | n | (%) | n | (%) | n | (%) | n | (%) | n | (%) | n | (%)",
                col_rel_width = c(2, 2, rep(1, 14)), 
                border_left = c("single","single", rep(c("single", ""), 7))) %>%
  rtf_body(col_rel_width = c(2, 2, rep(1, 14), 1), 
           text_justification = c("l","l", rep("c", 14), "l"), 
           border_left = c("single","single", rep(c("single", ""), 7), "single"), 
           group_by = "Treatment", 
           subline_by = "parameter") %>%
  rtf_encode() %>%
  write_rtf("tmp.rtf")

@ianmoran11 commented Jan 14, 2022

Hi,

I've been keeping an eye on this repo since noticing my package, mmtable2, was mentioned in your call for table examples.

Below is some code outlining how the table above would be constructed with mmtable2.

I'm still working on this package, so any suggestions would be more than welcome!

# Load packages ---------------------------------------------------------------------
library(tidyverse)
library(mmtable2)
set.seed(123)

# Create frequency data frame ----------------------------------------------------
freq_df <-
  crossing(
    treatment = c("A", "B"),
    baseline = c(paste("Grade", 0:4), "Missing"),
    postbaseline = c(paste("Grade", 0:4), "Missing", " ")
  ) %>%
  mutate(n = sample(0:20, nrow(.), replace = T)) %>%
  mutate(`(%)` = 100 * n / sum(n)) %>%
  mutate(prepost = ifelse(postbaseline == " ", "Baseline", "Worst post-baseline value"))

# Calculate totals within treatments --------------------------------------------
total_df <-
  freq_df %>%
  group_by(treatment, prepost, postbaseline) %>%
  summarise(n = sum(n), `(%)` = sum(`(%)`)) %>%
  mutate(baseline = "Total")

# Calculate sums across all subjects --------------------------------------------
sum_df <-
  bind_rows(freq_df, total_df) %>%
  group_by(baseline, postbaseline, prepost) %>%
  summarise(n = sum(n), `(%)` = sum(`(%)`)) %>%
  mutate(treatment = "Sum")

# Prepare data for table --------------------------------------------------------------
final_df <-
  bind_rows(freq_df,total_df,sum_df) %>%
  gather(unit, val, n, `(%)`) %>%
  mutate(unit = fct_relevel(unit, "n", "(%)")) %>%
  mutate(val = ifelse(unit == "(%)", sprintf("(%.1f)", val), val))

# Print dataframe used to construct main table -------------------------------
final_df
## # A tibble: 294 x 6
##    treatment baseline postbaseline prepost                   unit  val  
##    <chr>     <chr>    <chr>        <chr>                     <fct> <chr>
##  1 A         Grade 0  " "          Baseline                  n     14   
##  2 A         Grade 0  "Grade 0"    Worst post-baseline value n     18   
##  3 A         Grade 0  "Grade 1"    Worst post-baseline value n     13   
##  4 A         Grade 0  "Grade 2"    Worst post-baseline value n     2    
##  5 A         Grade 0  "Grade 3"    Worst post-baseline value n     9    
##  6 A         Grade 0  "Grade 4"    Worst post-baseline value n     17   
##  7 A         Grade 0  "Missing"    Worst post-baseline value n     10   
##  8 A         Grade 1  " "          Baseline                  n     4    
##  9 A         Grade 1  "Grade 0"    Worst post-baseline value n     19   
## 10 A         Grade 1  "Grade 1"    Worst post-baseline value n     13   
## # … with 284 more rows

# Create table ----------------------------------------------------------------------------

final_df %>%
  mmtable(cells = val) +
    header_top(unit) +
    header_top_left(postbaseline) +
    header_left(baseline) +
    header_left_top(treatment) +
    header_top_left(prepost) +
    header_merged_cols() +
    header_format(prepost,  list(cell_text(align = "center"))) +
    cells_format(cell_predicate = unit == "(%)", list(cell_text(align = "left")))

[image: resulting mmtable2 table]

@Pierre-Wallet commented Feb 4, 2022

Hi all,

@elong0527 @hughjonesd @gmbecker @ianmoran11 @davidgohel

Sorry for my late answer.
Thanks for the answers you provided; they actually helped me optimize my solution.

In the solutions above, the percentages are not correctly calculated. The baseline percentage (1st column) is out of the number of patients in each treatment arm, not out of the total number of patients.
It was not straightforward to get right, but arrays are quite handy for performing such calculations.

I did not focus on the sum of the 2 treatment arms, as it is optional in the display.

I tried my best to get it with flextable+officer:
reprex_report.docx

Here is my code (I added 2 parameters; the goal is to have one parameter per page):

library(haven)
library(tidyverse)
library(abind)
library(flextable) 
library(officer)
library(magrittr)

set.seed(123)
dat <- data.frame(
  Treatment        = rep(c("A", "B"), 100),
  Parameter        = sample(c("Param1", "Param2"), 200, replace = TRUE),
  `Baseline Grade` = sample(c(paste0("Grade ", 0:4) , NA), 200, replace = TRUE),
  PostGrade        = sample(c(paste0("Grade ", 0:4) , NA), 200, replace = TRUE),
  check.names      =  FALSE
)


dat1 <- dat %>%
  mutate(Baseline = factor(`Baseline Grade`, levels = c("Grade 0", "Grade 1", "Grade 2", "Grade 3", "Grade 4") ),
         Wpbaseline = factor(PostGrade, levels = c("Grade 0", "Grade 1", "Grade 2", "Grade 3", "Grade 4") ),
         TRT01AN = factor(Treatment, levels = c("A","B") ) )

# Calculation part ----

# Store frequencies in an array:
tbl <- xtabs( ~ Baseline + Wpbaseline + TRT01AN + Parameter, data = dat1, addNA = TRUE)

# add sum over first dimension (Baseline):
tbl2 <- addmargins(A = tbl, margin = c(1), FUN = sum)

# Get percentages in rows: freeze all dimensions except the second (Wpbaseline):
ptbl2 <- proportions(tbl2, margin = c(1,3,4))

# at this stage tbl2 and ptbl2 have the same dimensions
joined <- tbl2
joined[] <- paste0(tbl2, " (", round(100*ptbl2,digits = 1), ")")

# add sum over second dimension (Wpbaseline) to get the total for each grade at baseline:
tbl3 <- addmargins(A = tbl, margin = c(2), FUN = sum)

# Get total and percentages for Totals at Baseline:
ptbl3 <- proportions(tbl3, margin = c(2,3,4))

tbl4 <- addmargins(A = tbl3, margin = c(1), FUN = sum)
ptbl4 <- addmargins(A = ptbl3, margin = c(1), FUN = sum)

# Get total and percentages at Baseline together
joined2 <- tbl4
joined2[] <- paste0(tbl4, " (", sprintf("%1.1f",round(100*ptbl4,digits = 1)), ")")

# retrieve the info concerning the sum and percentages at baseline
baseline <- joined2[,7,,]
dim(baseline)<-c(7,1,2,2)

# concatenate the baseline information with the previous array thanks to abind package
joined3 <- abind(baseline, joined, along = 2)

# Cosmetic
dimnames(joined3)
names(dimnames(joined3))<-c('Baseline', 'Wpbaseline', 'Treatment', 'Parameter')

dimnames(joined3)$Baseline[which(dimnames(joined3)$Baseline == "sum")] <- "Total"
dimnames(joined3)$Baseline[is.na(dimnames(joined3)$Baseline)] <- "Missing"

dimnames(joined3)$Wpbaseline[which(dimnames(joined3)$Wpbaseline == "")] <- "Total"
dimnames(joined3)$Wpbaseline[is.na(dimnames(joined3)$Wpbaseline)] <- "Missing"

joined3[which(joined3=="0 (0)")]<-"0"
joined3[which(joined3=="0 (0.0)")]<-"0"
joined3[which(joined3=="0 (NaN)")]<-"0"

# Build back a data frame to do the reporting part
for (j in 1:dim(joined3)[4]) {
  for (i in 1:dim(joined3)[3]) {
    df_current <- 
      data.frame(joined3[,,i,j]) %>% 
      as_tibble() %>%
      # Adding the lab parameter, treatment and baseline grade to the newly created df
      add_column(Parameter = dimnames(joined3)[[4]][j],
                 TRT01AN = dimnames(joined3)[[3]][i],
                 Bgrade = rownames(joined3[,,i,j]),
                 .before = "Total") %>%
      remove_rownames()
    
    #Concatenation of df created at each step
    if (i==1 & j==1){ df_final<-df_current}
    else df_final %<>% bind_rows(df_current)
    
  }
}
  
# Reporting part----
doc <- read_docx() 
for(i in as.vector(unique(df_final$Parameter))){
  doc <- body_add_par(doc, "Table 14.3_3_3.4", style = "centered") %>%                               # Title
    body_add_par("Hematology shift table based on CTC grade by treatment", style = "centered") %>%    # Title
    body_add_par("Safety Set", style = "centered") %>%                                                # Title
    body_add_par(paste0("Parameter =", i), style = "Normal") %>%                                      # Line for laboratory parameter names
    body_add_par(" ") %>%                                                                             # Line break          
    body_add_flextable( flextable(df_final[df_final$Parameter==i,-1]) %>%  # remove parameter variable
                          # Change width of TRT01AN variable
                          width(j = ~ TRT01AN, width = 3) %>% 
                          # Merge cell of variable Parameter and TRT01AN vertically 
                          merge_v( j = ~   TRT01AN , part = "body") %>%
                          # Add header to specified Baseline and Worst post baseline value
                          add_header( TRT01AN = "Treatment", 
                                      Bgrade = "Baseline",
                                      Total = "Baseline",
                                      Grade.0 = "Worst post baseline value",
                                      Grade.1 = "Worst post baseline value",
                                      Grade.2 = "Worst post baseline value",
                                      Grade.3 = "Worst post baseline value",
                                      Grade.4 = "Worst post baseline value",
                                      Missing = "Worst post baseline value",
                                      top =TRUE) %>% 
                          # Rename variable
                          set_header_labels(TRT01AN = "Treatment",
                                            Bgrade = " ",
                                            Total = "n (%)",
                                            Grade.0 = "Grade 0",
                                            Grade.1 = "Grade 1",
                                            Grade.2 = "Grade 2",
                                            Grade.3 = "Grade 3",
                                            Grade.4 = "Grade 4",
                                            Missing = "Missing") %>%
                          # Add footer section
                          add_footer_lines(values = c(" - Baseline is defined as the last non-missing value prior to or the date of the first dose of study treatment \n - Baseline percentages based on N. Percentage for worst post-baseline values is based on Baseline n. \n - CTC grading version 4.03 is used")) %>%
                          # Merge cell horizontally of header
                          merge_h( i = 1, part = "header") %>% 
                          # Merge cell vertically of header
                          merge_v(j = ~ TRT01AN, part = "header") %>% 
                          # Apply vanilla theme
                          theme_vanilla() %>% 
                          # Add  white line between Baseline and Worst post baseline value of header to separate 
                          border( i = 2, j = 4, border.left = fp_border(color = "white", width = 10), part = "header") %>% 
                          # Remove black line of the top header
                          border( border.top = fp_border(color = "white"), part = "header") %>%
                          # vertical alignment for treatment
                          valign(j=1, valign="top", part="body") %>%
                          # Center text of column 3 to 9 of body 
                          align( j = 3:9, align = "center", part = "body") %>%
                          # Left text of column 1 to 3 of body and footer
                          align( j = 1:2, align = "left", part = "body") %>%
                          align( align = "left", part = "footer") %>%
                          # Center text of header 
                          align( align = "center", part = "header") %>% 
                          # Choose fontsize
                          fontsize(size = 8, part = "all") %>%
                          # Update rows height
                          height_all(height = 0.25, part="body") %>%
                          hrule(rule="exact") %>%
                          # add a border under Baseline and Wpbaseline
                          hline(i = 1, j = c(2,3,4), border = fp_border(color="black"), part = "header")
    ) %>% 
    body_end_section_landscape()    # orientation page : landscape 
  
}
print(doc, target = "~/reprex_report.docx")

@NNaikp commented Feb 5, 2022

Hi @Pierre-Wallet!

I just tried it using our table package NNtable.

    library(dplyr)
    library(tidyr)
    library(NNtable)
    
    set.seed(123)
    
    # Create frequency data frame ----------------------------------------------------
    freq_df <-
      crossing(
        TRTA = c("A", "B"),
        baseline = c(paste("Grade", 0:4), "Missing"),
        postbaseline = c(paste("Grade", 0:4), "Missing")
      ) %>%
      mutate(N = sample(0:20, nrow(.), replace = T)) 
    
    
    # Totals ----------------------------------------------------------
    tr_tot <- freq_df %>% 
      group_by(TRTA) %>% 
      summarise(N_tot = sum(N))
    
    base_tot <- freq_df %>% 
      group_by(TRTA,baseline) %>% 
      summarise(N_base = sum(N), .groups = "drop") %>% 
      left_join(tr_tot, by = "TRTA") %>% 
      mutate(P_base = 100*(N_base/N_tot),
             TRTANEW = paste0(TRTA," (N=",N_tot,")"),
             lab = "Baseline")
    
    # Update for outputting
    base_tot_f <- base_tot %>% 
      rename("N" = "N_base",
             "P" = "P_base") %>% 
      mutate(postbaseline = " ")
    
    # worst stats
    sum_stats <- freq_df %>% 
      left_join(base_tot, by = c("TRTA","baseline"), copy = FALSE) %>% 
      mutate(P = 100*(N/N_base),
             TRTANEW = paste0(TRTA," (N=",N_tot,")"),
             lab = "Worst post-baseline value") %>% 
      select(colnames(base_tot_f))
    
    all_out <- rbind(base_tot_f,sum_stats) 
    
    
    # Create table ----------------------------------------------------------
    nntable <- NNTable(all_out,"TRTANEW","lab"," " = "baseline","postbaseline",
                       "N","(%)" = "(P)") %>% 
      addTransWide(lab = list(postbaseline = c("N","(%)"))) %>% 
      addUnderScore() %>% 
      addGroupedColumns("TRTANEW") %>% 
      addFormat(format_data = c(N = "%.0f", P = "%.1f"))
      
    
    nntable

Which gives:

    ————————————————————————————————————————————————————————————————————————————————————————————————————
                        Baseline                      Worst post-baseline value                   
                       —————————  ————————————————————————————————————————————————————————————————
                                   Grade 0    Grade 1    Grade 2    Grade 3    Grade 4    Missing 
                       —————————  —————————  —————————  —————————  —————————  —————————  —————————
                        N   (%)    N   (%)    N   (%)    N   (%)    N   (%)    N   (%)    N   (%) 
    ————————————————————————————————————————————————————————————————————————————————————————————————————
                                                                                                  
    A (N=345) Grade 0  71 (20.6)   7 ( 9.9)   6 ( 8.5)  14 (19.7)  18 (25.4)  16 (22.5)  10 (14.1)
              Grade 1  40 (11.6)   6 (15.0)  18 (45.0)   2 ( 5.0)   5 (12.5)   6 (15.0)   3 ( 7.5)
              Grade 2  69 (20.0)   9 (13.0)  16 (23.2)  15 (21.7)   7 (10.1)  20 (29.0)   2 ( 2.9)
              Grade 3  66 (19.1)  11 (16.7)  20 (30.3)  13 (19.7)   4 ( 6.1)  13 (19.7)   5 ( 7.6)
              Grade 4  62 (18.0)   8 (12.9)   0 ( 0.0)  19 (30.6)  13 (21.0)  18 (29.0)   4 ( 6.5)
              Missing  37 (10.7)   1 ( 2.7)   0 ( 0.0)  19 (51.4)   3 ( 8.1)  10 (27.0)   4 (10.8)
                                                                                                  
    B (N=315) Grade 0  33 (10.5)   3 ( 9.1)  18 (54.5)   2 ( 6.1)   0 ( 0.0)   6 (18.2)   4 (12.1)
              Grade 1  60 (19.0)   8 (13.3)  16 (26.7)  20 (33.3)   4 ( 6.7)  11 (18.3)   1 ( 1.7)
              Grade 2  50 (15.9)  17 (34.0)   3 ( 6.0)   9 (18.0)  14 (28.0)   7 (14.0)   0 ( 0.0)
              Grade 3  61 (19.4)  10 (16.4)   4 ( 6.6)  10 (16.4)  14 (23.0)  14 (23.0)   9 (14.8)
              Grade 4  69 (21.9)  14 (20.3)  14 (20.3)  19 (27.5)  11 (15.9)   8 (11.6)   3 ( 4.3)
              Missing  42 (13.3)   2 ( 4.8)   6 (14.3)   3 ( 7.1)   7 (16.7)  20 (47.6)   4 ( 9.5)
    ————————————————————————————————————————————————————————————————————————————————————————————————————

@davidgohel

Hi!

I took time to do some work on flextable for this:

The package needs to be installed from GitHub for these new functions:

remotes::install_github("davidgohel/flextable")

A few format functions:

library(data.table)
library(flextable)

multi_fun <- function(x) {
  list(mean = mean(x),
       sd = sd(x))
}
myformat <- function(z){
  x <- sprintf("%.1f", z)
  x[is.na(z)] <- ""
  x
}
n_format <- function(z) {
  x <- sprintf("%.0f", z)
  x[is.na(z)] <- ""
  x
}

data prep

Aggregations need to be prepared before using flextable.

dat <- as.data.table(ggplot2::diamonds)
dat <- dat[cut %in% c("Fair", "Good", "Very Good")]
dat <- dat[clarity %in% c("I1", "SI1", "VS2")]

nstat <- dat[, list(n = .N),
             by = c("cut")]

dat <- dat[, unlist(lapply(.SD, multi_fun),
                    recursive = FALSE),
           .SDcols = c("z", "y"),
           by = c("cut", "color", "clarity")]
dat
#>           cut color clarity   z.mean      z.sd   y.mean      y.sd
#>  1: Very Good     H     SI1 3.817221 0.6664327 6.171371 1.0754836
#>  2:      Fair     E     VS2 3.529286 0.5245551 5.449524 0.7673392
#>  3:      Good     J     SI1 4.019773 0.7004727 6.430795 1.0808269
#>  4: Very Good     J     SI1 4.017088 0.6838668 6.510000 1.1041829
#>  5: Very Good     E     VS2 3.363797 0.5530067 5.466143 0.8988358
#>  6: Very Good     J     VS2 4.017500 0.6964981 6.513750 1.1436226
#>  7: Very Good     D     VS2 3.326278 0.5282660 5.382913 0.8421964
#>  8:      Good     H     SI1 3.736894 0.6377995 5.987319 1.0611947
#>  9:      Good     D     VS2 3.467596 0.5185969 5.538365 0.8182376
#> 10: Very Good     F     SI1 3.599481 0.5245868 5.827370 0.8484646
#> 11: Very Good     I     SI1 3.936257 0.6808834 6.356983 1.1074043
#> 12:      Good     I     SI1 3.853758 0.7443514 6.147697 1.2262727
#> 13: Very Good     G     SI1 3.560823 0.5933639 5.761730 0.9487383
#> 14:      Good     E     VS2 3.505000 0.5103667 5.677625 0.8019399
#> 15: Very Good     E     SI1 3.471629 0.5479509 5.619233 0.8996010
#> 16: Very Good     D     SI1 3.447955 0.5314568 5.594636 0.8764198
#> 17: Very Good     F     VS2 3.488906 0.5884464 5.661953 0.9465291
#> 18:      Fair     F     VS2 3.593774 0.4990191 5.626226 0.6970742
#> 19: Very Good     J      I1 4.433750 0.5231754 7.186250 0.8307473
#> 20:      Good     I     VS2 4.009364 0.6571069 6.468455 1.0810768
#> 21: Very Good     G     VS2 3.590731 0.6208030 5.814739 0.9953374
#> 22:      Good     G     VS2 3.622865 0.5710884 5.819115 0.8951742
#> 23:      Good     F     VS2 3.531576 0.5507295 5.663641 0.8695877
#> 24: Very Good     H     VS2 3.690452 0.6885978 5.972926 1.1102574
#> 25:      Fair     G     SI1 3.852319 0.4371907 5.922754 0.6362248
#> 26:      Fair     E      I1 4.008889 0.3876030 6.061111 0.4297221
#> 27:      Good     F     SI1 3.568425 0.5056869 5.720879 0.8231311
#> 28:      Good     E     SI1 3.486592 0.5638868 5.567803 0.9138209
#> 29:      Fair     I     SI1 4.102000 0.4462905 6.402667 0.6600676
#> 30:      Fair     G      I1 4.233962 0.7219374 6.434906 1.0849683
#> 31:      Fair     F      I1 4.000857 0.7043536 6.040571 1.0820350
#> 32:      Good     G     SI1 3.735024 0.5568696 5.952464 0.9221867
#> 33:      Fair     J     VS2 3.960000 0.6183629 6.163478 1.0095976
#> 34: Very Good     E      I1 3.955000 0.5511784 6.436818 0.8837549
#> 35:      Fair     J     SI1 4.135357 0.5361142 6.549643 0.8858454
#> 36:      Good     J     VS2 4.007444 0.6286289 6.446000 1.0575940
#> 37: Very Good     I     VS2 3.917372 0.6876389 6.368686 1.1444792
#> 38:      Fair     F     SI1 3.726506 0.5419464 5.871325 0.8438367
#> 39:      Good     I      I1 4.305556 0.5688390 6.978889 1.0471920
#> 40:      Good     H     VS2 3.690000 0.6633151 5.895725 1.0695806
#> 41:      Good     H      I1 4.187857 0.5881807 6.745714 0.7694425
#> 42: Very Good     H      I1 4.600833 0.5658374 7.437500 0.8752259
#> 43:      Fair     H     SI1 4.110533 0.5221341 6.355467 0.8221269
#> 44:      Fair     D     VS2 3.704800 0.4685449 5.886800 0.7054828
#> 45:      Good     D     SI1 3.450844 0.5407419 5.509409 0.8788101
#> 46:      Fair     E     SI1 3.739077 0.5040763 5.903692 0.8057694
#> 47:      Fair     H      I1 4.549231 0.7059102 6.858462 1.0348428
#> 48: Very Good     I      I1 4.628750 0.9238806 7.452500 1.4414749
#> 49:      Fair     I     VS2 3.851875 0.4773342 6.154375 0.7318379
#> 50:      Fair     I      I1 4.405294 0.5181508 6.653529 0.8454222
#> 51: Very Good     G      I1 4.050000 0.5254585 6.525000 0.8003749
#> 52:      Fair     J      I1 4.990435 0.9209850 7.457391 1.3570832
#> 53:      Good     E      I1 4.263913 0.5527760 6.898696 0.7825152
#> 54:      Good     G      I1 3.869474 1.1021720 6.554737 0.9806255
#> 55:      Fair     H     VS2 3.975122 0.5877462 6.161220 0.9045750
#> 56:      Good     F      I1 3.861053 0.5899708 6.120000 0.9517820
#> 57:      Fair     D     SI1 3.864828 0.4383792 5.957069 0.6556823
#> 58:      Fair     G     VS2 3.955556 0.4868617 6.137333 0.7873442
#> 59: Very Good     F      I1 4.125385 0.5468793 6.714615 0.9804218
#> 60: Very Good     D      I1 3.886000 0.2823650 6.242000 0.4232257
#> 61:      Good     D      I1 3.841250 0.6805552 6.293750 1.2610533
#> 62:      Good     J      I1 4.310000 0.5615455 6.982500 0.7329563
#> 63:      Fair     D      I1 4.905000 0.9796768 7.422500 1.4170009
#>           cut color clarity   z.mean      z.sd   y.mean      y.sd
setDF(dat)
setDF(nstat)

flextable creation

grey_txt <- fp_text_default(color = "gray")

cft_2 <- tabulator(
  x = dat, rows = c("cut", "color"),
  cols = "clarity",
  hidden_data = nstat,
  row_compose = list(
    cut = as_paragraph(as_chunk(cut), " (N=", as_chunk(n, formatter = n_format), ")")
  ),
  `z stats` = as_paragraph(
    as_chunk(z.mean, formatter = myformat)),
  `y stats` = as_paragraph(
    as_chunk(y.mean, formatter = myformat),
    as_chunk(" (\u00B1 ", props = grey_txt),
    as_chunk(y.sd, formatter = myformat, props = grey_txt),
    as_chunk(")", props = grey_txt)
  )
)
ft_2 <- as_flextable(cft_2)
ft_2 <- autofit(x = ft_2, add_w = .05)
ft_2 <- color(ft_2, i = ~ cut %in% "Very Good", color = "blue")
ft_2 <- add_header_lines(x = ft_2, "blah blah blah blah blah blah blah blah blah blah")
ft_2

[image: screenshot of the resulting flextable, 2022-02-11]
