Is there a way to shade or recode_shadow on the whole df? #249

jzadra · 2020-01-27T19:14:14Z

Is there any way to apply shade() or recode_shadow() to an entire data frame, to handle special missing values such as -99 in every column? Both currently seem to operate only on vectors.
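A minimal base R sketch of the requested whole-data-frame behavior follows. It assumes you are willing to build the shadow columns by hand instead of through {naniar}. The helper `shadow_all()` and its `special` argument are hypothetical and not part of {naniar}. The sketch only imitates the `_NA` column suffix and the `"!NA"`/`"NA"` labels of {naniar}'s shadow matrix, and it adds one extra label for each special missing code (here -99).

```r
# Hypothetical helper (not part of {naniar}): build a shadow data frame for
# every column, flagging special missing codes such as -99 as well as NA.
shadow_all <- function(df, special = c("-99" = "NA_minus99")) {
  shadow <- lapply(df, function(x) {
    # Start from the usual two-level shadow: "!NA" (present) vs "NA" (missing)
    label <- ifelse(is.na(x), "NA", "!NA")
    # Overwrite the label wherever the value matches a special missing code
    for (code in names(special)) {
      hit <- !is.na(x) & as.character(x) == code
      label[hit] <- special[[code]]
    }
    factor(label, levels = c("!NA", "NA", unname(special)))
  })
  names(shadow) <- paste0(names(df), "_NA")
  as.data.frame(shadow)
}

# Toy data where -99 is used as a special missing value in every column
df <- data.frame(a = c(1, -99, NA), b = c(-99, 2, 3))

# Bind the shadow columns next to the data, then turn -99 into a real NA
df_shadow <- cbind(df, shadow_all(df))
df_shadow[names(df)] <- lapply(df, function(x) replace(x, which(x == -99), NA))
df_shadow
```

Because the shadow is computed before -99 is replaced, the reason for each missing value survives even after the data columns hold plain `NA`.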


njtierney · 2020-06-10T05:40:16Z

Not currently, but that is a great suggestion!

hhp94 · 2023-02-10T16:56:42Z

Thank you for {naniar}, please excuse me for bumping this feature request!

A common use case I can see for {naniar} is metabolism panel data where, in wide form, each column is a metabolite, metal, or chemical. These values have particular types of missing values: measurements below the "limit of detection (LOD)" or the "limit of quantitation (LOQ)". It would be great if we could apply recode_shadow() to all these columns using across() or the _if/_at variants. Typical panel data look like the example below.

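# Simulate a small panel: 5 people x 5 chemicals, where each measurement has
# a 50% chance of being a special missing code ("NA_LOD" or "NA_LOQ")
n_people <- 5
n_chemicals <- 5
prob_missing <- 0.5
chemical_names <- paste0("chemical_", seq_len(n_chemicals))

# Return n values as character: either a rounded normal draw or an LOD/LOQ code
lod_fns <- function(n) {
  flip_coins <- runif(n)
  value <- round(rnorm(n), 3)
  lod_loq <- sample(c("NA_LOD", "NA_LOQ"), size = n, replace = TRUE)
  ifelse(flip_coins <= prob_missing, lod_loq, as.character(value))
}

# Long format: one row per person-chemical pair
panel_long <- data.frame(
  id = rep(seq_len(n_people), n_chemicals),
  chemicals = rep(chemical_names, each = n_people),
  value = lod_fns(n_people * n_chemicals)
)

# Wide format: one column per chemical (the native pipe |> needs R >= 4.1)
panel_wide <- panel_long |>
  tidyr::pivot_wider(id_cols = id,
                     names_from = "chemicals",
                     values_from = "value")

panel_wide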

# Example output; no seed is set, so values differ between runs
#      id chemical_1 chemical_2 chemical_3 chemical_4 chemical_5
#   <int> <chr>      <chr>      <chr>      <chr>      <chr>
# 1     1 NA_LOD     NA_LOQ     NA_LOQ     NA_LOQ     NA_LOQ
# 2     2 NA_LOQ     NA_LOQ     NA_LOQ     NA_LOQ     NA_LOD
# 3     3 -0.843     NA_LOQ     -0.275     NA_LOQ     -0.767
# 4     4 1          NA_LOD     0.244      0.788      -0.532
# 5     5 -1.823     NA_LOD     0.313      1.426      0.196
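For wide LOD/LOQ panels like this one, the same idea can be applied to every chemical column at once, which is similar in spirit to the across()/_if/_at variants requested here. What follows is a minimal base R sketch. The helper `recode_lod_wide()` and the name-based column selection (`^chemical_`) are hypothetical choices, not {naniar} functions. To keep the result reproducible, the sketch uses a small fixed data frame shaped like `panel_wide`.

```r
# Hypothetical helper (not part of {naniar}): for each selected column, record
# the type of missingness in a "_NA" shadow column, then convert values to numeric.
recode_lod_wide <- function(df, cols, codes = c("NA_LOD", "NA_LOQ")) {
  for (col in cols) {
    x <- df[[col]]
    shadow <- ifelse(is.na(x), "NA", "!NA")
    is_code <- x %in% codes
    shadow[is_code] <- x[is_code]      # keep "NA_LOD" / "NA_LOQ" as the label
    df[[paste0(col, "_NA")]] <- factor(shadow, levels = c("!NA", "NA", codes))
    x[is_code] <- NA                   # the special codes become real NA
    df[[col]] <- as.numeric(x)
  }
  df
}

# Small fixed example shaped like panel_wide (character columns with LOD/LOQ codes)
panel_small <- data.frame(
  id = 1:3,
  chemical_1 = c("NA_LOD", "-0.843", "1"),
  chemical_2 = c("NA_LOQ", "NA_LOD", "0.244"),
  stringsAsFactors = FALSE
)

# Select columns by name, roughly what an _at / across() variant would do
chem_cols <- grep("^chemical_", names(panel_small), value = TRUE)
panel_recoded <- recode_lod_wide(panel_small, chem_cols)
panel_recoded
```

The shadow columns keep the LOD vs LOQ distinction, while the chemical columns become numeric and can be analyzed or imputed directly.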

For this kind of data, where all the chemical columns usually hold the same type of measurement, I can see two solutions:

1. Implement across() or the _if/_at variants for recode_shadow(), or
2. Add something like a "shadow_wider()" function, where you would run recode_shadow() on the "value" column of the long data (panel_long) and then pivot to wide data while keeping the shadow matrix.
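You can approximate the second idea today. While the data are still long, recode the single `value` column, then pivot the values and their shadow labels together. The following is a minimal base R sketch using `reshape()`. The data are a small fixed stand-in for `panel_long`, and `shadow_wider()` itself remains hypothetical. With {tidyr}, passing `values_from = c(value, value_NA)` to `pivot_wider()` should give an equivalent wide layout with different column names.

```r
# Small fixed long-format example shaped like panel_long
panel_long_small <- data.frame(
  id = c(1, 2, 1, 2),
  chemicals = c("chemical_1", "chemical_1", "chemical_2", "chemical_2"),
  value = c("NA_LOD", "0.5", "1.2", "NA_LOQ"),
  stringsAsFactors = FALSE
)
codes <- c("NA_LOD", "NA_LOQ")

# Step 1: recode once, on the single long "value" column
raw <- panel_long_small$value
is_code <- raw %in% codes
panel_long_small$value_NA <- ifelse(is_code, raw, ifelse(is.na(raw), "NA", "!NA"))
raw[is_code] <- NA
panel_long_small$value <- as.numeric(raw)

# Step 2: pivot values and shadow labels to wide form together
panel_wide_shadow <- reshape(
  panel_long_small,
  idvar = "id",
  timevar = "chemicals",
  v.names = c("value", "value_NA"),
  direction = "wide"
)

# Rename "value.chemical_1" -> "chemical_1", "value_NA.chemical_1" -> "chemical_1_NA"
names(panel_wide_shadow) <- sub("^value_NA\\.(.*)$", "\\1_NA", names(panel_wide_shadow))
names(panel_wide_shadow) <- sub("^value\\.", "", names(panel_wide_shadow))
panel_wide_shadow
```

Recoding in long form means the rule is written once, for one column, no matter how many chemicals the panel has.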

I am not very familiar with the {naniar} source code, but I would love to take a crack at this. Could you give me some pointers on where to start reading?

Cheers!

EDIT: typos

njtierney added this to the V0.6.0 milestone May 19, 2020

antondutoit mentioned this issue Sep 24, 2020

Feature request: scoped shadow_recode variants #274 (Open)

njtierney modified the milestones: V0.6.0, V0.7.0, V0.8.0 Oct 14, 2022

njtierney added the Priority 2 label Apr 24, 2023
