Is there a way to shade or recode_shadow on the whole df? #249

Open
jzadra opened this issue Jan 27, 2020 · 2 comments

jzadra commented Jan 27, 2020

Is there any way to do a shade() or a recode_shadow() on the entire df to handle special missings like -99 for every column? Both seem to only operate on vectors currently.
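As a stopgap, the shadow columns can be built by hand for every column at once. The sketch below uses only base R and is not naniar's API: the helper name `shade_special` and the level label `NA_minus99` are hypothetical, and the `_NA` column suffix and the `!NA` / `NA` levels only imitate naniar's shadow naming convention.

```r
# Hypothetical helper (not part of naniar): build a shadow factor for one
# vector, treating `special` (here -99) as its own kind of missing value.
shade_special <- function(x, special = -99, label = "NA_minus99") {
  state <- ifelse(is.na(x), "NA",
                  ifelse(x == special, label, "!NA"))
  factor(state, levels = c("!NA", "NA", label))
}

# Toy data, made up for illustration.
df <- data.frame(a = c(1, -99, NA), b = c(-99, 2, 3))

# Apply the helper to every column; name the results with an `_NA` suffix.
shadow <- as.data.frame(lapply(df, shade_special))
names(shadow) <- paste0(names(df), "_NA")

# Replace the special code with a real NA in the data itself.
df[] <- lapply(df, function(x) replace(x, !is.na(x) & x == -99, NA))

# Data and shadow side by side.
df_shadowed <- cbind(df, shadow)
df_shadowed
```

Because the helper works on a single vector, `lapply()` over the data frame is all that is needed to cover every column; a subset of columns could be passed instead of the whole `df`.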

njtierney added this to the V0.6.0 milestone May 19, 2020

njtierney (Owner) commented

Not currently, but that is a great suggestion!


hhp94 commented Feb 10, 2023

Thank you for {naniar}, and please excuse me for bumping this feature request!

A common use case I can see for {naniar} is metabolism panel data where, in wide form, each column is a metabolite, metal, or chemical. These values have particular types of missing values, labelled "limit of detection (LOD)" or "limit of quantitation (LOQ)". It would be great if we could run recode_shadow() on all of these columns using across() or the _if / _at variants. A typical panel dataset looks like the one below.

n_people <- 5
n_chemicals <- 5
prob_missing <- 0.5
chemical_names <- paste0("chemical_", seq_len(n_chemicals))

lod_fns <- function(n) {
  flip_coins <- runif(n)
  value <- round(rnorm(n), 3)
  lod_loq <- sample(c("NA_LOD", "NA_LOQ"), size = n, replace = TRUE)
  ifelse(flip_coins <= prob_missing, lod_loq, as.character(value))
}

panel_long <- data.frame(
  id = rep(seq_len(n_people), n_chemicals),
  chemicals = rep(chemical_names, each = n_people),
  value = lod_fns(n_people * n_chemicals)
)

panel_wide <- panel_long |>
  tidyr::pivot_wider(id_cols = id,
                     names_from = "chemicals",
                     values_from = "value")

panel_wide

#      id chemical_1 chemical_2 chemical_3 chemical_4 chemical_5
#   <int> <chr>      <chr>      <chr>      <chr>      <chr>
# 1     1 NA_LOD     NA_LOQ     NA_LOQ     NA_LOQ     NA_LOQ
# 2     2 NA_LOQ     NA_LOQ     NA_LOQ     NA_LOQ     NA_LOD
# 3     3 -0.843     NA_LOQ     -0.275     NA_LOQ     -0.767
# 4     4 1          NA_LOD     0.244      0.788      -0.532
# 5     5 -1.823     NA_LOD     0.313      1.426      0.196

Because the chemical columns in this kind of data usually share the same type, I can see two solutions:

  • Implement across() support, or _if / _at variants, for recode_shadow(); or
  • Add something like "shadow_wider()", where you would run recode_shadow() on the "value" column of the long data (panel_long) and then pivot to wide data while keeping the shadow matrix.
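To make the second idea concrete, here is a rough base R sketch of what a "shadow_wider()" step might do: recode the special missings once on the long "value" column, then widen the values and the shadow together. Nothing here is naniar's API. The helper `widen` and the toy data are made up for illustration, and the `_NA` suffix and `!NA` level only imitate naniar's shadow naming convention.

```r
# Toy long-format panel, shaped like panel_long above (made-up values).
panel_long <- data.frame(
  id = rep(1:2, times = 2),
  chemicals = rep(c("chemical_1", "chemical_2"), each = 2),
  value = c("NA_LOD", "0.5", "-1.2", "NA_LOQ"),
  stringsAsFactors = FALSE
)

# Recode once, on the long "value" column.
is_special <- panel_long$value %in% c("NA_LOD", "NA_LOQ")
panel_long$value_NA <- ifelse(is_special, panel_long$value, "!NA")
panel_long$value <- as.numeric(replace(panel_long$value, is_special, NA))

# Hypothetical helper: spread one long column into an id-by-chemical table.
# Assumes each (id, chemical) pair occurs exactly once.
widen <- function(x) {
  m <- tapply(x, list(panel_long$id, panel_long$chemicals),
              FUN = function(v) v[1])
  as.data.frame(m, stringsAsFactors = FALSE)
}

values_wide <- widen(panel_long$value)
shadow_wide <- widen(panel_long$value_NA)
names(shadow_wide) <- paste0(names(shadow_wide), "_NA")

# Wide values and wide shadow side by side, keyed by id.
panel_shadow_wide <- cbind(
  id = as.integer(rownames(values_wide)),
  values_wide,
  shadow_wide
)
panel_shadow_wide
```

Recoding before pivoting means the LOD/LOQ rule is written once rather than once per chemical column, which is the main appeal of this option over a per-column approach.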

I am not very familiar with the {naniar} source code, but I would love to take a crack at this, and I would appreciate some pointers on where to start reading.

Cheers!

EDIT: typos
