-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decide and document if remove_constants()
should remove rows or cols first
#171
Comments
... or if this process should be done iteratively until getting no empty row/col or constant column #load library
library(tidyverse)
#create dataset
df <- tibble(
x = c(1,2),
y = c(1,3)
) %>%
mutate(invariant = rep("a",nrow(.))) %>%
mutate(invariant2 = rep("b",nrow(.))) %>%
mutate(empty1 = rep(NA_Date_, nrow(.))) %>%
mutate(empty2 = rep(NA_Date_, nrow(.))) %>%
mutate(empty3 = rep(NA_Date_, nrow(.))) %>%
add_row(x = NA_integer_, invariant = "a") %>%
add_row(x = NA_integer_, invariant = "a") %>%
add_row(x = NA_integer_, invariant = "a") %>%
add_row(x = NA_integer_)
df
#> # A tibble: 6 × 7
#> x y invariant invariant2 empty1 empty2 empty3
#> <dbl> <dbl> <chr> <chr> <date> <date> <date>
#> 1 1 1 a b NA NA NA
#> 2 2 3 a b NA NA NA
#> 3 NA NA a <NA> NA NA NA
#> 4 NA NA a <NA> NA NA NA
#> 5 NA NA a <NA> NA NA NA
#> 6 NA NA <NA> <NA> NA NA NA
df %>%
janitor::remove_constant(na.rm = TRUE) %>%
janitor::remove_empty(which = "rows")
#> # A tibble: 2 × 2
#> x y
#> <dbl> <dbl>
#> 1 1 1
#> 2 2 3
df %>%
janitor::remove_empty(which = "rows") %>%
janitor::remove_constant(na.rm = TRUE) %>%
janitor::remove_empty(which = "rows")
#> # A tibble: 2 × 2
#> x y
#> <dbl> <dbl>
#> 1 1 1
#> 2 2 3
df %>%
janitor::remove_empty(which = "cols") %>%
janitor::remove_empty(which = "rows") %>%
janitor::remove_constant(na.rm = TRUE) %>%
janitor::remove_empty(which = "rows")
#> # A tibble: 2 × 2
#> x y
#> <dbl> <dbl>
#> 1 1 1
#> 2 2 3
packageVersion("cleanepi")
#> [1] '1.0.2' Created on 2024-09-30 with reprex v2.1.0 |
@Bisaloo - the order no longer matters. We introduced an iterative constant data removal process when solving issue #178. In the current x <- cars
x[nrow(x)+1, ] <- NA
x$year <- 2020
tail(x)
#> speed dist year
#> 46 24 70 2020
#> 47 24 92 2020
#> 48 24 93 2020
#> 49 24 120 2020
#> 50 25 85 2020
#> 51 NA NA 2020
dim(x)
#> [1] 51 3
xx = cleanepi::remove_constants(x)
#> Constant data was removed after 2 iterations. See the report for more details.
dim(xx)
#> [1] 50 2
tail(xx)
#> speed dist
#> 45 23 54
#> 46 24 70
#> 47 24 92
#> 48 24 93
#> 49 24 120
#> 50 25 85 Created on 2024-10-18 with reprex v2.1.0 |
I agree to have this documented and tested. Now, I am wondering if there may be a scenario when iteratively running |
If cols are removed first, then a row might suddenly turn constant.
See the last row in this reprex:
Created on 2024-08-07 with reprex v2.1.1
Should it be removed or not? Should this choice be left to the user as an extra argument?
In all cases, the chosen behaviour should be documented and tested.
The text was updated successfully, but these errors were encountered: