Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using visdat with a tbl_dbi database object #76

Open
ablack3 opened this issue Apr 5, 2018 · 3 comments
Open

Using visdat with a tbl_dbi database object #76

ablack3 opened this issue Apr 5, 2018 · 3 comments
Milestone

Comments

@ablack3
Copy link

ablack3 commented Apr 5, 2018

Is it possible to use visdat with a database table (referenced by a tbl_dbi object in R)?

@ablack3
Copy link
Author

ablack3 commented Apr 6, 2018

Here is an idea. Can I supply visdat with a table containing the missing value pattern in the database table and get a nice plot?

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(wakefield))

df <- r_data_frame(n= 100, id, age, height) %>% 
     r_na(prob = .4)

con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")


tbl_dbi <- copy_to(con, df, "df")
tbl_dbi
#> # Source:   table<df> [?? x 3]
#> # Database: sqlite 3.19.3 []
#>    ID      Age Height
#>    <chr> <int>  <dbl>
#>  1 001      NA   71.0
#>  2 002      21   68.0
#>  3 003      NA   62.0
#>  4 004      34   NA  
#>  5 005      NA   NA  
#>  6 006      20   NA  
#>  7 007      31   NA  
#>  8 008      24   72.0
#>  9 009      20   74.0
#> 10 010      NA   70.0
#> # ... with more rows

# function to pull info from database to feed into visdat
count_missing <- function(tbl_dbi, ...){
     dots <- rlang::enquos(...)
     tbl_dbi %>% 
          ungroup() %>% 
          mutate_at(vars(!!!dots), funs(case_when(
               . == "" ~ "blank",
               is.na(.) ~ "NA",
               TRUE ~ "not blank/NA"
          ))) %>% 
          count(!!!dots) 
}

# pull info from database needed to construct visdat plot
count_missing(tbl_dbi, Age, Height)
#> # Source:   lazy query [?? x 3]
#> # Database: sqlite 3.19.3 []
#> # Groups:   Age
#>   Age          Height           n
#>   <chr>        <chr>        <int>
#> 1 NA           NA              16
#> 2 NA           not blank/NA    24
#> 3 not blank/NA NA              24
#> 4 not blank/NA not blank/NA    36
Created on 2018-04-06 by the reprex package (v0.2.0).

@njtierney njtierney added V0.6.0 and removed V0.5.0 labels Jun 4, 2018
@njtierney
Copy link
Collaborator

Thanks for filing an issue, and for the nice illustrative example!

I would like to have visdat work with objects "tbl_dbi" "tbl_sql" "tbl_lazy" "tbl" - the way that I think would be best to address this would be to create separate functions for each of the vis_ family to operate on tbl_dbi objects.

So currently there is vis_dat() which checks if you have a dataframe, but then there would be vis_dat.tbl_dbi, which would "just work" with tbl_dbi - so you'd type vis_dat(tbl_dbi) and you'd get a plot that you would expect.

This might be made easier by some of the ideas in retrieving the data, described in #78.

At the moment I don't think that this will make it into the next release of visdat (submitting today or tomorrow to CRAN), but I would be really interested to discuss this in more detail - would you be interested in discussing this further?

@njtierney njtierney added this to the V0.6.0 milestone Jun 6, 2018
@njtierney njtierney removed the V0.6.0 label Jun 6, 2018
@njtierney njtierney modified the milestones: V0.6.0, V0.7.0 Jun 6, 2018
@ablack3
Copy link
Author

ablack3 commented Jun 7, 2018

Sure! I think that it would be handy to easily get a missing data plot from a database table. The application I'm thinking of would be for use with large datasets that are too big to get into R. Your idea of making the vis_dat function generic and creating a tbl_dbi method seems like a good one to me. For column oriented databases creating a viz based on a few columns should be very fast even for a huge number of rows.

Would you be able to create a lower level internal function that would accept something like the table produced by the count_missing function I defined above and produce a viz?

@njtierney njtierney modified the milestones: 0.8.0, 0.9.0 Apr 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants