as.list method for use with writexl::write_xlsx #665

Open
jbkunst opened this issue Jun 2, 2021 · 13 comments
Comments

@jbkunst

jbkunst commented Jun 2, 2021

Hi everyone, thanks so much for this package.

I'd like to know your opinion about implementing an as.list method, or something similar, for the skim_df class.

For example, the following function transforms a skim_df object into a list, so you can export all of its elements (the summary plus partition(x)) to an Excel file using writexl::write_xlsx and get something like this:

image

 library(skimr)
 
 sk <- skim(head(iris))
 
 as.list.skim_df <- function(x, ...){
   
   tsummary <- as.data.frame(summary(x))
   tsummary <- tibble::as_tibble(tsummary)
   tsummary <- dplyr::select(tsummary, 1, 3)
   tsummary <- setNames(tsummary, c("",""))
   
   tdetails <- skimr::partition(x)
   tdetails <- purrr::map(tdetails, tibble::as_tibble)
   
   out <- c(list(summary = tsummary), tdetails)
   
   out
   
 }
 
 as.list(sk)
#> $summary
#> # A tibble: 9 x 2
#>   ``                           ``          
#>   <fct>                        <fct>       
#> 1 "Name"                       "head(iris)"
#> 2 "Number of rows "            "6"         
#> 3 "Number of columns "         "5"         
#> 4 "_______________________ "   " "         
#> 5 "Column type frequency: "    " "         
#> 6 "  factor"                   "1"         
#> 7 "  numeric"                  "4"         
#> 8 "________________________  " " "         
#> 9 "Group variables"            "None"      
#> 
#> $factor
#> # A tibble: 1 x 6
#>   skim_variable n_missing complete_rate ordered n_unique top_counts            
#>   <chr>             <int>         <dbl> <lgl>      <int> <chr>                 
#> 1 Species               0             1 FALSE          1 set: 6, ver: 0, vir: 0
#> 
#> $numeric
#> # A tibble: 4 x 11
#>   skim_variable n_missing complete_rate  mean     sd    p0   p25   p50   p75
#>   <chr>             <int>         <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Sepal.Length          0             1 4.95  0.288    4.6  4.75  4.95  5.07
#> 2 Sepal.Width           0             1 3.38  0.343    3    3.12  3.35  3.58
#> 3 Petal.Length          0             1 1.45  0.138    1.3  1.4   1.4   1.48
#> 4 Petal.Width           0             1 0.233 0.0816   0.2  0.2   0.2   0.2 
#> # ... with 2 more variables: p100 <dbl>, hist <chr>
 
 # so you can do:
 # writexl::write_xlsx(as.list(sk), path = "/some/path/excel_file.xlsx")

Created on 2021-06-02 by the reprex package (v2.0.0)

@michaelquinn32
Collaborator

Thanks for the suggestion!

We handle a lot of this within reshape.R. Have a look:
https://github.com/ropensci/skimr/blob/master/R/reshape.R

We could expand partition with an argument include.summary to do something like that. We would need to be a little careful about the partition/bind round trip behavior, but I think this is reasonable.

@elinw, what do you think?

@elinw
Collaborator

elinw commented Jun 27, 2021

I was thinking about this. If the goal is to add the summary as a list element specifically so it can be written to an Excel file, I think it's better to start from that goal than from the solution of putting the summary into a list with everything else. We should think about what the general use case for a list of skim objects would be. First, other packages also support writing Excel files (e.g. openxlsx), so we should try to be general enough to support all of them. Second, in terms of data, just one thought: why would we put the summary in the same sheet as everything else? Maybe it would make more sense to default to one partition per worksheet and then put the summary on its own sheet. It might even make sense to have two different methods, write.skim_df() and write.summary_skim_df().
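One possible shape for the "one partition per worksheet, summary on its own sheet" idea, as a minimal sketch. `skim_sheets()` is a hypothetical helper, not part of skimr, and the toy data frames and the `demo_skim_list` class below only stand in for the output of `partition()` and a summary frame. The helper builds a plain named list of data frames, so the choice of Excel writer stays open.

```r
# Hypothetical helper (not part of skimr): build one worksheet per partition,
# with an optional summary frame placed on its own leading sheet.
skim_sheets <- function(parts, summary_df = NULL) {
  parts <- unclass(parts)  # drop any S3 class so writers see a bare list
  stopifnot(
    is.list(parts),
    !is.null(names(parts)),
    all(vapply(parts, is.data.frame, logical(1)))
  )
  sheets <- lapply(parts, as.data.frame)
  if (!is.null(summary_df)) {
    sheets <- c(list(summary = as.data.frame(summary_df)), sheets)
  }
  sheets
}

# Toy stand-ins for partition(skim(iris)) and a summary frame.
parts <- structure(
  list(
    factor  = data.frame(skim_variable = "Species", n_missing = 0L),
    numeric = data.frame(skim_variable = names(iris)[1:4], n_missing = 0L)
  ),
  class = "demo_skim_list"
)
summary_df <- data.frame(
  item  = c("Number of rows", "Number of columns"),
  value = c(nrow(iris), ncol(iris))
)

sheets <- skim_sheets(parts, summary_df)
names(sheets)

# Either writer could then take the same list (file names are placeholders):
# writexl::write_xlsx(sheets, path = "skim.xlsx")
# openxlsx::write.xlsx(sheets, file = "skim.xlsx")
```

Keeping the writing step outside the helper is a deliberate choice here: it avoids tying skimr to one Excel package.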

@michaelquinn32
Collaborator

Yes, @elinw, I would really like a generic for this from the other packages too. Should we

  • decide which we support
  • make sure we have generics for this as well

Then we could make changes on our side as well to accommodate this.

@elinw
Collaborator

elinw commented Dec 4, 2021

Because we are talking about lists of skimmed objects here, we should also be conscious of any interaction with purrr (#671) and print.

@elinw
Collaborator

elinw commented Jan 9, 2022

This actually works fine for me:

writexl::write_xlsx(partition(skim(iris)), path = "irisskim.xlsx")

I thought that would also work for openxlsx::write.xlsx, but it expects data frames.

@elinw
Collaborator

elinw commented Jan 9, 2022

Okay, actually ... the list is a skim_list and not a list, but if you add "list" as a class, openxlsx also works, though with warnings.

Warning messages:
1: In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  row names were found from a short variable and have been discarded
2: In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  row names were found from a short variable and have been discarded

@elinw
Collaborator

elinw commented Jan 17, 2022

@michaelquinn32 Are you fine with adding the "list" class in a second slot of skim_list objects? Do you see any downside?

@michaelquinn32
Collaborator

FWICT, reading and writing with writexl and readxl is working fine, as long as you partition first.
https://colab.research.google.com/drive/1osM9l78MtsR8af_XONn-utauGtDEzCAm?usp=sharing
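A minimal sketch of that kind of round trip, assuming the writexl and readxl packages are installed (the block is skipped otherwise). The toy list below is only a stand-in for `partition(skim(iris))`, so the example does not depend on skimr.

```r
# Assumes the writexl and readxl packages are installed; skipped otherwise.
if (requireNamespace("writexl", quietly = TRUE) &&
    requireNamespace("readxl", quietly = TRUE)) {
  # Stand-in for partition(skim(iris)): a named list of data frames.
  parts <- list(
    factor  = data.frame(skim_variable = "Species", n_missing = 0L),
    numeric = data.frame(skim_variable = names(iris)[1:4], n_missing = 0L)
  )

  path <- tempfile(fileext = ".xlsx")
  writexl::write_xlsx(parts, path = path)

  # Each list element becomes a worksheet named after the element.
  sheet_names <- readxl::excel_sheets(path)
  read_back <- lapply(
    stats::setNames(sheet_names, sheet_names),
    function(s) readxl::read_excel(path, sheet = s)
  )
  print(names(read_back))
}
```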

I don't think skim_list needs to inherit from list explicitly. I believe that it's always implied.
https://stackoverflow.com/questions/19607652/why-doesnt-classdata-frame-show-list-inheritance

iris %>%
  skim() %>%
  partition() %>%
  is.list()
#> TRUE

One change I would like to see, though, is for the updated summaries to be included as a frame in the skim_list.

@elinw
Collaborator

elinw commented Jan 18, 2022

But for openxlsx it isn't working, because it expects a list. I tried adding "list" to the class and it didn't work as expected (it put the whole skim data frame in both tabs). Coercing to a list works fine, though, so I think we should simply add documentation.
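The comment does not say how the coercion was done. One way, shown here as an assumption rather than as the approach actually used, is `unclass()`, which drops the class attribute and leaves a bare list of data frames. `demo_skim_list` is a hypothetical stand-in class so the sketch runs in base R without skimr.

```r
# Stand-in for a skim_list: a list of data frames with a custom S3 class.
parts <- structure(
  list(factor = head(iris["Species"]), numeric = head(iris[1:4])),
  class = "demo_skim_list"
)

is.list(parts)            # TRUE: the underlying type is a list
inherits(parts, "list")   # FALSE: "list" is not in the class attribute

# Dropping the class attribute leaves a bare list of data frames.
plain <- unclass(parts)
inherits(plain, "list")   # TRUE: the implicit class is now "list"

# The bare list could then be handed to the writer (placeholder file name):
# openxlsx::write.xlsx(plain, file = "irisskim.xlsx")
```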

@michaelquinn32
Collaborator

We should open a bug with openxlsx on this.

  • Their check for inherits("list") could be improved
    • They could check for more specific types (data frames), before more general types
    • They could switch to S3
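To illustrate the last two points, here is a rough sketch of S3 dispatch that handles data frames before anything list-like. `n_sheets()` is a hypothetical generic invented for this example; it is not openxlsx code, and `demo_skim_list` is again a stand-in class.

```r
# Hypothetical generic: how many worksheets would an object produce?
n_sheets <- function(x) UseMethod("n_sheets")

# A data frame is a single sheet, even though is.list() is TRUE for it.
n_sheets.data.frame <- function(x) 1L

# Everything else: test the type rather than the class attribute, so
# classed lists such as a skim_list are still treated as lists.
n_sheets.default <- function(x) {
  if (!is.list(x)) stop("expected a data frame or a list of data frames")
  length(x)
}

parts <- structure(
  list(factor = head(iris["Species"]), numeric = head(iris[1:4])),
  class = "demo_skim_list"
)

n_sheets(iris)            # 1
n_sheets(parts)           # 2
n_sheets(list(a = iris))  # 1
```

The list case lives in the default method on purpose: an object with its own class attribute never dispatches to a `list` method, so a type check with `is.list()` is what catches it.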

There also seems to be an issue with the output that they're writing. Instead of preserving the list structure from skimr, they seem to be collapsing everything into a single table. See here:
https://colab.research.google.com/drive/1osM9l78MtsR8af_XONn-utauGtDEzCAm#scrollTo=KDyDfpdCnonk

@michaelquinn32
Collaborator

It's also worth noting that list is not really a class in S3; it's a type. That's why a data frame returns TRUE for is.list(). That's mostly explained by this cryptic message.

Here, we describe the so called “S3” classes (and methods). For “S4” classes (and methods), see
‘Formal classes’ below.

Many R objects have a class attribute, a character vector giving the names of the classes from
which the object inherits. (Functions oldClass and oldClass<- get and set the attribute, which can
also be done directly.)

If the object does not have a class attribute, it has an implicit class, notably "matrix", "array",
"function" or "numeric" or the result of typeof(x) (which is similar to mode(x)), but for type "language"
and mode "call", where the following extra classes exist for the corresponding function calls: if,
while, for, =, <-, (, {, call.

https://stat.ethz.ch/R-manual/R-devel/library/base/html/class.html
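The distinction in that passage can be seen directly in base R; a small illustration:

```r
plain <- list(a = 1)
oldClass(plain)   # NULL: no class attribute is set
class(plain)      # "list": the implicit class, taken from the type

df <- data.frame(a = 1)
oldClass(df)      # "data.frame": an explicit class attribute
typeof(df)        # "list": the underlying type
is.list(df)       # TRUE, because is.list() tests the type
```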

@elinw
Collaborator

elinw commented Jan 2, 2023

Okay, so it's almost a year later. The good thing is that I tried openxlsx with partition and it works, so that is one problem taken care of.

The issues I see are

  1. Getting the summary into a frame structure
  2. Documenting the use of partition() for this purpose

@elinw
Collaborator

elinw commented Jul 21, 2023

Just one update on this: openxlsx2 now exists and is more tidyverse-focused.
