`as.list` method for use with `writexl::write_xlsx` #665

jbkunst · 2021-06-02T20:32:21Z

Hi everyone, thanks so much for this package.

I'd like to know your opinion about implementing an as.list() method, or something similar, for the skim_df class.

For example, the following function transforms a skim_df object into a list, so you can export all elements (the summary plus partition(x)) to an Excel file
using writexl::write_xlsx. It produces something like this:

 library(skimr)
 
 sk <- skim(head(iris))
 
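 as.list.skim_df <- function(x, ...){
   
   # Convert the skim summary to a data frame; columns 1 and 3 hold the
   # labels and values, and the column names are blanked for the sheet
   tsummary <- as.data.frame(summary(x))
   tsummary <- tibble::as_tibble(tsummary)
   tsummary <- dplyr::select(tsummary, 1, 3)
   tsummary <- setNames(tsummary, c("",""))
   
   # One tibble per column type (factor, numeric, ...)
   tdetails <- skimr::partition(x)
   tdetails <- purrr::map(tdetails, tibble::as_tibble)
   
   # Summary first, then the partitions: one list element per worksheet
   out <- c(list(summary = tsummary), tdetails)
   
   out
   
 }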
 
 as.list(sk)
#> $summary
#> # A tibble: 9 x 2
#>   ``                           ``          
#>   <fct>                        <fct>       
#> 1 "Name"                       "head(iris)"
#> 2 "Number of rows "            "6"         
#> 3 "Number of columns "         "5"         
#> 4 "_______________________ "   " "         
#> 5 "Column type frequency: "    " "         
#> 6 "  factor"                   "1"         
#> 7 "  numeric"                  "4"         
#> 8 "________________________  " " "         
#> 9 "Group variables"            "None"      
#> 
#> $factor
#> # A tibble: 1 x 6
#>   skim_variable n_missing complete_rate ordered n_unique top_counts            
#>   <chr>             <int>         <dbl> <lgl>      <int> <chr>                 
#> 1 Species               0             1 FALSE          1 set: 6, ver: 0, vir: 0
#> 
#> $numeric
#> # A tibble: 4 x 11
#>   skim_variable n_missing complete_rate  mean     sd    p0   p25   p50   p75
#>   <chr>             <int>         <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Sepal.Length          0             1 4.95  0.288    4.6  4.75  4.95  5.07
#> 2 Sepal.Width           0             1 3.38  0.343    3    3.12  3.35  3.58
#> 3 Petal.Length          0             1 1.45  0.138    1.3  1.4   1.4   1.48
#> 4 Petal.Width           0             1 0.233 0.0816   0.2  0.2   0.2   0.2 
#> # ... with 2 more variables: p100 <dbl>, hist <chr>
 
 # so you can do:
 # writexl::write_xlsx(as.list(sk), path = "/some/path/excel_file.xlsx")

Created on 2021-06-02 by the reprex package (v2.0.0)


michaelquinn32 · 2021-06-03T17:26:11Z

Thanks for the suggestion!

We handle a lot of this within reshape.R. Have a look:
https://github.com/ropensci/skimr/blob/master/R/reshape.R

We could extend partition() with an include.summary argument to do something like that. We would need to be a little careful about the partition/bind round-trip behavior, but I think this is reasonable.

@elinw, what do you think?
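A minimal sketch of what an include.summary-style option could do, assuming the partitions are already a named list of data frames and the summary has already been flattened into a data frame. The helper `partition_with_summary()`, its `include_summary` argument, and the `field`/`value` column names are hypothetical and made up for illustration; they are not part of skimr.

```r
# Hypothetical helper (not part of skimr): prepend a summary data frame to a
# named list of partition data frames, one element per future worksheet.
partition_with_summary <- function(parts, summary_df, include_summary = TRUE) {
  stopifnot(is.list(parts), is.data.frame(summary_df))
  # unclass() drops any wrapper class (e.g. "skim_list") and keeps the names
  parts <- unclass(parts)
  if (!include_summary) {
    return(parts)
  }
  c(list(summary = summary_df), parts)
}

# Mock inputs standing in for partition() output and a flattened summary
parts <- list(
  factor  = data.frame(skim_variable = "Species", n_missing = 0L),
  numeric = data.frame(skim_variable = c("Sepal.Length", "Sepal.Width"),
                       n_missing = 0L)
)
summary_df <- data.frame(field = c("Name", "Number of rows"),
                         value = c("iris", "150"))

sheets <- partition_with_summary(parts, summary_df)
names(sheets)
```

A result like `sheets` is a plain named list of data frames, which is the shape writexl::write_xlsx() accepts (one sheet per element). Because the extra `summary` element is not a partition, anything that rebinds partitions would need to drop it first, which is the round-trip concern mentioned above.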

elinw · 2021-06-27T15:18:59Z

I was thinking about this. If the goal is to add the summary as an element of the list specifically so that it can be written to an Excel file, I think it's better to start from that goal rather than from the solution of putting the summary into a list with everything else. We should think about what the general use case for a list of skim objects would be. First, other packages also support writing Excel files (e.g. openxlsx), so we should try to be general enough to support all of them. Second, in terms of data, why would we put the summary in the same sheet as everything else? It might make more sense to default to one partition per worksheet and put the summary on its own sheet. It might even make sense to have two different methods, write.skim_df() and write.summary_skim_df().

michaelquinn32 · 2021-06-28T20:31:11Z

Yes @elinw, I would really like a generic for this from the other packages too. Should we:

- decide which packages we support, and
- make sure we have generics for this as well?

Then we could make changes on our side to accommodate this.
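One way to picture the generics idea is an S3 generic that each object type implements to return a named list of data frames, which any Excel writer could then consume. In this sketch the generic `as_sheets()`, its methods, and the mock `"skim_list"` object are all hypothetical; none of them exist in skimr, writexl, or openxlsx.

```r
# Hypothetical S3 generic: turn an object into a named list of data frames,
# one element per worksheet.
as_sheets <- function(x, ...) UseMethod("as_sheets")

# A single data frame becomes a one-sheet workbook.
as_sheets.data.frame <- function(x, sheet = "Sheet1", ...) {
  stats::setNames(list(x), sheet)
}

# A mock partition result becomes one sheet per element.
as_sheets.skim_list <- function(x, ...) {
  out <- unclass(x)  # plain list, names preserved
  stopifnot(all(vapply(out, is.data.frame, logical(1))))
  out
}

# Mock object assumed to resemble a partitioned skim result
parts <- structure(
  list(factor  = data.frame(skim_variable = "Species"),
       numeric = data.frame(skim_variable = "Sepal.Length")),
  class = "skim_list"
)
sheets <- as_sheets(parts)
```

The design choice here is that each package only has to supply one method, and every writer only has to understand "named list of data frames".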

elinw · 2021-12-04T05:27:47Z

Because we are talking about lists of skimmed objects here, we should also be conscious of any interaction with purrr (#671) and print.

elinw · 2022-01-09T04:30:58Z

This actually works fine for me:

writexl::write_xlsx(partition(skim(iris)), path = "irisskim.xlsx")

I thought that would also work for openxlsx::write.xlsx, but it expects data frames.

elinw · 2022-01-09T05:16:38Z

Okay, actually... the object is a skim_list, not a list, but if you add "list" as a class, openxlsx also works, though with warnings:

Warning messages:
1: In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  row names were found from a short variable and have been discarded
2: In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  row names were found from a short variable and have been discarded
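Why adding "list" to the class changes openxlsx's behaviour can be seen with base R alone: `inherits()` looks at the class attribute, so an object whose class is only "skim_list" fails an `inherits(x, "list")` test even though it is a list underneath. The object below is a mock built with `structure()`, assumed (not verified) to resemble what partition() returns.

```r
# Mock of a partitioned skim result: a list of data frames with a custom class.
x <- structure(
  list(factor = data.frame(n_unique = 3L), numeric = data.frame(mean = 5.84)),
  class = "skim_list"
)

is.list(x)           # TRUE: the underlying type is "list"
inherits(x, "list")  # FALSE: the class attribute is only "skim_list"

# Appending "list" as a second class makes an inherits(x, "list") check pass.
class(x) <- c(class(x), "list")
inherits(x, "list")  # TRUE
```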

elinw · 2022-01-17T14:58:36Z

@michaelquinn32 Are you fine with adding the "list" class in a second slot of skim_list objects? Do you see any downside?

michaelquinn32 · 2022-01-18T02:10:26Z

FWICT, reading and writing with writexl and readxl is working fine, as long as you partition first.
https://colab.research.google.com/drive/1osM9l78MtsR8af_XONn-utauGtDEzCAm?usp=sharing

I don't think skim_list needs to inherit from list explicitly. I believe that it's always implied.
https://stackoverflow.com/questions/19607652/why-doesnt-classdata-frame-show-list-inheritance

iris %>%
  skim() %>%
  partition() %>%
  is.list()
#> [1] TRUE

One change I would like to see, though, is for the updated summaries to be included as a frame in the skim_list.

elinw · 2022-01-18T04:58:51Z

But for openxlsx it isn't working, because it expects a list. I tried adding "list" to the class and it didn't work as expected (it put the whole skim data frame in both tabs). But coercing to a list works fine, so I think we should simply add documentation.
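elinw's workaround was to coerce the partitioned result to a plain list before handing it to openxlsx. One base-R way to do that is `unclass()`, shown here on a mock object of the assumed shape. R's documentation notes that `as.list()` may keep attributes when its argument is already a list, so `unclass()` is the more explicit choice. The openxlsx call is left commented out and is an untested illustration.

```r
# Mock of a partitioned skim result (assumed shape: named list of data frames).
parts <- structure(
  list(factor = data.frame(n_unique = 3L), numeric = data.frame(mean = 5.84)),
  class = "skim_list"
)

# unclass() removes the class attribute but keeps the names and the elements.
plain <- unclass(parts)
inherits(plain, "list")                   # TRUE: implicit class is now "list"
vapply(plain, is.data.frame, logical(1))  # each sheet is still a data frame

# With a real partition() result this could then be written, e.g.:
# openxlsx::write.xlsx(unclass(skimr::partition(skimr::skim(iris))), "irisskim.xlsx")
```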

michaelquinn32 · 2022-01-18T05:40:42Z

We should open a bug with openxlsx on this.

Their check for inherits("list") could be improved:
- They could check for more specific types (data frames), before more general types
- They could switch to S3

There also seems to be an issue with the output that they're writing. Instead of preserving the list structure from skimr, they seem to be collapsing everything into a single table. See here:
https://colab.research.google.com/drive/1osM9l78MtsR8af_XONn-utauGtDEzCAm#scrollTo=KDyDfpdCnonk
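The first suggestion above amounts to testing for the most specific type first. This is a sketch of that ordering in a hypothetical writer-side helper, `normalize_sheets()`, not openxlsx's actual code: a data frame is checked before the more general list test, because `is.list()` is also TRUE for data frames.

```r
# Hypothetical writer-side helper: normalise input into a named list of data frames.
normalize_sheets <- function(x) {
  if (is.data.frame(x)) {
    # Most specific case first: a single data frame is a single sheet.
    return(list(Sheet1 = x))
  }
  if (is.list(x)) {
    # Any list, whatever its class attribute, is one sheet per element.
    x <- unclass(x)
    ok <- vapply(x, is.data.frame, logical(1))
    if (!all(ok)) stop("every list element must be a data frame")
    return(x)
  }
  stop("x must be a data frame or a list of data frames")
}

df <- data.frame(a = 1:2)
parts <- structure(list(factor = df, numeric = df), class = "skim_list")
length(normalize_sheets(df))    # one sheet
names(normalize_sheets(parts))  # "factor" "numeric"
```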

michaelquinn32 · 2022-01-18T05:51:51Z

It's also worth noting that "list" is not really a class in S3; it's a type. That's why a data frame returns TRUE for is.list(). That's mostly explained by this cryptic passage from the documentation:

Here, we describe the so called “S3” classes (and methods). For “S4” classes (and methods), see ‘Formal classes’ below.

Many R objects have a class attribute, a character vector giving the names of the classes from which the object inherits. (Functions oldClass and oldClass<- get and set the attribute, which can also be done directly.)

If the object does not have a class attribute, it has an implicit class, notably "matrix", "array", "function" or "numeric" or the result of typeof(x) (which is similar to mode(x)), but for type "language" and mode "call", where the following extra classes exist for the corresponding function calls: if, while, for, =, <-, (, {, call.

https://stat.ethz.ch/R-manual/R-devel/library/base/html/class.html
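The distinction in that passage can be checked directly in base R: `typeof()` reports the storage type, `oldClass()` the class attribute (if any), and `class()` falls back to an implicit class when there is no attribute.

```r
l  <- list(a = 1)
df <- data.frame(a = 1)

typeof(l)             # "list"
oldClass(l)           # NULL: no class attribute
class(l)              # "list": the implicit class

typeof(df)            # "list": a data frame is stored as a list
oldClass(df)          # "data.frame"
is.list(df)           # TRUE, because is.list() tests the type
inherits(df, "list")  # FALSE, because inherits() uses the class
```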

elinw · 2023-01-02T09:59:52Z

Okay, so it's almost a year later. The good thing is that I tried openxlsx with partition() and it works, so that is one problem taken care of.

The issues I see are:

- Getting the summary into a frame structure
- Documenting the use of partition() for this purpose

elinw · 2023-07-21T23:54:36Z

Just one update on this: openxlsx2 now exists and is more tidyverse-focused.

michaelquinn32 mentioned this issue on Jan 18, 2022 in #685, "partition() should have an argument to include the summary as a frame" (Open).
