Split skim_df by group feature for reshape.R #576
Comments
I think this is a good idea. Some comments:
Otherwise, we can go into more details during a PR. @elinw How does this sound to you?
Hi. You might also want to look here: https://github.com/elinw/skimrExtra The idea is to be able to create a variable table like those in many articles. I also wonder whether looking at the codebook package, which uses skimr extensively, might be a way to go, keeping things more modular.
One interesting aspect of this is that it includes a separate summary for each group. I have been thinking a lot about the summary and why, to keep it from showing in the printed output, you currently have to call a print function explicitly. This is a broad conundrum in R that piping doesn't play well with at the moment. I think this output could be achieved more simply using our current vocabulary. The point is that this approaches the skim-by-group problem from a different angle, one that was discussed initially at the #unconf and that @gshotwell had a lot of ideas about. At the time the whole model was different, but now that we have the one big data frame, I do think we should revisit the option of taking the grouped output and returning it as a list by group, and then maybe giving it a nice print method instead of relying on the default list printing. Please keep in mind that mtcars is a terrible sample data set for skimr because it has only one type of variable. Use something like CO2 instead to really understand what the output would look like.
Are you definitely committed to having the summary by group? We already support calculating the statistics on groups, and after playing with this a bit I think it is actually not hard to take the skim object for grouped data and pull out the subtables for the top n grouping levels (leaving any other grouping in place for those). Also remember that we already store the group names as an attribute.
This
Gets you what you are getting now ... and I think that's a start. We'd need @michaelquinn32 to come up with a good verb for this. Updated with cleaner code. |
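As a rough sketch of the list-by-group idea discussed here (not the code originally posted in this comment), the grouped skim result can be split into one sub-table per group level with base R's `split()`. The helper name `split_skim_by_group` is hypothetical and not part of skimr, and the sketch assumes the grouped skim output carries the grouping variable as a regular column. It uses CO2 rather than mtcars because CO2 mixes factor and numeric columns.

```r
library(skimr)
library(dplyr)

# as.data.frame() drops CO2's extra groupedData classes so it behaves
# like a plain data frame before grouping.
skimmed <- as.data.frame(CO2) %>%
  group_by(Type) %>%
  skim()

# Hypothetical helper (not a skimr function): returns a named list with
# one plain data frame of statistics per level of the column `group`.
split_skim_by_group <- function(skimmed, group) {
  plain <- as.data.frame(skimmed)
  split(plain, plain[[group]], drop = TRUE)
}

by_type <- split_skim_by_group(skimmed, "Type")
names(by_type)
```

Converting to a plain data frame first sidesteps the question of how the `skim_df` class and its attributes should survive the split; a real implementation would probably want to keep them so that each element still prints like a skim.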
@michaelquinn32 and I discussed this and he proposed more of a tidyverse-style semantic, where the skimmed object is piped to a function that identifies the groups to be used for creating the subtables and then it is piped to a variation of Then maybe we can make a convenience function that combines all the steps.
I like the idea of including it in |
Wait, group names as columns? Can you show what you mean? Just make an example table or rough sketch. I think I'm pretty far along in thinking about how to do one model of this that is more like partition (still rows, but broken out by groups).
Yes, sorry I was unclear! I was trying to end up with a table like this and figured the easiest way would be to split by group into a list and then lapply to make each column. So we are thinking of the same idea (still rows but broken out by groups), but I would then do an extra step to get a table like below |
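A minimal sketch of that extra step, under some assumptions: it keeps a single statistic (the mean of the numeric variables, stored by skimr as `numeric.mean`) and uses `tidyr::pivot_wider()` to turn each group level into a column. The choice of statistic and the name `wide_means` are illustrative only; this is not the approach settled on in this thread.

```r
library(skimr)
library(dplyr)
library(tidyr)

# Skim CO2 by Type; as.data.frame() drops CO2's extra groupedData classes.
skimmed <- as.data.frame(CO2) %>%
  group_by(Type) %>%
  skim()

# Keep one statistic for the numeric variables, then spread the group
# levels into columns: one row per variable, one column per group.
wide_means <- as.data.frame(skimmed) %>%
  filter(skim_type == "numeric") %>%
  select(skim_variable, Type, numeric.mean) %>%
  pivot_wider(names_from = Type, values_from = numeric.mean)

wide_means
```

Only one statistic fits cleanly this way, because a column can hold a single data type; showing several statistics per group would need either one such table per statistic or formatting the values as text first.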
Way back when we started skimr we played around with this a lot, but because skimr uses data frames and each data frame column must be a single data type, it is quite complex, which is why there are so many packages for making nice tables. I think what we could do is concentrate on producing data that would be processed by those.
Researchers often want to create summary statistics for different groups. In tables, each column tends to be a group and each row the summary statistic for that group and variable. It would be nice if skimr made it easier to reshape the stats for each group into a list.
I had an initial go at this function, but I'm not 100% confident that I'm not missing something. Here's a reprex with the split_by_group function:
`
library(skimr)
library(tidyverse)
`
This outputs:
`
$"4 cyl - 3 gears"
── Data Summary ────────────────────────
Values
Name Piped data
Number of rows 32
Number of columns 11
_______________________
Column type frequency:
numeric 9
________________________
Group variables None
`