From 6681fe04dd1d792dff36522ba348f34c2cf63e1b Mon Sep 17 00:00:00 2001
From: Maximilian Frank <48761677+XAM12@users.noreply.github.com>
Date: Fri, 29 Nov 2024 00:55:58 +0100
Subject: [PATCH] Update 03-dplyr.Rmd

proposed solution to:
https://github.com/datacarpentry/r-socialsci/issues/527

moved the line about the advantage of tbl_df to the example where the first tibble-based output is created as it is not related to the summarize function.

added an explanation on how to identify grouped vs ungrouped tibbles in the output
---
 episodes/03-dplyr.Rmd | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/episodes/03-dplyr.Rmd b/episodes/03-dplyr.Rmd
index bd9394e8..02c448d2 100644
--- a/episodes/03-dplyr.Rmd
+++ b/episodes/03-dplyr.Rmd
@@ -147,6 +147,10 @@ dataframe to adhere to (e.g. village name is Chirodzo):
 filter(interviews, village == "Chirodzo")
 ```
 
+You may also have noticed that the output from these calls doesn't run off the
+screen anymore. It's one of the advantages of `tbl_df` (also called tibble),
+the central data class in the tidyverse, compared to normal dataframes in R.
+
 We can also specify multiple conditions within the `filter()` function. We can
 combine conditions using either "and" or "or" statements. In an "and"
 statement, an observation (row) must meet **every** criteria to be included
@@ -365,9 +369,6 @@ interviews %>%
   summarize(mean_no_membrs = mean(no_membrs))
 ```
 
-You may also have noticed that the output from these calls doesn't run off the
-screen anymore. It's one of the advantages of `tbl_df` over dataframe.
-
 You can also group by multiple columns:
 
 ```{r, purl=FALSE}
@@ -376,7 +377,9 @@ interviews %>%
   summarize(mean_no_membrs = mean(no_membrs))
 ```
 
-Note that the output is a grouped tibble. To obtain an ungrouped tibble, use the
+Note that the output is a grouped tibble of nine rows by three columns,
+which is indicated by the first two lines starting with `#`.
+To obtain an ungrouped tibble, use the
 `ungroup` function:
 
 ```{r, purl=FALSE}
@@ -386,6 +389,8 @@ interviews %>%
   ungroup()
 ```
 
+Notice that the second line with the `#` that previously indicated the grouping has
+disappeared and we now only have a 9x3-tibble without grouping.
 When grouping both by `village` and `membr_assoc`, we see rows in our table for
 respondents who did not specify whether they were a member of an irrigation
 association. We can exclude those data from our table using a filter step.