diff --git a/DESCRIPTION b/DESCRIPTION
index 2ac79eff..6728d526 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
Package: Tplyr
Title: A Traceability Focused Grammar of Clinical Data Summary
-Version: 1.0.1
+Version: 1.0.2
Authors@R:
c(
person(given = "Eli",
diff --git a/NEWS.md b/NEWS.md
index 5dbe741a..bbf0ef94 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,7 @@
+# Tplyr 1.0.2
+- Bug fixes
+ - Resolve issue with `where` logic when using population data.
+
# Tplyr 1.0.1
- Bug fixes
- Resolve issue where `modify_nested_call()` fails if Tplyr is not loaded (#95)
diff --git a/R/count.R b/R/count.R
index 1368d81f..3bb0063f 100644
--- a/R/count.R
+++ b/R/count.R
@@ -624,9 +624,15 @@ process_count_denoms <- function(x) {
abort("A value(s) were set with 'denom_ignore' but no missing count was set. Your percentages/totals may not have meaning.")
}
- # Logic to determine how to subset target for denominators
+ # Logic to determine how to subset target for denominators.
if (is.null(denom_where)) {
- denom_where <- where
+ # If a pop_data was passed, set denom_where to TRUE rather than the table where
+ if (!isTRUE(try(identical(pop_data, target)))) {
+ denom_where <- quo(TRUE)
+ } else {
+ # Otherwise make denom_where equal to table where
+ denom_where <- where
+ }
}
# Because the missing strings haven't replaced the missing strings, it has to happen here.
@@ -653,7 +659,9 @@ process_count_denoms <- function(x) {
# population dataset. Trigger this by identifying that
# the population dataset was overridden
if (!isTRUE(try(identical(pop_data, target)))) {
- if (deparse(denom_where) != deparse(where)) {
+ # If the denom_where doesn't match the where AND the denom_where isn't TRUE,
+ # then the user passed a custom denom_where
+ if (deparse(denom_where) != deparse(where) && !isTRUE(quo_get_expr(denom_where))) {
warning(paste0("A `denom_where` has been set with a pop_data. The `denom_where` has been ignored.",
"You should use `set_pop_where` instead of `set_denom_where`.", sep = "\n"),
immediate. = TRUE)
diff --git a/R/format.R b/R/format.R
index f612521b..92308000 100644
--- a/R/format.R
+++ b/R/format.R
@@ -80,7 +80,7 @@
#' - `mean`
#' - `sd`
#' - `median`
-#' - `variance`
+#' - `var`
#' - `min`
#' - `max`
#' - `iqr`
diff --git a/R/table.R b/R/table.R
index 5ab8a5ee..a2441b3a 100644
--- a/R/table.R
+++ b/R/table.R
@@ -25,7 +25,7 @@
#' individual parameters catered to your analysis. For example, to add a total group, you can use the
#' \code{\link{add_total_group}}.
#'
-#' In future releases, we will provide vigenttes to fully demonstrate these capabilities.
+#' In future releases, we will provide vignettes to fully demonstrate these capabilities.
#'
#' @param target Dataset upon which summaries will be performed
#' @param treat_var Variable containing treatment group assignments. Supply unquoted.
diff --git a/README.Rmd b/README.Rmd
index f4863985..5f1c058c 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -21,7 +21,7 @@ load("vignettes/adae.Rdata")
load("vignettes/adsl.Rdata")
```
-# Tplyr
+# *Tplyr*
[](https://pharmaverse.org)
@@ -32,7 +32,7 @@ load("vignettes/adsl.Rdata")
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
-Welcome to Tplyr! Tplyr is a traceability minded grammar of data format and summary. It's designed to simplify the creation of common clinical summaries and help you focus on how you present your data rather than redundant summaries being performed. Furthermore, for every result Tplyr produces, it also produces the metadata necessary to give your traceability from source to summary.
+Welcome to **Tplyr**! **Tplyr** is a traceability minded grammar of data format and summary. It's designed to simplify the creation of common clinical summaries and help you focus on how you present your data rather than redundant summaries being performed. Furthermore, for every result **Tplyr** produces, it also produces the metadata necessary to give you traceability from source to summary.
As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or if any documentation is unclear - submit an issue through GitHub right [here](https://github.com/atorus-research/Tplyr/issues).
@@ -40,7 +40,7 @@ Take a look at the [cheatsheet!](https://atorus-research.github.io/Tplyr_cheatsh
# Installation
-You can Tplyr install with:
+You can install **Tplyr** with:
```{r install, eval=FALSE}
# Install from CRAN:
@@ -50,11 +50,11 @@ install.packages("Tplyr")
devtools::install_github("https://github.com/atorus-research/Tplyr.git", ref="devel")
```
-# What is Tplyr?
+# What is *Tplyr*?
-[dplyr](https://dplyr.tidyverse.org/) from tidyverse is a grammar of data manipulation. So what does that allow you to do? It gives you, as a data analyst, the capability to easily and intuitively approach the problem of manipulating your data into an analysis ready form. `dplyr` conceptually breaks things down into verbs that allow you to focus on _what_ you want to do more than _how_ you have to do it.
+[dplyr](https://dplyr.tidyverse.org/) from tidyverse is a grammar of data manipulation. So what does that allow you to do? It gives you, as a data analyst, the capability to easily and intuitively approach the problem of manipulating your data into an analysis ready form. [dplyr](https://dplyr.tidyverse.org/) conceptually breaks things down into verbs that allow you to focus on _what_ you want to do more than _how_ you have to do it.
-`Tplyr` is designed around a similar concept, but its focus is on building summary tables common within the clinical world. In the pharmaceutical industry, a great deal of the data presented in the outputs we create are very similar. For the most part, most of these tables can be broken down into a few categories:
+**Tplyr** is designed around a similar concept, but its focus is on building summary tables common within the clinical world. In the pharmaceutical industry, a great deal of the data presented in the outputs we create are very similar. For the most part, most of these tables can be broken down into a few categories:
- Counting for event based variables or categories
- Shifting, which is just counting a change in state with a 'from' and a 'to'
@@ -74,7 +74,7 @@ When you look at this table, you can begin breaking this output down into smalle
- Weight - and we're back to descriptive statistics.
So we have one table, with 6 summaries (7 including the next page, not shown) - but only 2 different approaches to summaries being performed.
-In the same way that `dplyr` is a grammar of data manipulation, `Tplyr` aims to be a grammar of data summary. The goal of `Tplyr` is to allow you to program a summary table like you see it on the page, by breaking a larger problem into smaller 'layers', and combining them together like you see on the page.
+In the same way that [dplyr](https://dplyr.tidyverse.org/) is a grammar of data manipulation, **Tplyr** aims to be a grammar of data summary. The goal of **Tplyr** is to allow you to program a summary table like you see it on the page, by breaking a larger problem into smaller 'layers', and combining them together like you see on the page.
Enough talking - let's see some code. In these examples, we will be using data from the [PHUSE Test Data Factory]( https://advance.phuse.global/display/WEL/Test+Dataset+Factory) based on the [original pilot project submission package](https://github.com/atorus-research/CDISC_pilot_replication). Note: You can see our replication of the CDISC pilot using the PHUSE Test Data Factory data [here](https://github.com/atorus-research/CDISC_pilot_replication).
@@ -92,15 +92,15 @@ tplyr_table(adsl, TRT01P, where = SAFFL == "Y") %>%
```
-## 'Tplyr' is Qualified
+## *Tplyr* is Qualified
-We understand how important documentation and testing is within the pharmaceutical world. This is why outside of unit testing 'Tplyr includes an entire user-acceptance testing document, where requirements were established, test-cases were written, and tests were independently programmed and executed. We do this in the hope that you can leverage our work within a qualified programming environment, and that we save you a substantial amount of trouble in getting it there.
+We understand how important documentation and testing is within the pharmaceutical world. This is why outside of unit testing **Tplyr** includes an entire user-acceptance testing document, where requirements were established, test-cases were written, and tests were independently programmed and executed. We do this in the hope that you can leverage our work within a qualified programming environment, and that we save you a substantial amount of trouble in getting it there.
You can find the qualification document within this repository right [here](https://github.com/atorus-research/Tplyr/blob/master/uat/references/output/uat.pdf). The 'uat' folder additionally contains all of the raw files, programmatic tests, specifications, and test cases necessary to create this report.
## The TL;DR
-Here are some of the high level benefits of using `Tplyr`:
+Here are some of the high level benefits of using **Tplyr**:
- Easy construction of table data using an intuitive syntax
- Smart string formatting for your numbers that's easily specified by the user
@@ -108,7 +108,7 @@ Here are some of the high level benefits of using `Tplyr`:
# Where to go from here?
-There's quite a bit more to learn! And we've prepared a number of other vignettes to help you get what you need out of 'Tplyr'.
+There's quite a bit more to learn! And we've prepared a number of other vignettes to help you get what you need out of **Tplyr**.
- The best place to start is with our Getting Started vignette at `vignette("Tplyr")`
- Learn more about table level settings in `vignette("table")`
@@ -117,18 +117,18 @@ There's quite a bit more to learn! And we've prepared a number of other vignette
- Learn more about shift layers in `vignette("shift")`
- Learn more about percentages in `vignette("denom")`
- Learn more about calculating risk differences in `vignette("riskdiff")`
-- Learn more about sorting 'Tplyr' tables in `vignette("sort")`
-- Learn more about using 'Tplyr' options in `vignette("options")`
-- And finally, learn more about producing and outputting styled tables using 'Tplyr' in `vignette("styled-table")`
+- Learn more about sorting **Tplyr** tables in `vignette("sort")`
+- Learn more about using **Tplyr** options in `vignette("options")`
+- And finally, learn more about producing and outputting styled tables using **Tplyr** in `vignette("styled-table")`
-In the Tplyr version 1.0.0, we've packed a number of new features in. For deeper dives on the largest new additions:
+In **Tplyr** version 1.0.0, we've packed in a number of new features. For deeper dives on the largest new additions:
-- Learn about Tplyr's traceability metadata in `vignette("metadata")` and about how it can be extended in `vigentte("custom-metadata")`
+- Learn about **Tplyr**'s traceability metadata in `vignette("metadata")` and about how it can be extended in `vignette("custom-metadata")`
- Learn about layer templates in `vignette("layer_templates")`
# References
-In building 'Tplyr', we needed some additional resources in addition to our personal experience to help guide design. PHUSE has done some great work to create guidance for standard outputs with collaboration between multiple pharmaceutical companies and the FDA. You can find some of the resource that we referenced below.
+In building **Tplyr**, we needed some additional resources in addition to our personal experience to help guide design. PHUSE has done some great work to create guidance for standard outputs with collaboration between multiple pharmaceutical companies and the FDA. You can find some of the resources that we referenced below.
[Analysis and Displays Associated with Adverse Events](https://phuse.s3.eu-central-1.amazonaws.com/Deliverables/Standard+Analyses+and+Code+Sharing/Analyses+and+Displays+Associated+with+Adverse+Events+Focus+on+Adverse+Events+in+Phase+2-4+Clinical+Trials+and+Integrated+Summary.pdf)
diff --git a/README.md b/README.md
index 318bc427..98d0c81a 100644
--- a/README.md
+++ b/README.md
@@ -15,12 +15,12 @@ status](https://github.com/atorus-research/tplyr/workflows/R-CMD-check/badge.svg
stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
-Welcome to Tplyr! Tplyr is a traceability minded grammar of data format
-and summary. It’s designed to simplify the creation of common clinical
-summaries and help you focus on how you present your data rather than
-redundant summaries being performed. Furthermore, for every result Tplyr
-produces, it also produces the metadata necessary to give your
-traceability from source to summary.
+Welcome to **Tplyr**! **Tplyr** is a traceability minded grammar of data
+format and summary. It’s designed to simplify the creation of common
+clinical summaries and help you focus on how you present your data
+rather than redundant summaries being performed. Furthermore, for every
+result **Tplyr** produces, it also produces the metadata necessary to
+give you traceability from source to summary.
As always, we welcome your feedback. If you spot a bug, would like to
see a new feature, or if any documentation is unclear - submit an issue
@@ -32,7 +32,7 @@ Take a look at the
# Installation
-You can Tplyr install with:
+You can install **Tplyr** with:
``` r
# Install from CRAN:
@@ -47,11 +47,12 @@ devtools::install_github("https://github.com/atorus-research/Tplyr.git", ref="de
[dplyr](https://dplyr.tidyverse.org/) from tidyverse is a grammar of
data manipulation. So what does that allow you to do? It gives you, as a
data analyst, the capability to easily and intuitively approach the
-problem of manipulating your data into an analysis ready form. `dplyr`
-conceptually breaks things down into verbs that allow you to focus on
-*what* you want to do more than *how* you have to do it.
+problem of manipulating your data into an analysis ready form.
+[dplyr](https://dplyr.tidyverse.org/) conceptually breaks things down
+into verbs that allow you to focus on *what* you want to do more than
+*how* you have to do it.
-`Tplyr` is designed around a similar concept, but its focus is on
+**Tplyr** is designed around a similar concept, but its focus is on
building summary tables common within the clinical world. In the
pharmaceutical industry, a great deal of the data presented in the
outputs we create are very similar. For the most part, most of these
@@ -92,11 +93,11 @@ conceptualize this.
So we have one table, with 6 summaries (7 including the next page, not
shown) - but only 2 different approaches to summaries being performed.
-In the same way that `dplyr` is a grammar of data manipulation, `Tplyr`
-aims to be a grammar of data summary. The goal of `Tplyr` is to allow
-you to program a summary table like you see it on the page, by breaking
-a larger problem into smaller ‘layers’, and combining them together like
-you see on the page.
+In the same way that [dplyr](https://dplyr.tidyverse.org/) is a grammar
+of data manipulation, **Tplyr** aims to be a grammar of data summary.
+The goal of **Tplyr** is to allow you to program a summary table like
+you see it on the page, by breaking a larger problem into smaller
+‘layers’, and combining them together like you see on the page.
Enough talking - let’s see some code. In these examples, we will be
using data from the [PHUSE Test Data
@@ -132,10 +133,10 @@ tplyr_table(adsl, TRT01P, where = SAFFL == "Y") %>%
| Age Categories n (%) | \>80 | 30 ( 34.9%) | 18 ( 21.4%) | 29 ( 34.5%) | 2 | 1 | 2 |
| Age Categories n (%) | 65-80 | 42 ( 48.8%) | 55 ( 65.5%) | 47 ( 56.0%) | 2 | 1 | 3 |
-## ‘Tplyr’ is Qualified
+## Tplyr is Qualified
We understand how important documentation and testing is within the
-pharmaceutical world. This is why outside of unit testing ’Tplyr
+pharmaceutical world. This is why outside of unit testing **Tplyr**
includes an entire user-acceptance testing document, where requirements
were established, test-cases were written, and tests were independently
programmed and executed. We do this in the hope that you can leverage
@@ -150,7 +151,7 @@ this report.
## The TL;DR
-Here are some of the high level benefits of using `Tplyr`:
+Here are some of the high level benefits of using **Tplyr**:
- Easy construction of table data using an intuitive syntax
- Smart string formatting for your numbers that’s easily specified by
@@ -161,7 +162,7 @@ Here are some of the high level benefits of using `Tplyr`:
# Where to go from here?
There’s quite a bit more to learn! And we’ve prepared a number of other
-vignettes to help you get what you need out of ‘Tplyr’.
+vignettes to help you get what you need out of **Tplyr**.
- The best place to start is with our Getting Started vignette at
`vignette("Tplyr")`
@@ -172,25 +173,26 @@ vignettes to help you get what you need out of ‘Tplyr’.
- Learn more about percentages in `vignette("denom")`
- Learn more about calculating risk differences in
`vignette("riskdiff")`
-- Learn more about sorting ‘Tplyr’ tables in `vignette("sort")`
-- Learn more about using ‘Tplyr’ options in `vignette("options")`
+- Learn more about sorting **Tplyr** tables in `vignette("sort")`
+- Learn more about using **Tplyr** options in `vignette("options")`
- And finally, learn more about producing and outputting styled tables
- using ‘Tplyr’ in `vignette("styled-table")`
+ using **Tplyr** in `vignette("styled-table")`
-In the Tplyr version 1.0.0, we’ve packed a number of new features in.
-For deeper dives on the largest new additions:
+In **Tplyr** version 1.0.0, we’ve packed in a number of new features.
+For deeper dives on the largest new additions:
-- Learn about Tplyr’s traceability metadata in `vignette("metadata")`
- and about how it can be extended in `vigentte("custom-metadata")`
+- Learn about **Tplyr’s** traceability metadata in
+ `vignette("metadata")` and about how it can be extended in
+ `vignette("custom-metadata")`
- Learn about layer templates in `vignette("layer_templates")`
# References
-In building ‘Tplyr’, we needed some additional resources in addition to
-our personal experience to help guide design. PHUSE has done some great
-work to create guidance for standard outputs with collaboration between
-multiple pharmaceutical companies and the FDA. You can find some of the
-resource that we referenced below.
+In building **Tplyr**, we needed some additional resources in addition
+to our personal experience to help guide design. PHUSE has done some
+great work to create guidance for standard outputs with collaboration
+between multiple pharmaceutical companies and the FDA. You can find some
+of the resources that we referenced below.
[Analysis and Displays Associated with Adverse
Events](https://phuse.s3.eu-central-1.amazonaws.com/Deliverables/Standard+Analyses+and+Code+Sharing/Analyses+and+Displays+Associated+with+Adverse+Events+Focus+on+Adverse+Events+in+Phase+2-4+Clinical+Trials+and+Integrated+Summary.pdf)
diff --git a/cran-comments.md b/cran-comments.md
index 4ee9d510..00720ad0 100644
--- a/cran-comments.md
+++ b/cran-comments.md
@@ -1,13 +1,12 @@
-## Submission 0.4.4
-* Functionality update for sorting nested count layers
-* Updates for changes to rlang
+## Submission 1.0.2
+* Bug fix identified in Tplyr 1.0.1
## Test Environments
* Local Ubuntu 18.04.4 devtools::check
* Latest Ubuntu CI with latest tidyverse
* Github release action with windows, linux, and osx check
-
+* RHub Check
## R CMD CHECK Results
No Errors, warnings, or notes
diff --git a/docs/404.html b/docs/404.html
index c505ccf7..7c82bc7d 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -13,8 +13,8 @@
-
-
+
+
@@ -41,7 +41,7 @@
Tplyr
- 1.0.0
+ 1.0.1.9000
+
Tplyr allows you to focus on these distinct counts
+and distinct percents within some grouping variable, like subject.
+Additionally, you can mix and match with the distinct counts with
+non-distinct counts in the same row too. The
+set_distinct_by() function sets the variables used to
+calculate the distinct occurrences of some value using the specified
+distinct_by variables.
By using set_nest_count(), this triggers ‘Tplyr’ to drop
-row_label1, and indent all of the AEDECOD values within row_label2. The
-columns are renamed appropriately as well. The default indentation used
-will be 3 spaces, but as you can see here - you can set the indentation
-however you like. This let’s you use tab strings for different
-language-specific output types, stick with spaces, indent wider or
-smaller - whatever you wish. All of the existing order variables remain,
-so this has no impact on your ability to sort the table.
+
By using set_nest_count(), this triggers
+Tplyr to drop row_label1, and indent all of the AEDECOD
+values within row_label2. The columns are renamed appropriately as well.
+The default indentation used will be 3 spaces, but as you can see here -
+you can set the indentation however you like. This lets you use tab
+strings for different language-specific output types, stick with spaces,
+indent wider or smaller - whatever you wish. All of the existing order
+variables remain, so this has no impact on your ability to sort the
+table.
There’s a lot more to counting! So be sure to check out our vignettes
on sorting, shift tables, and denominators.
As covered in
+vignette('metadata'), Tplyr can produce
metadata for any result that it calculates. But what about data that
-Tplyr can’t produce, such as a efficacy results or some sort of custom
-analysis? You may still want that drill down capability either on your
-own or paired with an existing Tplyr table.
+Tplyr can’t produce, such as efficacy results or some
+sort of custom analysis? You may still want that drill down capability
+either on your own or paired with an existing Tplyr
+table.
Take for instance Table 14-3.01 from the CDISC
Pilot. Skipping the actual construction of the table, here’s the
-output data from Tplyr and some manual calculation:
+output data from Tplyr and some manual calculation:
kable(full_data)
@@ -317,16 +319,18 @@
Metadata
This is the primary efficacy table from the trial. The top portion of
-this table is fairly straightforward with Tplyr and can be done using
-descriptive statistic layers. Once you hit the p-values on the lower
-house, this becomes beyond Tplyr’s remit. To produce the table, you can
-combine Tplyr output with a separate data frame analyzed and formatted
-yourself (but note you can still use some help from Tplyr tools like
+this table is fairly straightforward with Tplyr and can
+be done using descriptive statistic layers. Once you hit the p-values on
+the lower half, this goes beyond Tplyr’s remit. To produce the
+table, you can combine Tplyr output with a separate
+data frame analyzed and formatted yourself (but note you can still use
+some help from Tplyr tools like
apply_formats()).
But what about the metadata? How do you get the drill down
capabilities for that lower half of the table? We’ve provided a couple
-additional tools in Tplyr to allow you to construct your own metadata
-and append existing metadata present in a Tplyr table.
+additional tools in Tplyr to allow you to construct
+your own metadata and append existing metadata present in a
+Tplyr table.
The tplyr_meta() function can take these fields
immediately upon creation. If you need to dynamically create a
-tplyr_meta object such as how Tplyr constructs the objects
-internally), the functions add_variables() and
-add_filters() are available to extend an existing
-tplyr_meta object:
+tplyr_meta object (such as how Tplyr
+constructs the objects internally), the functions
+add_variables() and add_filters() are
+available to extend an existing tplyr_meta object:
Now that we’ve created our custom extension of the Tplyr metadata,
-let’s extend the existing data frame. To do this, Tplyr has the function
+
Now that we’ve created our custom extension of the
+Tplyr metadata, let’s extend the existing data frame.
+To do this, Tplyr has the function
append_metadata():
You very well may have a scenario where you want to use these
-metadata functions outside of Tplyr in general. As such, there are S3
-methods available to query metadata from a dataframe instead of a Tplyr
-table, and parameters to provide your own target data frame:
+metadata functions outside of Tplyr in general. As
+such, there are S3 methods available to query metadata from a dataframe
+instead of a Tplyr table, and parameters to provide
+your own target data frame:
get_meta_subset(eff_meta, 'x4_1', "var1_Xanomeline High Dose", target=adas)%>%head()%>%
@@ -718,8 +727,8 @@
Shift layersset_denoms_by() isn’t specified.
+treatment. This is the default, and this is how Tplyr
+will create the denominators if set_denoms_by() isn’t
+specified.
In the next example, the percentage denominators are calculated
row-wise, each row percentage sums to 100%.
Tplyr offers you the ability to specifically control
+the filter used within the denominator. This is provided through the
+function set_denom_where(). The default for
set_denom_where() is the layer level where
parameter, if one was supplied. set_denom_where() allows
you to replace this layer level filter with a custom filter of your
@@ -1668,33 +1670,33 @@
In addition to missing counts, some summaries require the addition of
-a ‘Total’ row. ‘Tplyr’ has the helper function
+a ‘Total’ row. Tplyr has the helper function
add_total_row() to ease this process for you. Like most
-other things within ‘Tplyr’ - particularly in this vignette - this too
-has a significant bit of nuance to it.
+other things within Tplyr - particularly in this
+vignette - this too has a significant bit of nuance to it.
Much of this functionality is similar to
set_missing_count(). You’re able to specify a different
format for the total, but if not specified, the associated count layer’s
@@ -1726,9 +1728,9 @@
Adding a ‘Total’ Row
By default, add_total_row() will count missing
values, but you can exclude those values using the
-count_missings parameter. ‘Tplyr’ will warn you when
-set_count_missing() has denom_ignore set to
-TRUE, add_total_row() has
+count_missings parameter. Tplyr will warn
+you when set_count_missing() has denom_ignore
+set to TRUE, add_total_row() has
count_missings set to TRUE and the format
contains a percentage. Why? Because if the denominator is ignoring
missing values but you’re still counting them in your total, the
@@ -1750,7 +1752,8 @@
+searching for how to create a total column. Tplyr allows
+you to do this as well with the function add_total_group();
+you can read more in vignette("table").
And that’s it for denominators! Happy counting!
+decimal and integer precision - Tplyr can automatically
+determine this for you as well, but more on that later.
After the x’s there are unquoted variable names. This is where you
specify the actual summaries that will be performed. Notice that some
f_str() calls have two summaries specified. This allows you
@@ -251,14 +251,15 @@
Metadata
line.
But where do these summary names come from? And which ones does
-‘Tplyr’ have?
+Tplyr have?
Built-in Summaries
-
We’ve built a number of default summaries into ‘Tplyr’, which allows
-you to perform these summaries without having to specify the functions
-to calculate them yourself. The summaries built in to ‘Tplyr’ are listed
-below. In the second column are the names that you would use within an
+
We’ve built a number of default summaries into
+Tplyr, which allows you to perform these summaries
+without having to specify the functions to calculate them yourself. The
+summaries built in to Tplyr are listed below. In the
+second column are the names that you would use within an
f_str() call to use them. In the third column, we have the
syntax used to make the function call.
Note that the table code used to produce the output is the same. Now
-‘Tplyr’ used the custom summary function for mean as
-specified in the tplyr.custom_summaries option. Also note
-the use of rlang::quos(). We’ve done our best to mask this
-from the user everywhere possible and make the interfaces clean and
-intuitive, but a great deal of ‘Tplyr’ is built using ‘rlang’ and
+Tplyr used the custom summary function for
+mean as specified in the
+tplyr.custom_summaries option. Also note the use of
+rlang::quos(). We’ve done our best to mask this from the
+user everywhere possible and make the interfaces clean and intuitive,
+but a great deal of Tplyr is built using ‘rlang’ and
non-standard evaluation. Within this option is one of the very few
instances where a user needs to concern themselves with the use of
quosures. If you’d like to learn more about non-standard evaluation and
@@ -609,14 +611,14 @@
object
-using the empty parameter.
+var1_Placebo missing. Tplyr gives you
+control over how you fill this space. Let’s say that we wanted instead
+to make that space say “Missing”. You can control this with the
+f_str() object using the empty parameter.
Tplyr has a bit of a unique design, which might feel a bit weird as
-you get used to the package. The process flow of building a
-tplyr_table() object first, and then using
+
Tplyr has a bit of a unique design, which might feel
+a bit weird as you get used to the package. The process flow of building
+a tplyr_table() object first, and then using
build() to construct the data frame is different than
programming in the tidyverse, or creating a ggplot. Why create the
tplyr_table() object first? Why is the
tplyr_table() object different than the resulting data
frame?
-
The purpose of the tplyr_table() object is to let Tplyr
-do more than just summarize data. As you build the table, all of the
-metadata around the table being built is maintained - the target
-variables being summarized, the grouped variables by row and column, the
-filter conditions necessary applied to the table and each layer. As a
-user, you provide this information to create the summary. But what about
-after the results are produced? Summarizing data inevitably leads to new
-questions. Within clinical summaries, you may want to know which
-subjects experienced an adverse event, or why the lab summaries of a
-particular visit’s descriptive statistics are abnormal. Normally, you’d
-write a query to recreate the data that lead to that particular summary.
-Tplyr now allows you to immediately extract the input data or metadata
-that created an output result, thus providing traceability from the
-result back to the source.
+
The purpose of the tplyr_table() object is to let
+Tplyr do more than just summarize data. As you build
+the table, all of the metadata around the table being built is
+maintained - the target variables being summarized, the grouped
+variables by row and column, the filter conditions applied to
+the table and each layer. As a user, you provide this information to
+create the summary. But what about after the results are produced?
+Summarizing data inevitably leads to new questions. Within clinical
+summaries, you may want to know which subjects experienced an adverse
+event, or why the lab summaries of a particular visit’s descriptive
+statistics are abnormal. Normally, you’d write a query to recreate the
+data that led to that particular summary. Tplyr now
+allows you to immediately extract the input data or metadata that
+created an output result, thus providing traceability from the result
+back to the source.
Generating the Metadata
@@ -261,9 +262,9 @@
Generating the Metadata
To trigger the creation of metadata, the build()
function has a new argument metadata. By specifying
-TRUE, the underlying metadata within Tplyr are prepared in
-an extractable format. This is the only action a user needs to specify
-for this action to take place.
+TRUE, the underlying metadata within Tplyr
+are prepared in an extractable format. This is the only action a user
+needs to specify for this action to take place.
When the metadata argument is used, a new column will be
produced in the output dataframe called row_id. The
row_id variable provides a persistent reference to a row of
@@ -346,13 +347,13 @@
Extracting The Input Sourcetplyr_table() to the variable USUBJID. This is
-because get_meta_subset() has an additional argument
-add_cols that allows you to specify additional columns you
-want included in the resulting dataframe, and has a default of USUBJID.
-So let’s say we want additionally include the variable
-SEX.
+default, even though there’s no reference
+anywhere in the tplyr_table() to the variable
+USUBJID. This is because get_meta_subset() has
+an additional argument add_cols that allows you to specify
+additional columns you want included in the resulting dataframe, and has
+a default of USUBJID. So let’s say we want to additionally include the
+variable SEX.
Extracting a Result Cell’s Metadat
#> TRT01P, EFFFL, SAFFL, AGE #> Filters:#> TRT01P == c("Xanomeline High Dose"), EFFFL == "Y", SAFFL == "Y"
-
The resulting output is a new object Tplyr called
+
The resulting output is a new Tplyr object called
tplyr_meta(). This is a container of the relevant metadata
for a specific result. The object itself is a list with two elements:
names and filters.
One thing that we wanted to build into ‘Tplyr’ to make it a user
-friendly package is the ability to eliminate redundant code where
-possible. This is why there are several options available in ‘Tplyr’ to
-allow you to control things at your session level instead of each
-individual table, or even each individual layer.
-
The following are the options available in ‘Tplyr’ and their
-descriptions:
+
One thing that we wanted to build into Tplyr to make
+it a user friendly package is the ability to eliminate redundant code
+where possible. This is why there are several options available in
+Tplyr to allow you to control things at your session
+level instead of each individual table, or even each individual
+layer.
+
The following are the options available in Tplyr and
+their descriptions:
@@ -185,20 +186,20 @@
Metadata
Each of these options allows you to set these settings in one place,
-and every ‘Tplyr’ table you create will inherit your option settings as
-defaults. This allows your table code to be more concise, and
-centralizes where you need to make an update when your code has to be
-adjusted. This vignette is dedicated to helping you understand how to
-leverage each of these options properly.
+and every Tplyr table you create will inherit your
+option settings as defaults. This allows your table code to be more
+concise, and centralizes where you need to make an update when your code
+has to be adjusted. This vignette is dedicated to helping you understand
+how to leverage each of these options properly.
Default Layer Formats
Declaring string formats and summaries that need to be performed is
-one of the more verbose parts of ‘Tplyr’. Furthermore, this is something
-that will often be fairly consistent within a study, as you’ll likely
-want to look across a consistent set of descriptive statistics, or your
-count/shift tables will likely require the same sort of “n (%)”
-formatting.
+one of the more verbose parts of Tplyr. Furthermore,
+this is something that will often be fairly consistent within a study,
+as you’ll likely want to look across a consistent set of descriptive
+statistics, or your count/shift tables will likely require the same sort
+of “n (%)” formatting.
Using the format options is very similar to setting a string format.
The only difference is that you need to enter the string formats as a
named list instead of as separate parameters to a function call.
Here you can see that ‘Tplyr’ picks up these option changes. In the
-table below, we didn’t use set_format_strings() anywhere -
-instead we let ‘Tplyr’ pick up the default formats from the options.
+
Here you can see that Tplyr picks up these option
+changes. In the table below, we didn’t use
+set_format_strings() anywhere - instead we let
+Tplyr pick up the default formats from the options.
Setting formats at the tplyr_table() level will
-override the ‘Tplyr’ options and extends to any layer of the specified
-type in the current table.
+override the Tplyr options and extends to any layer of
+the specified type in the current table.
Setting formats at the tplyr_layer() level will always
-be prioritized over ‘Tplyr’ options and any tplyr_table()
-formats set. This has the narrowest scope and will always be used when
-specified.
+be prioritized over Tplyr options and any
+tplyr_table() formats set. This has the narrowest scope and
+will always be used when specified.
-
To demonstrate, consider the following. The ‘Tplyr’ options remain
-set from the block above.
+
To demonstrate, consider the following. The Tplyr
+options remain set from the block above.
Each of the outputs ignores the ‘Tpylr’ option defaults.
+
Each of the outputs ignores the Tplyr option
+defaults.
Precision Cap
-
‘Tplyr’ defaults to avoiding capping precision. We do this because
-capping precision should be a conscious decision, where you as the user
-specifically set the limit of how many decimal places are relevant to a
-specific result. One way to cap precision is by using the
+
Tplyr defaults to avoiding capping precision. We do
+this because capping precision should be a conscious decision, where you
+as the user specifically set the limit of how many decimal places are
+relevant to a specific result. One way to cap precision is by using the
cap parameter within set_format_strings(). But
perhaps you have a specific limit to what you’d like to see on any
output. Here, we offer the tplyr.precision_cap option to
@@ -567,37 +571,39 @@
Custom summaries allow you to extend the capabilities of descriptive
-statistics layers in ‘Tplyr’. Maybe our defaults don’t work how you’d
-like them to, or maybe you have some custom functions within your
-organization that you commonly would like to use. Specifying the custom
-summaries you wish to use in every table would prove quite tedious -
-therefore, the tplyr.custom_summaries option is a better
-choice.
+statistics layers in Tplyr. Maybe our defaults don’t
+work how you’d like them to, or maybe you have some custom functions
+within your organization that you commonly would like to use. Specifying
+the custom summaries you wish to use in every table would prove quite
+tedious - therefore, the tplyr.custom_summaries option is a
+better choice.
Note that the table code used to produce the output is the same. Now
-‘Tplyr’ used the custom summary function for geometric_mean
-as specified in the tplyr.custom_summaries option. Also
-note the use of rlang::quos(). We’ve done our best to mask
-this from the user everywhere possible and make the interfaces clean and
-intuitive, but a great deal of ‘Tplyr’ is built using ‘rlang’ and
+Tplyr used the custom summary function for
+geometric_mean as specified in the
+tplyr.custom_summaries option. Also note the use of
+rlang::quos(). We’ve done our best to mask this from the
+user everywhere possible and make the interfaces clean and intuitive,
+but a great deal of Tplyr is built using ‘rlang’ and
non-standard evaluation. Within this option is one of the very few
instances where a user needs to concern themselves with the use of
quosures. If you’d like to learn more about non-standard evaluation and
quosures, we recommend Section IV in
Advanced R.
-
Now that geometric mean is set within the ‘Tplyr’ options, you can
-use it within your descriptive statistics layers, just like it was one
-of the built-in summaries.
+
Now that geometric mean is set within the Tplyr
+options, you can use it within your descriptive statistics layers, just
+like it was one of the built-in summaries.
Scientific Notationoptions(scipen =-1)# Only require 3 decimal places.001#> [1] 1e-03
-
In ‘Tplyr’, we have the option tplyr.scipen. This is the
-scipen setting that will be used only while the
-‘Tplyr’ table is being built. This allows you to use a different
-scipen setting within ‘Tplyr’ than your R session. The
-default value we use in ‘Tplyr’ is 9999, which is intended to totally
-prevent numbers from switching to scientific notation. We want this to
-be a conscious decision that you make in order to prevent any unexpected
-outputs.
+
In Tplyr, we have the option
+tplyr.scipen. This is the scipen setting that
+will be used only while the Tplyr table is
+being built. This allows you to use a different scipen
+setting within Tplyr than your R session. The default
+value we use in Tplyr is 9999, which is intended to
+totally prevent numbers from switching to scientific notation. We want
+this to be a conscious decision that you make in order to prevent any
+unexpected outputs.
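A rough base R sketch of the idea described above: applying a scipen value only for the duration of one operation and leaving the session setting untouched. The helper with_scipen() is hypothetical and is not Tplyr's implementation; it only illustrates how an option can be scoped to a single call.

```r
# Hypothetical helper (not part of Tplyr): evaluate `expr` with a temporary
# scipen setting, then restore whatever the session had before.
with_scipen <- function(scipen, expr) {
  old <- options(scipen = scipen)    # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore them when the function exits
  force(expr)                        # expr is evaluated lazily, so it sees the new setting
}

session_scipen <- getOption("scipen")

# With a very large scipen, fixed notation is strongly preferred
fixed <- with_scipen(9999, format(1e-10))
fixed

# The session-level setting is unchanged afterwards
getOption("scipen")
```

Because the option is restored on exit, anything formatted outside the helper still follows the session's own scipen value.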
In certain cases users may want to match tables produced by other
-languages that IBM rounding. Tplyr offers the option ‘tplyr.IBMRounding’
-to change the default rounding behavior of Tplyr tables. Review var1_4
-in the tables below.
+languages that use IBM rounding. Tplyr offers the option
+‘tplyr.IBMRounding’ to change the default rounding behavior of
+Tplyr tables. Review var1_4 in the tables below.
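To make the difference concrete, here is a small base R sketch. It assumes that "IBM rounding" means rounding halves away from zero, whereas base R's round() sends halves to the even digit. The function round_half_away() is a hypothetical illustration, not the code Tplyr uses internally.

```r
# Hypothetical helper (not Tplyr's internal code): round half away from zero.
round_half_away <- function(x, digits = 0) {
  scale <- 10^digits
  sign(x) * floor(abs(x) * scale + 0.5) / scale
}

x <- c(0.5, 1.5, 2.5, -2.5)

base_r <- round(x)            # base R: halves go to the even digit
away   <- round_half_away(x)  # halves always move away from zero

data.frame(x, base_r, away)
```

Values that fall exactly on a half are the only ones where the two approaches disagree, so they are the results to check when comparing against tables produced elsewhere. Floating point representation can still affect decimal halves such as 0.125, so treat this as a sketch and not a validated rounding routine.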
‘Tplyr’ does not support, nor do we intend to support, a wide array
-of statistical methods. Our goal is rather to take your focus as an
-analyst off the mundane summaries so you can focus on the interesting
-analysis. That said, there are some things that are common enough that
-we feel that it’s reasonable for us to include. So let’s take a look at
-risk difference.
+
Tplyr does not support, nor do we intend to support,
+a wide array of statistical methods. Our goal is rather to take your
+focus as an analyst off the mundane summaries so you can focus on the
+interesting analysis. That said, there are some things that are common
+enough that we feel that it’s reasonable for us to include. So let’s
+take a look at risk difference.
-
‘Tplyr’ Implementation
+
+Tplyr Implementation
Our current implementation of risk difference is solely built on top
of the base R function stats::prop.test(). For any and all
@@ -174,50 +175,50 @@
Just like you can get the numeric data from a ‘Tplyr’ layer with
-get_numeric_data(), we’ve also opened up the door to
-extract the raw numeric data from risk difference calculations as well.
-This is done using the function get_stats_data(). The
-function interface is almost identical to
+
Just like you can get the numeric data from a Tplyr
+layer with get_numeric_data(), we’ve also opened up the
+door to extract the raw numeric data from risk difference calculations
+as well. This is done using the function get_stats_data().
+The function interface is almost identical to
get_numeric_data(), except for the extra parameter of
statistic. Although risk difference is the only statistic
-implemented in ‘Tplyr’ at the moment (outside of descriptive
-statistics), we understand that there are multiple methods to calculate
-risk difference, so we’ve built risk difference in a way that it could
-be expanded to easily add new methods in the future. And therefore,
-get_stats_data() the statistic parameter to
-allow you to differentiate in the situation where there are multiple
-statistical methods applied to the layer.
+implemented in Tplyr at the moment (outside of
+descriptive statistics), we understand that there are multiple methods
+to calculate risk difference, so we’ve built risk difference in a way
+that it could be expanded to easily add new methods in the future. And
+therefore, get_stats_data() has the statistic
+parameter to allow you to differentiate in the situation where there are
+multiple statistical methods applied to the layer.
The output of get_stats_data() depends on what
parameters have been used:
it is. But in a handful of cases it can get quite tricky, with some odd
situations that need to be handled carefully. For this reason, we found
it necessary to dedicate an entire vignette to just sorting and handling
-columns output by ‘Tplyr’.
Ordering helpers are columns added into ‘Tplyr’ tables to make sure
-that you can sort the display to your preference. In general, ‘Tplyr’
-will create:
+
Ordering helpers are columns added into Tplyr tables
+to make sure that you can sort the display to your preference. In
+general, Tplyr will create:
Layers are one of the fundamental building blocks of ‘Tplyr’. Each
-layer executes independently, and at the end of a build they’re bound
-together. The ord_layer_index variable allows you
-differentiate and sort layers after the table is built. Layers are
-indexed in the order in which they were added to the table using
-add_layer() or add_layers(). For example,
-let’s say you wanted to reverse the order of the layers.
+
Layers are one of the fundamental building blocks of
+Tplyr. Each layer executes independently, and at the
+end of a build they’re bound together. The ord_layer_index
+variable allows you to differentiate and sort layers after the table is
+built. Layers are indexed in the order in which they were added to the
+table using add_layer() or add_layers(). For
+example, let’s say you wanted to reverse the order of the layers.
If there’s no VARN variable in the target dataset,
-‘Tplyr’ will then check if the variable you provided is a factor. If
-you’re new to R, spending some time trying to understand factor
-variables is quite worthwhile. Let’s look at example using the variable
-ETHNIC and see some of the advantages in practice.
+Tplyr will then check if the variable you provided is a
+factor. If you’re new to R, spending some time trying to understand
+factor variables is quite worthwhile. Let’s look at an example using the
+variable ETHNIC and see some of the advantages in
+practice.
adsl$ETHNIC<-factor(adsl$ETHNIC, levels=c("HISPANIC OR LATINO", "NOT HISPANIC OR LATINO", "DUMMMY"))tplyr_table(adsl, TRT01A)%>%
@@ -1068,19 +1069,20 @@
Factor
variable we set above specifies that “HISPANIC OR LATINO” should sort
first, then “NOT HISPANIC OR LATINO”, and finally “DUMMY”. Notice how
they’re not alphabetical?
-
A highly advantageous aspect of using factor variables in ‘Tplyr’ is
-that factor variables can be used to insert dummy values into your
-table. Consider this line of code from above:
+
A highly advantageous aspect of using factor variables in
+Tplyr is that factor variables can be used to insert
+dummy values into your table. Consider this line of code from above:
adsl$ETHNIC<-factor(adsl$ETHNIC, levels=c("HISPANIC OR LATINO", "NOT HISPANIC OR LATINO", "DUMMMY"))
This is converting the variable ETHNIC to a factor, then
setting the factor levels. But it doesn’t change any of the
values in the dataset - there are no values of “dummy” within
ETHNIC in ADSL. Yet in the output built above, you see rows
-for “DUMMY”. By using factors, you can insert rows into your ‘Tplyr’
-table that don’t exist in the data. This is particularly helpful if
-you’re working with data early on in a study, where certain values are
-expected, yet do not currently exist in the data. This will help you
-prepare tables that are complete even when your data are not.
+for “DUMMY”. By using factors, you can insert rows into your
+Tplyr table that don’t exist in the data. This is
+particularly helpful if you’re working with data early on in a study,
+where certain values are expected, yet do not currently exist in the
+data. This will help you prepare tables that are complete even when your
+data are not.
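The mechanism behind this is plain base R behavior, and it can be seen without Tplyr at all. The sketch below uses a small invented vector instead of the adsl data: a factor level that never occurs in the values is still counted, with a count of zero, and the levels keep the order in which they were declared.

```r
# Toy values invented for illustration; not taken from adsl
ethnic <- c("NOT HISPANIC OR LATINO", "HISPANIC OR LATINO", "NOT HISPANIC OR LATINO")

# Declare the levels, including one that does not occur in the data
ethnic_f <- factor(
  ethnic,
  levels = c("HISPANIC OR LATINO", "NOT HISPANIC OR LATINO", "DUMMY")
)

# table() reports every declared level, in level order, including the empty one
counts <- table(ethnic_f)
counts
```

The underlying values are unchanged; only the set of levels, and therefore the set of rows a summary can report, has grown.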
VARN
@@ -1112,8 +1114,8 @@
VARN
-
‘Tplyr’ will automatically figure this out for you, and pull the
-RACEN values into the variable
+
Tplyr will automatically figure this out for you,
+and pull the RACEN values into the variable
ord_layer_1.
Lastly, if the target doesn’t have a VARN variable in
-the target dataset and isn’t a factor, ‘Tplyr’ will sort the variable
-alphabetically. The resulting order variable will be numeric, simply
-numbering each of the variable values alphabetically. Nothing fancy to
-it!
+the target dataset and isn’t a factor, Tplyr will sort
+the variable alphabetically. The resulting order variable will be
+numeric, simply numbering each of the variable values alphabetically.
+Nothing fancy to it!
“byfactor” - The default method is to sort by a
@@ -1244,10 +1247,10 @@
Sorting Count Layersset_ordering_cols() and
+particular group, like a treatment variable. Tplyr can
+populate the ordering column based on numeric values within any results
+column. This requires some more granular control, for which we’ve
+created the functions set_ordering_cols() and
set_result_order_var() to specify the column and numeric
value on which the ordering column should be based.
In the other vignettes we talk about how to get the most out of
-‘Tplyr’ when it comes to preparing your data. The last step we need to
-cover is how to get from the data output by ‘Tplyr’ to a presentation
-ready table.
+Tplyr when it comes to preparing your data. The last
+step we need to cover is how to get from the data output by
+Tplyr to a presentation ready table.
There are a few things left to do after a table is built. These steps
will vary based on what package you’re using for presentation - but
within this vignette we will demonstrate how to use ‘huxtable’ to prepare
your table and ‘pharmaRTF’ to
write the output.
-
After a ‘Tplyr’ table is built, you will likely have to:
+
After a Tplyr table is built, you will likely have
+to:
Sort the table however you wish using the provided order
variables
, factor variables input to
-‘Tplyr’ will use the factor order for the resulting order variable.
-Another particularly useful advantage of this is dummying values. The
-adsl dataset only contains the races “WHITE”, “BLACK OR AFRICAN
-AMERICAN”, and “AMERICAN INDIAN OR ALASK NATIVE”. If you set factor
-levels prior to entering the data into ‘Tplyr’, the values will be
-dummied for you. This is particularly advantageous when a study is early
-on and data may be sparse. Your output can display complete values and
-the presentation will be consistent as data come in.
+Tplyr will use the factor order for the resulting order
+variable. Another particularly useful advantage of this is dummying
+values. The adsl dataset only contains the races “WHITE”, “BLACK OR
+AFRICAN AMERICAN”, and “AMERICAN INDIAN OR ALASKA NATIVE”. If you set
+factor levels prior to entering the data into Tplyr,
+the values will be dummied for you. This is particularly advantageous
+when a study is early on and data may be sparse. Your output can display
+complete values and the presentation will be consistent as data come
+in.
@@ -1480,47 +1484,49 @@
Sorting, Column Ord
Used the apply_row_masks() function. Essentially,
after your data are sorted, this function will look at
all of your row_label variables and drop any repeating values. For
-packages like ‘huxtable’, this eases the process of making your table
-presentation ready, so you don’t need to do any cell merging once the
-‘huxtable’ table is created. Additionally, here we set the
-row_breaks option to TRUE. This will insert a
-blank row between each of your layers, which helps improve the
-presentation depending on your output. It’s important to note that the
-input dataset must still have the
-ord_layer_index variable attached in order for the blank
-rows to be added. Sorting should be done prior, and column
+packages like huxtable, this eases the process of
+making your table presentation ready, so you don’t need to do any cell
+merging once the huxtable table is created.
+Additionally, here we set the row_breaks option to
+TRUE. This will insert a blank row between each of your
+layers, which helps improve the presentation depending on your output.
+It’s important to note that the input dataset must
+still have the ord_layer_index variable attached in order
+for the blank rows to be added. Sorting should be done prior, and column
reordering/dropping may be done after.
Re-ordered the columns and dropped off the order columns. For more
information about how to get the most out of
dplyr::select(), you can look into ‘tidyselect’ here.
This is where the tidyselect::starts_with() comes
from.
-
Added the column headers. In a huxtable, your column headers are
-basically just the top rows of your data frame. The ‘Tplyr’ function
-add_column_headers() does this for you by letting you just
-enter in a string to define your headers. But there’s more to it - you
-can also create nested headers by nesting text within curly brackets
-({}), and notice that we have the treatment groups within the two stars?
-This actually allows you to take the header N values that ‘Tplyr’
-calculates for you, and use it within the column headers. As you can see
-in the first row of the output, the text shows the (N=XX) values
-populated with the proper header_n counts.
+
Added the column headers. In a huxtable, your
+column headers are basically just the top rows of your data frame. The
+Tplyr function add_column_headers() does
+this for you by letting you just enter in a string to define your
+headers. But there’s more to it - you can also create nested headers by
+nesting text within curly brackets ({}), and notice that we have the
+treatment groups within the two stars? This actually allows you to take
+the header N values that Tplyr calculates for you, and
+use it within the column headers. As you can see in the first row of the
+output, the text shows the (N=XX) values populated with the proper
+header_n counts.
Table Styling
-
There are a lot of options of where to go next. The ‘gt’ package is always a good choice,
-and we’ve been using ‘kableExtra’
+
There are a lot of options of where to go next. The gt package is always a
+good choice, and we’ve been using kableExtra
throughout these vignettes. At the moment, with the tools we’ve made
-available in ‘Tplyr’, if you’re aiming to create RTF outputs (which is
-still a common requirement in within pharma companies), ‘huxtable’ and our
-package ‘pharmaRTF’ will
-get you where you need to go.
-
(Note: We plan to extend ‘pharmaRTF’ to support ‘GT’ when it has
-better RTF support)
-
Alright - so the table is ready. Let’s prepare the ‘huxtable’
-table.
+available in Tplyr, if you’re aiming to create RTF
+outputs (which is still a common requirement within pharma
+companies), huxtable
+and our package pharmaRTF
+will get you where you need to go.
+
(Note: We plan to extend pharmaRTF to support
+GT when it has better RTF support)
+
Alright - so the table is ready. Let’s prepare the
+huxtable table.
# Make the tableht<-huxtable::as_hux(dat, add_colnames=FALSE)%>%
@@ -1779,7 +1785,8 @@
And here we have it - Our table is styled and ready to go!
-
If you’d like to learn more about how to use ‘huxtable’, be sure to
-check out the website. For use
-specifically with ‘pharmaRTF’, we prepared a vignette of tips and tricks
-here.
+
If you’d like to learn more about how to use huxtable,
+be sure to check out the website. For use
+specifically with pharmaRTF, we prepared a vignette of tips
+and tricks here.
Most of the work in creating a ‘Tplyr’ table is at the layer level,
-but there are a few overarching properties that are worth spending some
-time discussing. One of the things that we wanted to make sure we did in
-‘Tplyr’ is allow you to eliminate redundant code wherever possible.
-Adding some processing to the tplyr_table() level allows us
-to do that. Furthermore, some settings simply need to be applied table
-wide.
+
Most of the work in creating a Tplyr table is at the
+layer level, but there are a few overarching properties that are worth
+spending some time discussing. One of the things that we wanted to make
+sure we did in Tplyr is allow you to eliminate
+redundant code wherever possible. Adding some processing to the
+tplyr_table() level allows us to do that. Furthermore, some
+settings simply need to be applied table wide.
A last and very important aspect of table level properties in ‘Tplyr’
-is the addition of a population dataset. In CDISC standards, datasets
-like adae only contain adverse events when they occur. This
-means that if a subject did not experience an adverse event, or did not
-experience an adverse event within the criteria that you’re subsetting
-for, they don’t appear in the dataset. When you’re looking at the
-proportion of subject who experienced an adverse event compared to the
-total number of subjects in that cohort, adae itself leaves
-you no way to calculate that total - as the subjects won’t exist in the
-data.
-
‘Tplyr’ allows you to provide a separate population dataset to
-overcome this. Furthermore, you are also able to provide a separate
-population dataset where parameter and a population
-treatment variable named pop_treat_var, as variable names
-may differ between the datasets.
+
A last and very important aspect of table level properties in
+Tplyr is the addition of a population dataset. In CDISC
+standards, datasets like adae only contain adverse events
+when they occur. This means that if a subject did not experience an
+adverse event, or did not experience an adverse event within the
+criteria that you’re subsetting for, they don’t appear in the dataset.
+When you’re looking at the proportion of subjects who experienced an
+adverse event compared to the total number of subjects in that cohort,
+adae itself leaves you no way to calculate that total - as
+the subjects won’t exist in the data.
+
Tplyr allows you to provide a separate population
+dataset to overcome this. Furthermore, you are also able to provide a
+separate population dataset where parameter and a
+population treatment variable named pop_treat_var, as
+variable names may differ between the datasets.
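The reasoning can be sketched in base R with invented toy data. This is not how Tplyr computes its results; it only shows why the denominator has to come from the population dataset, and why the treatment variable may carry a different name in each dataset (here TRT01A versus TRTA, both chosen for illustration).

```r
# Toy data invented for illustration
adsl <- data.frame(
  USUBJID = paste0("S", 1:6),
  TRT01A  = rep(c("Placebo", "Drug"), each = 3)
)
adae <- data.frame(
  USUBJID = c("S1", "S1", "S4"),       # S1 has two events, S4 has one
  TRTA    = c("Placebo", "Placebo", "Drug")
)

# Denominator: every subject in the population data, per treatment group
big_n <- table(adsl$TRT01A)

# Numerator: distinct subjects with at least one event, per treatment group
ae_subj <- unique(adae[, c("USUBJID", "TRTA")])
n_event <- table(factor(ae_subj$TRTA, levels = names(big_n)))

pct <- 100 * as.numeric(n_event) / as.numeric(big_n)
names(pct) <- names(big_n)
pct
```

Using adae alone, each group would appear to contain a single subject, because subjects without events never appear in that dataset.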
Note the percentage values in the summary above. By setting the
-population data, ‘Tplyr’ now knew to use those values when calculating
-the percentages for the distinct counts of subjects who experienced the
-summarized adverse events. Furthermore, with the population data
-provided, ‘Tplyr’ is able to calculate your header N’s properly:
+population data, Tplyr now knew to use those values
+when calculating the percentages for the distinct counts of subjects who
+experienced the summarized adverse events. Furthermore, with the
+population data provided, Tplyr is able to calculate
+your header N’s properly:
And finally, learn more about producing and outputting styled tables
-using ‘Tplyr’ in vignettes("styled-table")
+using Tplyr in
+vignette("styled-table")
Welcome to Tplyr! Tplyr is a traceability minded grammar of data format and summary. It’s designed to simplify the creation of common clinical summaries and help you focus on how you present your data rather than redundant summaries being performed. Furthermore, for every result Tplyr produces, it also produces the metadata necessary to give your traceability from source to summary.
+
Welcome to Tplyr! Tplyr is a traceability minded grammar of data format and summary. It’s designed to simplify the creation of common clinical summaries and help you focus on how you present your data rather than redundant summaries being performed. Furthermore, for every result Tplyr produces, it also produces the metadata necessary to give you traceability from source to summary.
As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or if any documentation is unclear - submit an issue through GitHub right here.
dplyr from tidyverse is a grammar of data manipulation. So what does that allow you to do? It gives you, as a data analyst, the capability to easily and intuitively approach the problem of manipulating your data into an analysis ready form. dplyr conceptually breaks things down into verbs that allow you to focus on what you want to do more than how you have to do it.
-
Tplyr is designed around a similar concept, but its focus is on building summary tables common within the clinical world. In the pharmaceutical industry, a great deal of the data presented in the outputs we create are very similar. For the most part, most of these tables can be broken down into a few categories:
+
dplyr from tidyverse is a grammar of data manipulation. So what does that allow you to do? It gives you, as a data analyst, the capability to easily and intuitively approach the problem of manipulating your data into an analysis ready form. dplyr conceptually breaks things down into verbs that allow you to focus on what you want to do more than how you have to do it.
+
Tplyr is designed around a similar concept, but its focus is on building summary tables common within the clinical world. In the pharmaceutical industry, a great deal of the data presented in the outputs we create are very similar. For the most part, most of these tables can be broken down into a few categories:
Counting for event based variables or categories
Shifting, which is just counting a change in state with a ‘from’ and a ‘to’
So we have one table, with 6 summaries (7 including the next page, not shown) - but only 2 different approaches to summaries being performed. In the same way that dplyr is a grammar of data manipulation, Tplyr aims to be a grammar of data summary. The goal of Tplyr is to allow you to program a summary table like you see it on the page, by breaking a larger problem into smaller ‘layers’, and combining them together like you see on the page.
+
So we have one table, with 6 summaries (7 including the next page, not shown) - but only 2 different approaches to summaries being performed. In the same way that dplyr is a grammar of data manipulation, Tplyr aims to be a grammar of data summary. The goal of Tplyr is to allow you to program a summary table like you see it on the page, by breaking a larger problem into smaller ‘layers’, and combining them together like you see on the page.
We understand how important documentation and testing is within the pharmaceutical world. This is why outside of unit testing ’Tplyr includes an entire user-acceptance testing document, where requirements were established, test-cases were written, and tests were independently programmed and executed. We do this in the hope that you can leverage our work within a qualified programming environment, and that we save you a substantial amount of trouble in getting it there.
+
We understand how important documentation and testing are within the pharmaceutical world. This is why, outside of unit testing, Tplyr includes an entire user-acceptance testing document, where requirements were established, test-cases were written, and tests were independently programmed and executed. We do this in the hope that you can leverage our work within a qualified programming environment, and that we save you a substantial amount of trouble in getting it there.
You can find the qualification document within this repository right here. The ‘uat’ folder additionally contains all of the raw files, programmatic tests, specifications, and test cases necessary to create this report.
The TL;DR
-
Here are some of the high level benefits of using Tplyr:
+
Here are some of the high level benefits of using Tplyr:
Easy construction of table data using an intuitive syntax
Smart string formatting for your numbers that’s easily specified by the user
In building ‘Tplyr’, we needed some additional resources in addition to our personal experience to help guide design. PHUSE has done some great work to create guidance for standard outputs with collaboration between multiple pharmaceutical companies and the FDA. You can find some of the resource that we referenced below.
+
In building Tplyr, we needed some additional resources in addition to our personal experience to help guide design. PHUSE has done some great work to create guidance for standard outputs with collaboration between multiple pharmaceutical companies and the FDA. You can find some of the resources that we referenced below.
diff --git a/docs/reference/by.html b/docs/reference/by.html
index cfee31b7..890fd735 100644
--- a/docs/reference/by.html
+++ b/docs/reference/by.html
@@ -1,5 +1,5 @@
-Set or return by layer binding — get_by • TplyrSet or return by layer binding — get_by • TplyrCreate a f_str object — f_str • TplyrTplyr
- 1.0.0
+ 1.0.1.9000
@@ -198,7 +198,7 @@
Valid f_str() Variable
mean
sd
median
-
variance
+
var
min
max
iqr
diff --git a/docs/reference/get_meta_result.html b/docs/reference/get_meta_result.html
index 9780be7a..764626b1 100644
--- a/docs/reference/get_meta_result.html
+++ b/docs/reference/get_meta_result.html
@@ -1,6 +1,6 @@
Extract the result metadata of a Tplyr table — get_meta_result • TplyrExtract the result metadata of a Tplyr table — get_meta_result • TplyrExtract the subset of data based on result metadata — get_meta_subset • TplyrExtract the subset of data based on result metadata — get_meta_subset • TplyrGet the metadata dataframe from a tplyr_table — get_metadata • TplyrGet the metadata dataframe from a tplyr_table — get_metadata • TplyrRetrieve the numeric data from a tplyr objects — get_numeric_data • TplyrGet statistics data — get_stats_data • TplyrGet statistics data — get_stats_data • TplyrReturn or set header_n binding — header_n • TplyrReturn or set header_n binding — header_n • TplyrFunction reference • TplyrFunction reference • TplyrSelect levels to keep in a count layer — keep_levels • TplyrTplyr
- 1.0.0
+ 1.0.1.9000
diff --git a/docs/reference/layer_attachment.html b/docs/reference/layer_attachment.html
index 1c335dff..4b7b936f 100644
--- a/docs/reference/layer_attachment.html
+++ b/docs/reference/layer_attachment.html
@@ -17,7 +17,7 @@
attach them. This is helpful when using functions like get_numeric_data or
get_stats_data when you can access information from a layer directly.
add_layer has a name parameter, and layers can be named in add_layers by
-submitting the layer as a named argument.'>Attach a layer to a tplyr_table object — add_layer • TplyrAttach a layer to a tplyr_table object — add_layer • TplyrCreate a count, desc, or shift layer for discrete count
-based summaries, descriptive statistics summaries, or shift count summaries — group_count • TplyrSet the ordering logic for the count layer — set_order_count_method • TplyrSet the ordering logic for the count layer — set_order_count_method • TplyrTplyr
- 1.0.0
+ 1.0.1.9000
@@ -306,11 +306,12 @@
Examples)build(t)#># A tibble: 3 × 6
-#> row_label1 var1_3 var1_4 var1_5 ord_layer_index ord_layer_1
-#><chr><chr><chr><chr><int><dbl>
-#>1 4 " 1 ( 6.7%)"" 8 ( 66.7%)"" 2 ( 40.0… 1 1
-#>2 6 " 2 ( 13.3%)"" 4 ( 33.3%)"" 1 ( 20.0… 1 2
-#>3 8 "12 ( 80.0%)"" 0 ( 0.0%)"" 2 ( 40.0… 1 3
+#> row_label1 var1_3 var1_4 var1_5 ord_layer_index ord_lay…¹
+#><chr><chr><chr><chr><int><dbl>
+#>1 4 " 1 ( 6.7%)"" 8 ( 66.7%)"" 2 ( 40.0%)" 1 1
+#>2 6 " 2 ( 13.3%)"" 4 ( 33.3%)"" 1 ( 20.0%)" 1 2
+#>3 8 "12 ( 80.0%)"" 0 ( 0.0%)"" 2 ( 40.0%)" 1 3
+#># … with abbreviated variable name ¹ord_layer_1# Sorting by <VAR>Nmtcars$cylN<-mtcars$cyl
diff --git a/docs/reference/pipe.html b/docs/reference/pipe.html
index 3e4f4c30..74535a1f 100644
--- a/docs/reference/pipe.html
+++ b/docs/reference/pipe.html
@@ -1,5 +1,5 @@
-Pipe operator — %>% • TplyrPipe operator — %>% • TplyrReturn or set population data bindings — pop_data • TplyrTplyr
- 1.0.0
+ 1.0.1.9000
diff --git a/docs/reference/pop_treat_var.html b/docs/reference/pop_treat_var.html
index 038e90b9..4fe21240 100644
--- a/docs/reference/pop_treat_var.html
+++ b/docs/reference/pop_treat_var.html
@@ -1,7 +1,7 @@
Return or set pop_treat_var binding — pop_treat_var • TplyrReturn or set pop_treat_var binding — pop_treat_var • TplyrSet or return precision_by layer binding — get_precision_by • TplyrTplyr
- 1.0.0
+ 1.0.1.9000
diff --git a/docs/reference/precision_on.html b/docs/reference/precision_on.html
index 7b2fc3c6..61f909d8 100644
--- a/docs/reference/precision_on.html
+++ b/docs/reference/precision_on.html
@@ -1,7 +1,7 @@
Set or return precision_on layer binding — get_precision_on • TplyrSet or return precision_on layer binding — get_precision_on • TplyrProcess layers to get formatted and pivoted tables. — process_formatting • TplyrProcess layers to get formatted and pivoted tables. — process_formatting • TplyrProcess a tplyr_statistic object — process_statistic_data • TplyrProcess string formatting on a tplyr_statistic object — process_statistic_formatting • TplyrProcess string formatting on a tplyr_statistic object — process_statistic_formatting • TplyrProcess layers to get numeric results of layer — process_summaries • TplyrProcess layers to get numeric results of layer — process_summaries • TplyrSet custom summaries to be performed within a descriptive statistics layer — set_custom_summaries • TplyrTplyr
- 1.0.0
+ 1.0.1.9000
@@ -156,11 +156,11 @@
- 1.0.0
+ 1.0.1.9000
diff --git a/docs/reference/set_outer_sort_position.html b/docs/reference/set_outer_sort_position.html
index 6deca4cc..aeabb3fc 100644
--- a/docs/reference/set_outer_sort_position.html
+++ b/docs/reference/set_outer_sort_position.html
@@ -1,5 +1,5 @@
-Set the value of a outer nested count layer to Inf or -Inf — set_outer_sort_position • TplyrSet the value of a outer nested count layer to Inf or -Inf — set_outer_sort_position • TplyrSet precision data — set_precision_data • TplyrSet descriptive statistics as columns — set_stats_as_columns • TplyrTplyr
- 1.0.0
+ 1.0.1.9000
diff --git a/docs/reference/set_total_row_label.html b/docs/reference/set_total_row_label.html
index 0a2215f7..f279da3d 100644
--- a/docs/reference/set_total_row_label.html
+++ b/docs/reference/set_total_row_label.html
@@ -1,6 +1,6 @@
Set the label for the total row — set_total_row_label • TplyrSet the label for the total row — set_total_row_label • TplyrTplyr
- 1.0.0
+ 1.0.1.9000
diff --git a/docs/reference/table_format_defaults.html b/docs/reference/table_format_defaults.html
index 45e6bfff..fd7e0d46 100644
--- a/docs/reference/table_format_defaults.html
+++ b/docs/reference/table_format_defaults.html
@@ -3,7 +3,7 @@
strings. You may wish to reuse the same format strings across numerous
layers. set_desc_layer_formats and set_count_layer_formats
allow you to apply your desired format strings within the entire scope of the
-table.">Get or set the default format strings for descriptive statistics layers — get_desc_layer_formats • TplyrGet or set the default format strings for descriptive statistics layers — get_desc_layer_formats • TplyrTplyr
- 1.0.0
+ 1.0.1.9000
diff --git a/docs/reference/target_var.html b/docs/reference/target_var.html
index c3b28ef0..acb140b3 100644
--- a/docs/reference/target_var.html
+++ b/docs/reference/target_var.html
@@ -1,5 +1,5 @@
-Set or return treat_var binding — get_target_var • TplyrSet or return treat_var binding — get_target_var • TplyrCreate a tplyr_layer object — tplyr_layer • TplyrCombine existing treatment groups for summary — add_treat_grps • TplyrCombine existing treatment groups for summary — add_treat_grps • Tplyr%
+ set_pop_data(adsl) %>%
+ set_pop_treat_var(TRT01A) %>%
+ set_pop_where(TRUE) %>%
+ add_layer(
+ group_count(AEDECOD, where = AEREL != "NONE") %>%
+ set_distinct_by(USUBJID) %>%
+ set_format_strings(f_str("xxx, [xxx] (xxx.x%) [xxx.x%]", distinct_n, n, distinct_pct, pct))
+ )
+
+ expect_snapshot(dput(build(t)))
+
+})
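Reviewer note: the new snapshot test above exercises the `denom_where` defaulting introduced in the `R/count.R` change earlier in this diff. The sketch below restates that branch in isolation so the intended behavior is easy to check by hand. `resolve_denom_where()` and the toy data frames are hypothetical and exist only for illustration; they are not part of Tplyr, and the real logic lives inside `process_count_denoms()`.

```r
library(rlang)

# Hypothetical helper (NOT a Tplyr function): mirrors the defaulting branch
# added to process_count_denoms() in this patch.
resolve_denom_where <- function(denom_where, where, pop_data, target) {
  if (!is.null(denom_where)) {
    return(denom_where)  # a user-supplied denominator filter is kept as-is
  }
  if (!isTRUE(identical(pop_data, target))) {
    quo(TRUE)            # separate population data: do not reuse `where`
  } else {
    where                # same data: fall back to the `where` filter
  }
}

# Toy data, assumed purely for illustration
target   <- data.frame(USUBJID = c("01", "01", "02"),
                       AEREL   = c("NONE", "POSSIBLE", "NONE"))
pop_data <- data.frame(USUBJID = c("01", "02", "03"))
where    <- quo(AEREL != "NONE")

with_pop    <- resolve_denom_where(NULL, where, pop_data, target)
without_pop <- resolve_denom_where(NULL, where, target, target)

quo_get_expr(with_pop)     # the constant TRUE
quo_get_expr(without_pop)  # the call AEREL != "NONE"
```

With population data supplied, the default denominator filter becomes `quo(TRUE)`, which is also why the later warning check in the patch adds `!isTRUE(quo_get_expr(denom_where))`: a defaulted `TRUE` filter should not be reported as a custom `denom_where`.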
diff --git a/vignettes/Tplyr.Rmd b/vignettes/Tplyr.Rmd
index 69e435f1..5c2068b9 100644
--- a/vignettes/Tplyr.Rmd
+++ b/vignettes/Tplyr.Rmd
@@ -26,17 +26,17 @@ load("adsl.Rdata")
load("adlb.Rdata")
```
-# How 'Tplyr' Works
+# How **Tplyr** Works
When you look at a summary table within a clinical report, you can often break it down into some basic pieces. Consider this output.
![](./demo_table.png)
-Different variables are being summarized in chunks of the table, which we refer to as "layers". Additionally, this table really only contains a few different types of summaries, which makes many of the calculations rather redundant. This drives the motivation behind 'Tplyr'. The containing table is encapsulated within the `tplyr_table()` object, and each section, or "layer", within the summary table can be broken down into a `tplyr_layer()` object.
+Different variables are being summarized in chunks of the table, which we refer to as "layers". Additionally, this table really only contains a few different types of summaries, which makes many of the calculations rather redundant. This drives the motivation behind **Tplyr**. The containing table is encapsulated within the `tplyr_table()` object, and each section, or "layer", within the summary table can be broken down into a `tplyr_layer()` object.
## The `tplyr_table()` Object
-The `tplyr_table()` object is the conceptual "table" that contains all of the logic necessary to construct and display the data. 'Tplyr' tables are made up of one or more layers. Each layer contains an instruction for a summary to be performed. The `tplyr_table()` object contains those layers, and the general data, metadata, and logic necessary to prepare the data before any layers are constructed.
+The `tplyr_table()` object is the conceptual "table" that contains all of the logic necessary to construct and display the data. **Tplyr** tables are made up of one or more layers. Each layer contains an instruction for a summary to be performed. The `tplyr_table()` object contains those layers, and the general data, metadata, and logic necessary to prepare the data before any layers are constructed.
When a `tplyr_table()` is created, it will contain the following bindings:
@@ -60,14 +60,14 @@ t <- tplyr_table(adsl, TRT01P, where = SAFFL == "Y")
t
```
-## The `tplyr_layer` Object
+## The `tplyr_layer()` Object
-Users of 'Tplyr' interface with `tplyr_layer()` objects using the `group_` family of functions. This family specifies the type of summary that is to be performed within a layer. `count` layers are used to create summary counts of some discrete variable. `shift` layers summarize the counts for different changes in states. Lastly, `desc` layers create descriptive statistics.
+Users of **Tplyr** interface with `tplyr_layer()` objects using the `group_` family of functions. This family specifies the type of summary that is to be performed within a layer. `count` layers are used to create summary counts of some discrete variable. `shift` layers summarize the counts for different changes in states. Lastly, `desc` layers create descriptive statistics.
- **Count Layers**
- Count layers allow you to easily create summaries based on counting distinct or non-distinct occurrences of values within a variable. Additionally, this layer allows you to create n (%) summaries where you're also summarizing the proportion of instances a value occurs compared to some denominator. Count layers are also capable of producing counts of nested relationships. For example, if you want to produce counts of an overall outside group, and then the subgroup counts within that group, you can simply specify the target variable as `vars(OutsideVariable, InsideVariable)`. This allows you to do tables like Adverse Events where you want to see the Preferred Terms within Body Systems, all in one layer. Count layers can also distinguish between distinct and non-distinct counts. Using some specified by variable, you can count the unique occurrences of some variable within the specified by grouping, including the target. This allows you to do a summary like unique subjects and their proportion experiencing some adverse event, and the number of total occurrences of that adverse event.
- **Descriptive Statistics Layers**
- - Descriptive statistics layers perform summaries on continuous variables. There are a number of summaries built into 'Tplyr' already that you can perform, including n, mean, median, standard deviation, variance, min, max, interquartile range, Q1, Q3, and missing value counts. From these available summaries, the default presentation of a descriptive statistics layer will output 'n', 'Mean (SD)', 'Median', 'Q1, Q3', 'Min, Max', and 'Missing'. You can change these summaries using `set_format_strings()`, and you can also add your own summaries using `set_custom_summaries()`. This allows you to easily implement any additional summary statistics you want presented.
+ - Descriptive statistics layers perform summaries on continuous variables. There are a number of summaries built into **Tplyr** already that you can perform, including n, mean, median, standard deviation, variance, min, max, interquartile range, Q1, Q3, and missing value counts. From these available summaries, the default presentation of a descriptive statistics layer will output 'n', 'Mean (SD)', 'Median', 'Q1, Q3', 'Min, Max', and 'Missing'. You can change these summaries using `set_format_strings()`, and you can also add your own summaries using `set_custom_summaries()`. This allows you to easily implement any additional summary statistics you want presented.
- **Shift Layers**
- Shift layers are largely an abstraction of the count layer - and in fact, we re-use a lot of the same code to process these layers. In many shift tables, the "from" state is presented as rows in the table, and the "to" state is presented as columns. This clearly lays out how many subjects changed state between a baseline and some point in time. Shift layers give you an intuitive API to break these out, using a very similar interface as the other layers. There are also a number of modifier functions available to control nuanced aspects, such as how denominators should be applied.
@@ -85,7 +85,7 @@ shf
## Adding Layers to a Table
-Everyone has their own style of coding - so we've tried to be flexible to an extent. Overall, 'Tplyr' is built around tidy syntax, so all of our object construction supports piping with [`magrittr`](https://magrittr.tidyverse.org/articles/magrittr.html) (i.e. `%>%`).
+Everyone has their own style of coding - so we've tried to be flexible to an extent. Overall, **Tplyr** is built around tidy syntax, so all of our object construction supports piping with [`magrittr`](https://magrittr.tidyverse.org/articles/magrittr.html) (i.e. `%>%`).
There are two ways to add layers to a `tplyr_table()`: `add_layer()` and `add_layers()`. The difference is that `add_layer()` allows you to construct the layer within the call to `add_layer()`, whereas with `add_layers()` you can attach multiple layers that have already been constructed upfront:
@@ -140,7 +140,7 @@ t %>%
```
-But there's more you can get from 'Tplyr'. It's great to have the formatted numbers, but what about the numeric data behind the scenes? Maybe a number looks suspicious and you need to investigate _how_ you got that number. What if you want to calculate your own statistics based off of the counts? You can get that information as well using `get_numeric_data()`. This returns the numeric data from each layer as a list of data frames:
+But there's more you can get from **Tplyr**. It's great to have the formatted numbers, but what about the numeric data behind the scenes? Maybe a number looks suspicious and you need to investigate _how_ you got that number. What if you want to calculate your own statistics based off of the counts? You can get that information as well using `get_numeric_data()`. This returns the numeric data from each layer as a list of data frames:
```{r get_numeric_data}
get_numeric_data(t) %>%
@@ -148,15 +148,15 @@ get_numeric_data(t) %>%
kable()
```
-By storing pertinent information, you can get more out of a 'Tplyr' object than processed data for display. And by specifying when you want to get data out of 'Tplyr', we can save you from repeatedly processing data while your constructing your outputs - which is particularly useful when that computation starts taking time.
+By storing pertinent information, you can get more out of a **Tplyr** object than processed data for display. And by specifying when you want to get data out of **Tplyr**, we can save you from repeatedly processing data while you're constructing your outputs - which is particularly useful when that computation starts taking time.
## Constructing Layers
-The bulk of 'Tplyr' coding comes from constructing your layers and specifying the work you want to be done. Before we get into this, it's important to discuss how 'Tplyr' handles string formatting.
+The bulk of **Tplyr** coding comes from constructing your layers and specifying the work you want to be done. Before we get into this, it's important to discuss how **Tplyr** handles string formatting.
-### String Formatting in 'Tplyr'
+### String Formatting in **Tplyr**
-String formatting in 'Tplyr' is controlled by an object called an `f_str()`, which is also the name of function you use to create these formats. To set these format strings into a `tplyr_layer()`, you use the function `set_format_strings()`, and this usage varies slightly between layer types (which is covered in other vignettes).
+String formatting in **Tplyr** is controlled by an object called an `f_str()`, which is also the name of the function you use to create these formats. To set these format strings into a `tplyr_layer()`, you use the function `set_format_strings()`, and this usage varies slightly between layer types (which is covered in other vignettes).
So - why is this object necessary. Consider this example:
@@ -181,7 +181,7 @@ In a perfect world, the `f_str()` calls wouldn't be necessary - but in reality t
- The row labels in the `row_label2` column are taken from the left side of each `=` in `set_format_strings()`
- The string formats, including integer length and decimal precision, and exact presentation formatting are taken from the strings within the first parameter of each `f_str()` call
-- The second and greater parameters within each `f_str()` call determine the descriptive statistic summaries that will be performed. This is connected to a number of default summaries available within 'Tplyr', but you can also create your own summaries (covered in other vignettes). The default summaries that are built in include:
+- The second and greater parameters within each `f_str()` call determine the descriptive statistic summaries that will be performed. This is connected to a number of default summaries available within **Tplyr**, but you can also create your own summaries (covered in other vignettes). The default summaries that are built in include:
- `n` = Number of observations
- `mean` = Mean
- `sd` = Standard Deviation
@@ -246,7 +246,7 @@ tplyr_table(adsl, TRT01P) %>%
kable()
```
-To override these defaults, just specify the summaries that you want to be performed using `set_format_strings()` as described above. But what if 'Tplyr' doesn't have a built in function to do the summary statistic that you want to see? Well - you can make your own! This is where `set_custom_summaries()` comes into play. Let's say you want to derive a geometric mean.
+To override these defaults, just specify the summaries that you want to be performed using `set_format_strings()` as described above. But what if **Tplyr** doesn't have a built in function to do the summary statistic that you want to see? Well - you can make your own! This is where `set_custom_summaries()` comes into play. Let's say you want to derive a geometric mean.
```{r custom_summaries}
tplyr_table(adsl, TRT01P) %>%
@@ -266,7 +266,7 @@ tplyr_table(adsl, TRT01P) %>%
In `set_custom_summaries()`, first you name the summary being performed. This is important - that name is what you use in the `f_str()` call to incorporate it into a format. Next, you program or call the function desired. What happens in the background is that this is used in a call to `dplyr::summarize()` - so use similar syntax. Use the variable name `.var` in your custom summary function. This is necessary because it allows a generic variable name to be used when multiple target variables are specified - and therefore the function can be applied to both target variables.
-Sometimes there's a need to present multiple variables summarized side by side. 'Tplyr' allows you to do this as well.
+Sometimes there's a need to present multiple variables summarized side by side. **Tplyr** allows you to do this as well.
```{r desc2}
tplyr_table(adsl, TRT01P) %>%
@@ -278,7 +278,7 @@ tplyr_table(adsl, TRT01P) %>%
```
-'Tplyr' summarizes both variables and merges them together. This makes creating tables where you need to compare BASE, AVAL, and CHG next to each other nice and simple. Note the use of `dplyr::vars()` - in any situation where you'd like to use multiple variable names in a parameter, use `dplyr::vars()` to specify the variables. You can use text strings in the calls to `dplyr::vars()` as well.
+**Tplyr** summarizes both variables and merges them together. This makes creating tables where you need to compare BASE, AVAL, and CHG next to each other nice and simple. Note the use of `dplyr::vars()` - in any situation where you'd like to use multiple variable names in a parameter, use `dplyr::vars()` to specify the variables. You can use text strings in the calls to `dplyr::vars()` as well.
## Count Layers
@@ -295,7 +295,7 @@ tplyr_table(adsl, TRT01P) %>%
```
-Sometimes it's also necessary to count summaries based on distinct values. 'Tplyr' allows you to do this as well with `set_distinct_by()`:
+Sometimes it's also necessary to count summaries based on distinct values. **Tplyr** allows you to do this as well with `set_distinct_by()`:
```{r count_distinct}
tplyr_table(adae, TRTA) %>%
@@ -311,7 +311,7 @@ tplyr_table(adae, TRTA) %>%
There's another trick going on here - to create a summary with row label text like you see above, text strings can be used as the target variables. Here, we use this in combination with `set_distinct_by()` to count distinct subjects.
-Adverse event tables often call for counting AEs of something like a body system and counting actual events within that body system. 'Tplyr' has means of making this simple for the user as well.
+Adverse event tables often call for counting AEs of something like a body system and counting actual events within that body system. **Tplyr** has means of making this simple for the user as well.
```{r count_nested}
tplyr_table(adae, TRTA) %>%
@@ -323,11 +323,11 @@ tplyr_table(adae, TRTA) %>%
kable()
```
-Here we again use `dplyr::vars()` to specify multiple target variables. When used in a count layer, 'Tplyr' knows automatically that the first variable is a grouping variable for the second variable, and counts shall be produced for both then merged together.
+Here we again use `dplyr::vars()` to specify multiple target variables. When used in a count layer, **Tplyr** knows automatically that the first variable is a grouping variable for the second variable, and counts shall be produced for both then merged together.
## Shift Layers
-Lastly, let's talk about shift layers. A common example of this would be looking at a subject's lab levels at baseline versus some designated evaluation point. This would tell us, for example, how many subjects were high at baseline for a lab test vs. after an intervention has been introduced. The shift layer in 'Tplyr' is intended for creating shift tables that show these data as a matrix, where one state will be presented in rows and the other in columns. Let's look at an example.
+Lastly, let's talk about shift layers. A common example of this would be looking at a subject's lab levels at baseline versus some designated evaluation point. This would tell us, for example, how many subjects were high at baseline for a lab test vs. after an intervention has been introduced. The shift layer in **Tplyr** is intended for creating shift tables that show these data as a matrix, where one state will be presented in rows and the other in columns. Let's look at an example.
```{r shift1}
# Tplyr can use factor orders to dummy values and order presentation
@@ -344,24 +344,24 @@ tplyr_table(adlb, TRTA, where = PARAMCD == "CK") %>%
```
-The underlying process of shift tables is the same as count layers - we're counting the number of occurrences of something by a set of grouping variables. This differs in that 'Tplyr' uses the `group_shift()` API to use the same basic interface as other tables, but translate your target variables into the row variable and the column variable. Furthermore, there is some enhanced control over how denominators should behave that is necessary for a shift layer.
+The underlying process of shift tables is the same as count layers - we're counting the number of occurrences of something by a set of grouping variables. This differs in that **Tplyr** uses the `group_shift()` API to keep the same basic interface as other tables, but translates your target variables into the row variable and the column variable. Furthermore, there is some enhanced control over how denominators should behave that is necessary for a shift layer.
# Where to go from here?
-There's quite a bit more to learn! And we've prepared a number of other vignettes to help you get what you need out of 'Tplyr'.
+There's quite a bit more to learn! And we've prepared a number of other vignettes to help you get what you need out of **Tplyr**.
- Learn more about table level settings in `vignette("table")`
- Learn more about descriptive statistics layers in `vignette("desc")`
- Learn more about count and shift layers in `vignette("count")`
- Learn more about shift layers in `vignette("shift")`
- Learn more about calculating risk differences in `vignette("riskdiff")`
-- Learn more about sorting 'Tplyr' tables in `vignette("sort")`
-- Learn more about using 'Tplyr' options in `vignette("options")`
-- And finally, learn more about producing and outputting styled tables using 'Tplyr' in `vignette("styled-table")`
+- Learn more about sorting **Tplyr** tables in `vignette("sort")`
+- Learn more about using **Tplyr** options in `vignette("options")`
+- And finally, learn more about producing and outputting styled tables using **Tplyr** in `vignette("styled-table")`
# References
-In building 'Tplyr', we needed some additional resources in addition to our personal experience to help guide design. PHUSE has done some great work to create guidance for standard outputs with collaboration between multiple pharmaceutical companies and the FDA. You can find some of the resource that we referenced below.
+In building **Tplyr**, we needed some additional resources beyond our personal experience to help guide design. PHUSE has done some great work to create guidance for standard outputs with collaboration between multiple pharmaceutical companies and the FDA. You can find some of the resources that we referenced below.
[Analysis and Displays Associated with Adverse Events](https://phuse.s3.eu-central-1.amazonaws.com/Deliverables/Standard+Analyses+and+Code+Sharing/Analyses+and+Displays+Associated+with+Adverse+Events+Focus+on+Adverse+Events+in+Phase+2-4+Clinical+Trials+and+Integrated+Summary.pdf)
diff --git a/vignettes/count.Rmd b/vignettes/count.Rmd
index 04162872..3ceebb4a 100644
--- a/vignettes/count.Rmd
+++ b/vignettes/count.Rmd
@@ -52,7 +52,7 @@ kable(t)
Another exceptionally important consideration within count layers is whether you should be using distinct counts, non-distinct counts, or some combination of both. Adverse event tables are a perfect example. Often, you're concerned about how many subjects had an adverse event in particular instead of just the number of occurrences of that adverse event. Similarly, the number occurrences of an event isn't necessarily relevant when compared to the total number of adverse events that occurred. For this reason, what you likely want to look at is instead the number of subjects who experienced an event compared to the total number of subjects in that treatment group.
-'Tplyr' allows you to focus on these distinct counts and distinct percents within some grouping variable, like subject. Additionally, you can mix and match with the distinct counts with non-distinct counts in the same row too. The `set_distinct_by()` function sets the variables used to calculate the distinct occurrences of some value using the specified `distinct_by` variables.
+**Tplyr** allows you to focus on these distinct counts and distinct percents within some grouping variable, like subject. Additionally, you can mix and match distinct counts with non-distinct counts in the same row. The `set_distinct_by()` function sets the `distinct_by` variables used to calculate the distinct occurrences of some value.
```{r}
t <- tplyr_table(adae, TRTA) %>%
@@ -101,6 +101,6 @@ tplyr_table(adae, TRTA) %>%
kable()
```
-By using `set_nest_count()`, this triggers 'Tplyr' to drop row_label1, and indent all of the AEDECOD values within row_label2. The columns are renamed appropriately as well. The default indentation used will be 3 spaces, but as you can see here - you can set the indentation however you like. This let's you use tab strings for different language-specific output types, stick with spaces, indent wider or smaller - whatever you wish. All of the existing order variables remain, so this has no impact on your ability to sort the table.
+Using `set_nest_count()` triggers **Tplyr** to drop row_label1 and indent all of the AEDECOD values within row_label2. The columns are renamed appropriately as well. The default indentation used will be 3 spaces, but as you can see here - you can set the indentation however you like. This lets you use tab strings for different language-specific output types, stick with spaces, indent wider or smaller - whatever you wish. All of the existing order variables remain, so this has no impact on your ability to sort the table.
There's a lot more to counting! So be sure to check out our vignettes on sorting, shift tables, and denominators.
diff --git a/vignettes/custom-metadata.Rmd b/vignettes/custom-metadata.Rmd
index 3b1830cd..7cebb1ce 100644
--- a/vignettes/custom-metadata.Rmd
+++ b/vignettes/custom-metadata.Rmd
@@ -79,17 +79,17 @@ full_data <- bind_rows(sum_data, model_portion) %>%
```
-As covered in `vignette('metadata')`, Tplyr can produce metadata for any result that it calculates. But what about data that Tplyr can't produce, such as a efficacy results or some sort of custom analysis? You may still want that drill down capability either on your own or paired with an existing Tplyr table.
+As covered in `vignette('metadata')`, **Tplyr** can produce metadata for any result that it calculates. But what about data that **Tplyr** can't produce, such as efficacy results or some sort of custom analysis? You may still want that drill down capability either on your own or paired with an existing **Tplyr** table.
-Take for instance Table 14-3.01 from the [CDISC Pilot](https://github.com/atorus-research/CDISC_pilot_replication). Skipping the actual construction of the table, here's the output data from Tplyr and some manual calculation:
+Take for instance Table 14-3.01 from the [CDISC Pilot](https://github.com/atorus-research/CDISC_pilot_replication). Skipping the actual construction of the table, here's the output data from **Tplyr** and some manual calculation:
```{r view data}
kable(full_data)
```
-This is the primary efficacy table from the trial. The top portion of this table is fairly straightforward with Tplyr and can be done using descriptive statistic layers. Once you hit the p-values on the lower house, this becomes beyond Tplyr's remit. To produce the table, you can combine Tplyr output with a separate data frame analyzed and formatted yourself (but note you can still use some help from Tplyr tools like `apply_formats()`).
+This is the primary efficacy table from the trial. The top portion of this table is fairly straightforward with **Tplyr** and can be done using descriptive statistic layers. Once you hit the p-values in the lower half, this goes beyond Tplyr's remit. To produce the table, you can combine **Tplyr** output with a separate data frame analyzed and formatted yourself (but note you can still use some help from **Tplyr** tools like `apply_formats()`).
-But what about the metadata? How do you get the drill down capabilities for that lower half of the table? We've provided a couple additional tools in Tplyr to allow you to construct your own metadata and append existing metadata present in a Tplyr table.
+But what about the metadata? How do you get the drill down capabilities for that lower half of the table? We've provided a couple additional tools in **Tplyr** to allow you to construct your own metadata and append existing metadata present in a **Tplyr** table.
## Build a `tplyr_meta` object
@@ -103,7 +103,7 @@ m <- tplyr_meta(
m
```
-The `tplyr_meta()` function can take these fields immediately upon creation. If you need to dynamically create a `tplyr_meta` object such as how Tplyr constructs the objects internally), the functions `add_variables()` and `add_filters()` are available to extend an existing `tplyr_meta` object:
+The `tplyr_meta()` function can take these fields immediately upon creation. If you need to dynamically create a `tplyr_meta` object (such as how **Tplyr** constructs the objects internally), the functions `add_variables()` and `add_filters()` are available to extend an existing `tplyr_meta` object:
```{r extending tplyr_meta}
m <- m %>%
@@ -159,11 +159,11 @@ When building a data frame for use with `tplyr_table` metadata, there are really
- You need a column in the data frame called `row_id`
- The `row_id` values cannot be duplicates of any other value within the existing metadata.
-The `row_id` values built by Tplyr will always follow the format "n_n", where the first letter of the layer type will either be "c", "d", or "s". The next number is the layer number (i.e. the order in which the layer was inserted to the Tplyr table), and then finally the row of that layer within the output. For example, the third row of a count layer that was the second layer in the table would have a `row_id` of "c2_3". In this example, I chose "x4_n" as the format for the "x" to symbolize custom, and these data can be thought of as the fourth layer. That said, these values would typically be masked by the viewer of the table so they really just need to be unique - so you can choose whatever you want.
+The `row_id` values built by **Tplyr** will always follow the format "n_n", where the first letter of the layer type will either be "c", "d", or "s". The next number is the layer number (i.e. the order in which the layer was inserted to the **Tplyr** table), and then finally the row of that layer within the output. For example, the third row of a count layer that was the second layer in the table would have a `row_id` of "c2_3". In this example, I chose "x4_n" as the format for the "x" to symbolize custom, and these data can be thought of as the fourth layer. That said, these values would typically be masked by the viewer of the table so they really just need to be unique - so you can choose whatever you want.
-## Appending Existing Tplyr Metadata
+## Appending Existing **Tplyr** Metadata
-Now that we've created our custom extension of the Tplyr metadata, let's extend the existing data frame. To do this, Tplyr has the function `append_metadata()`:
+Now that we've created our custom extension of the **Tplyr** metadata, let's extend the existing data frame. To do this, **Tplyr** has the function `append_metadata()`:
```{r extending metadata}
t <- append_metadata(t, eff_meta)
@@ -185,7 +185,7 @@ get_meta_subset(t, 'x4_1', "var1_Xanomeline High Dose") %>%
## Metadata Without Tplyr
-You very well may have a scenario where you want to use these metadata functions outside of Tplyr in general. As such, there are S3 methods available to query metadata from a dataframe instead of a Tplyr table, and parameters to provide your own target data frame:
+You very well may have a scenario where you want to use these metadata functions outside of **Tplyr** in general. As such, there are S3 methods available to query metadata from a dataframe instead of a **Tplyr** table, and parameters to provide your own target data frame:
```{r metadata without Tplyr}
get_meta_subset(eff_meta, 'x4_1', "var1_Xanomeline High Dose", target=adas) %>%
@@ -193,7 +193,7 @@ get_meta_subset(eff_meta, 'x4_1', "var1_Xanomeline High Dose", target=adas) %>%
kable()
```
-As with the Tplyr metadata, the only strict criteria here is that your custom metadata have a `row_id` column.
+As with the **Tplyr** metadata, the only strict criterion here is that your custom metadata has a `row_id` column.
## Tying it Together
diff --git a/vignettes/denom.Rmd b/vignettes/denom.Rmd
index dc6dfcf2..3caefcc2 100644
--- a/vignettes/denom.Rmd
+++ b/vignettes/denom.Rmd
@@ -35,7 +35,7 @@ Make sure you have a good understand of count and shift layers before you review
What do you do when your target dataset doesn't _have_ the information necessary to create your denominator? For example - when you create an adverse event table, the adverse event dataset likely only contains records for subjects who experienced an adverse event. But subjects who did _not_ have an adverse event are still part of the study population and must be considered in the denominator.
-For this reason, Tplyr allows lets you set a separate population dataset - but there are a couple things you need to do to trigger Tplyr to use the population data as your denominator.
+For this reason, **Tplyr** lets you set a separate population dataset - but there are a couple things you need to do to trigger **Tplyr** to use the population data as your denominator.
Consider these two examples.
@@ -141,7 +141,7 @@ tplyr_table(adlb, TRTA, where=PARAMCD == "CK") %>%
kable()
```
-In the example above, the denominators were based on the by and treatment variables, `TRTA`, `PARAM` and `VISIT`. This creates a 3 X 3 box, where the denominator is the total of all record within the **FROM** and **TO** shift variables, within each parameter, visit, and treatment. This is the default, and this is how 'Tplyr' will create the denominators if `set_denoms_by()` isn't specified.
+In the example above, the denominators were based on the by and treatment variables, `TRTA`, `PARAM` and `VISIT`. This creates a 3 X 3 box, where the denominator is the total of all records within the **FROM** and **TO** shift variables, within each parameter, visit, and treatment. This is the default, and this is how **Tplyr** will create the denominators if `set_denoms_by()` isn't specified.
In the next example, the percentage denominators are calculated row-wise, each row percentage sums to 100%.
@@ -179,7 +179,7 @@ Our hope is that this gives you the flexibility you need to structure your denom
There are some circumstances that you'll encounter where the filter used for a denominator needs to be different than the filter used to count. Disposition tables are an example of this, and we'll use that example to paint this picture.
-'Tplyr' offers you the ability to specifically control the filter used within the denominator. This is provided through the function `set_denom_where()`. The default for `set_denom_where()` is the layer level `where` parameter, if one was supplied. `set_denom_where()` allows you to replace this layer level filter with a custom filter of your choosing. This is done on top of any filtering specified in the `tplyr_table()` where parameter - which means that the `set_denom_where()` filter is applied _in addition to_ any table level filtering.
+**Tplyr** offers you the ability to specifically control the filter used within the denominator. This is provided through the function `set_denom_where()`. The default for `set_denom_where()` is the layer level `where` parameter, if one was supplied. `set_denom_where()` allows you to replace this layer level filter with a custom filter of your choosing. This is done on top of any filtering specified in the `tplyr_table()` where parameter - which means that the `set_denom_where()` filter is applied _in addition to_ any table level filtering.
Yeah we know - there are a lot of different places that filtering can happen...
@@ -238,16 +238,16 @@ We did one more other thing worth explaining in the example above - gave the mis
## Adding a 'Total' Row
-In addition to missing counts, some summaries require the addition of a 'Total' row. 'Tplyr' has the helper function `add_total_row()` to ease this process for you. Like most other things within 'Tplyr' - particularly in this vignette - this too has a significant bit of nuance to it.
+In addition to missing counts, some summaries require the addition of a 'Total' row. **Tplyr** has the helper function `add_total_row()` to ease this process for you. Like most other things within **Tplyr** - particularly in this vignette - this too has a significant bit of nuance to it.
Much of this functionality is similar to `set_missing_count()`. You're able to specify a different format for the total, but if not specified, the associated count layer's format will be used. You're able to set your own sort value to specify where you want the total row to sit.
More nuance comes in two places:
-- By default, `add_total_row()` *will count missing values*, but you can exclude those values using the `count_missings` parameter. 'Tplyr' will warn you when `set_count_missing()` has `denom_ignore` set to `TRUE`, `add_total_row()` has `count_missings` set to `TRUE` and the format contains a percentage. Why? Because if the denominator is ignoring missing values but you're still counting them in your total, the percentage shown can exceed 100%.
+- By default, `add_total_row()` *will count missing values*, but you can exclude those values using the `count_missings` parameter. **Tplyr** will warn you when `set_missing_count()` has `denom_ignore` set to `TRUE`, `add_total_row()` has `count_missings` set to `TRUE` and the format contains a percentage. Why? Because if the denominator is ignoring missing values but you're still counting them in your total, the percentage shown can exceed 100%.
- `add_total_row()` will throw a warning when a `by` variable is used, because it becomes ambiguous what total should be calculated. You can rectify this by using `set_denoms_by()`, which allows the user to control exactly which groups are used to form the denominator. This way the totals presented by `add_total_row()` will align with denominators specified in `set_denom_by()` and generate total rows that match the grouping of your denominator values.
-In the example below, we summarize age groups by sex. The denominators are determined by treatment group and sex, and since we are not excluding any values from the denominator, the total row ends up matching the denominator that was used. The 'Missing' row tells us the number of missing values, but because `count_missings` is set to `TRUE`, the missing counts are included in the total row. This probably isn't how you would choose to display things, but here we're trying to show the flexibility built into 'Tplyr'.
+In the example below, we summarize age groups by sex. The denominators are determined by treatment group and sex, and since we are not excluding any values from the denominator, the total row ends up matching the denominator that was used. The 'Missing' row tells us the number of missing values, but because `count_missings` is set to `TRUE`, the missing counts are included in the total row. This probably isn't how you would choose to display things, but here we're trying to show the flexibility built into **Tplyr**.
```{r}
adsl2 <- adsl
@@ -285,6 +285,6 @@ tplyr_table(adsl2, TRT01P) %>%
Now the table is more intuitive. We used `set_missing_count()` to update our denominators, so missing have been excluded. Now, the total row intuitively matches the denominators used within each group, and we can see how many missing records were excluded.
-_You may have stumbled upon this portion of the vignette while searching for how to create a total column. Tplyr allows you to do this as well with the function `add_total_group()` and read more in `vignette("table")`._
+_You may have stumbled upon this portion of the vignette while searching for how to create a total column. **Tplyr** allows you to do this as well with the function `add_total_group()`; read more in `vignette("table")`._
And that's it for denominators! Happy counting!
diff --git a/vignettes/desc.Rmd b/vignettes/desc.Rmd
index d45c7e5e..6a401a86 100644
--- a/vignettes/desc.Rmd
+++ b/vignettes/desc.Rmd
@@ -25,7 +25,7 @@ load("adlb.Rdata")
load("adsl.Rdata")
```
-Descriptive statistics in 'Tplyr' are created using `group_desc()` function when creating a layer. While `group_desc()` allows you to set your target, by variables, and filter criteria, a great deal of the control of the layer comes from `set_format_strings()` where the actual summaries are declared.
+Descriptive statistics in **Tplyr** are created using the `group_desc()` function when creating a layer. While `group_desc()` allows you to set your target, by variables, and filter criteria, a great deal of the control of the layer comes from `set_format_strings()` where the actual summaries are declared.
```{r intro}
tplyr_table(adsl, TRT01P) %>%
@@ -48,20 +48,20 @@ Let's walk through this call to `set_format_strings` to understand in detail wha
1) The quoted strings on the left of the '=' within `set_format_strings()` become the row label in the output. This allows you to define some custom text in `set_format_strings()` to explain the summary that is presented on the associated row. This text is fully in your control.
2) On the right side of each equals is a call to `f_str()`. As explained in the `vignette("Tplyr")`, this is an object that captures a lot of metadata to understand how the strings should be presented.
-3) Within the `f_str()` call, you see x's in quotes. This defines how you'd like the numbers formatted from the resulting summaries. The number of x's you use on the left side of a decimal control the space allotted for an integer, and the right side controls the decimal precision. Decimals are rounded prior to string formatting - so no need to worry about that. Note that this forcefully sets the decimal and integer precision - 'Tplyr' can automatically determine this for you as well, but more on that later.
+3) Within the `f_str()` call, you see x's in quotes. This defines how you'd like the numbers formatted from the resulting summaries. The number of x's you use on the left side of a decimal controls the space allotted for an integer, and the right side controls the decimal precision. Decimals are rounded prior to string formatting - so no need to worry about that. Note that this forcefully sets the decimal and integer precision - **Tplyr** can automatically determine this for you as well, but more on that later.
4) After the x's there are unquoted variable names. This is where you specify the actual summaries that will be performed. Notice that some `f_str()` calls have two summaries specified. This allows you to put two summaries in the same string and present them on the same line.
-But where do these summary names come from? And which ones does 'Tplyr' have?
+But where do these summary names come from? And which ones does **Tplyr** have?
## Built-in Summaries
-We've built a number of default summaries into 'Tplyr', which allows you to perform these summaries without having to specify the functions to calculate them yourself. The summaries built in to 'Tplyr' are listed below. In the second column are the names that you would use within an `f_str()` call to use them. In the third column, we have the syntax used to make the function call.
+We've built a number of default summaries into **Tplyr**, which allows you to perform these summaries without having to specify the functions to calculate them yourself. The summaries built in to **Tplyr** are listed below. In the second column are the names that you would use within an `f_str()` call to use them. In the third column, we have the syntax used to make the function call.
```{r varnames, echo=FALSE}
x <- data.frame(
Statistic = c('N', 'Mean', "Standard Deviation", "Median", "Variance", "Minimum",
"Maximum", "Interquartile Range", "Q1", "Q3", "Missing"),
- `Variable Names` = c("n", "mean", "sd", "median", "variance", "min", "max", "iqr", "q1", "q3", "missing"),
+ `Variable Names` = c("n", "mean", "sd", "median", "var", "min", "max", "iqr", "q1", "q3", "missing"),
`Function Call` = c("n()", "mean(.var, na.rm=TRUE)", "sd(.var, na.rm=TRUE)", "median(.var, na.rm=TRUE)",
"var(.var, na.rm=TRUE)", "min(.var, na.rm=TRUE)", "max(.var, na.rm=TRUE)",
"IQR(.var, na.rm=TRUE, type=getOption('tplyr.quantile_type')",
@@ -121,18 +121,18 @@ options(tplyr.quantile_type=7)
It's up to you to determine which algorithm you should use - but we found it necessary to provide you with the flexibility to change this within the default summaries.
-But what if 'Tplyr' doesn't offer you the summaries that you need?
+But what if **Tplyr** doesn't offer you the summaries that you need?
## Custom Summaries
-We understand that our defaults may not cover every descriptive statistic that you'd like to see. That's why we've opened to door to creating custom summaries. Custom summaries allow you to provide any function you'd like into a desc layer. You can focus on the derivation and how to calculate the number you want to see. 'Tplyr' can consume this function, and use all the existing tools within 'Tplyr' to produce the string formatted result alongside any of the default summaries we provide as well.
+We understand that our defaults may not cover every descriptive statistic that you'd like to see. That's why we've opened the door to creating custom summaries. Custom summaries allow you to provide any function you'd like into a desc layer. You can focus on the derivation and how to calculate the number you want to see. **Tplyr** can consume this function, and use all the existing tools within **Tplyr** to produce the string formatted result alongside any of the default summaries we provide as well.
Custom summaries may be provided in two ways:
- Through the `tplyr.custom_summaries` option set at your session level
- Through the function `set_custom_summaries()` at the layer level
-As with any other setting in 'Tplyr', the layer setting will always take precedence over any other setting.
+As with any other setting in **Tplyr**, the layer setting will always take precedence over any other setting.
Let's look at an example.
@@ -158,9 +158,9 @@ Here, a few important things are demonstrated:
- The parameter names in `set_custom_summaries()`, or names on the left side of the equals, flow into `set_format_strings()` in the `f_str()` calls. Just like the default summaries, `geometric_mean` becomes the name that you refer to in order to use the geometric mean derivation in a summary.
- In geometric mean, the target variable that you're summarizing is referred to as `.var`. This may not seem intuitive. The reason we have to use `.var` is so that, like in this example, the custom function can be applied to each of the separate target variables.
-Another note about custom summaries is that you're able to overwrite the default summaries built into 'Tplyr' as well. Don't like the default summary functions that we provide? Use the `tplyr.custom_summaries` option to overwrite them in your session, and add any new ones that you would like to include.
+Another note about custom summaries is that you're able to overwrite the default summaries built into **Tplyr** as well. Don't like the default summary functions that we provide? Use the `tplyr.custom_summaries` option to overwrite them in your session, and add any new ones that you would like to include.
-For example, here we use the 'Tplyr' default mean.
+For example, here we use the **Tplyr** default mean.
```{r custom_options}
tplyr_table(adsl, TRT01P) %>%
@@ -190,15 +190,15 @@ tplyr_table(adsl, TRT01P) %>%
kable()
```
-Note that the table code used to produce the output is the same. Now 'Tplyr' used the custom summary function for `mean` as specified in the `tplyr.custom_summaries` option. Also note the use of `rlang::quos()`. We've done our best to mask this from the user everywhere possible and make the interfaces clean and intuitive, but a great deal of 'Tplyr' is built using 'rlang' and non-standard evaluation. Within this option is one of the very few instances where a user needs to concern themselves with the use of quosures. If you'd like to learn more about non-standard evaluation and quosures, we recommend [Section IV](https://adv-r.hadley.nz/metaprogramming.html) in Advanced R.
+Note that the table code used to produce the output is the same. Now **Tplyr** used the custom summary function for `mean` as specified in the `tplyr.custom_summaries` option. Also note the use of `rlang::quos()`. We've done our best to mask this from the user everywhere possible and make the interfaces clean and intuitive, but a great deal of **Tplyr** is built using 'rlang' and non-standard evaluation. Within this option is one of the very few instances where a user needs to concern themselves with the use of quosures. If you'd like to learn more about non-standard evaluation and quosures, we recommend [Section IV](https://adv-r.hadley.nz/metaprogramming.html) in Advanced R.
## Formatting
-A lot of the nuance to formatting descriptive statistics layers has already been covered above, but there are a couple more tricks to getting the most out of 'Tplyr'. One of these tricks is filling empty values.
+A lot of the nuance to formatting descriptive statistics layers has already been covered above, but there are a couple more tricks to getting the most out of **Tplyr**. One of these tricks is filling empty values.
By default, if there is no available value for a summary in a particular observation, the result being presented will be blanked out.
-_Note: Tplyr generally respects factor levels - so in instances of a missing row or column group, if the factor level is present, then the variable or row will still generate)_
+_Note: **Tplyr** generally respects factor levels - so in instances of a missing row or column group, if the factor level is present, then the variable or row will still generate._
```{r missing}
adsl$TRT01P <- as.factor(adsl$TRT01P)
@@ -220,7 +220,7 @@ tplyr_table(adlb_2, TRTA) %>%
kable()
```
-Note how the entire example above has all records in `var1_Placebo` missing. 'Tplyr' gives you control over how you fill this space. Let's say that we wanted instead to make that space say "Missing". You can control this with the `f_str()` object using the `empty` parameter.
+Note how the entire example above has all records in `var1_Placebo` missing. **Tplyr** gives you control over how you fill this space. Let's say that we wanted instead to make that space say "Missing". You can control this with the `f_str()` object using the `empty` parameter.
```{r missing1}
tplyr_table(adlb_2, TRTA) %>%
@@ -258,7 +258,7 @@ In the example above, instead of filling the whole space, the `empty` text of "N
You may have noticed that the approach to formatting covered so far leaves a lot to be desired. Consider analyzing lab results, where you may want precision to vary based on the collected precision of the tests. Furthermore, depending on the summary being presented, you may wish to increase the precision further. For example, you may want the mean to be at collected precision +1 decimal place, and for standard deviation +2.
-'Tplyr' has this covered using auto-precision. Auto-precision allows you to format your numeric summaries based on the precision of the data collected. This has all been built into the format strings, because a natural place to specify your desired format is where you specify how you want your data presented. If you wish to use auto-precision, use `a` instead of `x` when creating your summaries. Note that only one `a` is needed on each side of a decimal. To use increased precision, use `a+n` where `n` is the number of additional spaces you wish to add.
+**Tplyr** has this covered using auto-precision. Auto-precision allows you to format your numeric summaries based on the precision of the data collected. This has all been built into the format strings, because a natural place to specify your desired format is where you specify how you want your data presented. If you wish to use auto-precision, use `a` instead of `x` when creating your summaries. Note that only one `a` is needed on each side of a decimal. To use increased precision, use `a+n` where `n` is the number of additional spaces you wish to add.
```{r autoprecision1}
tplyr_table(adlb, TRTA) %>%
@@ -274,7 +274,7 @@ tplyr_table(adlb, TRTA) %>%
kable()
```
-As you can see, the decimal precision is now varying depending on the test being performed. Notice that both the integer and the decimal side of each number fluctuate as well. `Tpylr` collects both the integer and decimal precision, and you can specify both separately. For example, you could use `x`'s to specify a default number of spaces for your integers that are used consistently across by variables, but vary the decimal precision based on collected data. You can also increment the number of spaces for both integer and decimal separately.
+As you can see, the decimal precision is now varying depending on the test being performed. Notice that both the integer and the decimal side of each number fluctuate as well. **Tplyr** collects both the integer and decimal precision, and you can specify both separately. For example, you could use `x`'s to specify a default number of spaces for your integers that are used consistently across by variables, but vary the decimal precision based on collected data. You can also increment the number of spaces for both integer and decimal separately.
But - this is kind of ugly, isn't it? Do we really need all 6 decimal places collected for CA? For this reason, you're able to set a cap on the precision that's displayed:
@@ -318,9 +318,9 @@ Three variables are being summarized here - AVAL, CHG, and BASE. So which should
### External Precision
-Lastly, while dynamic precision might be what you're looking for, you may not want precision driven by the data. Perhaps there's a company standard that dictates what decimal precision should be used for each separate lab test. Maybe even deeper down to the lab test and category. New in Tplyr 1.0.0 we've added the ability to take decimal precision from an external source.
+Lastly, while dynamic precision might be what you're looking for, you may not want precision driven by the data. Perhaps there's a company standard that dictates what decimal precision should be used for each separate lab test. Maybe even deeper down to the lab test and category. New in **Tplyr** 1.0.0 we've added the ability to take decimal precision from an external source.
-The principal of external precision is exactly the same as auto-precision. The only difference is that you - the user - provide the precision table that Tplyr was automatically calculating in the background. This is done using the new function `set_precision_data()`. In the output below, Notice how the precision by PARAMCD varies depending on what was specified in the data frame `prec_data`.
+The principle of external precision is exactly the same as auto-precision. The only difference is that you - the user - provide the precision table that **Tplyr** was automatically calculating in the background. This is done using the new function `set_precision_data()`. In the output below, notice how the precision by PARAMCD varies depending on what was specified in the data frame `prec_data`.
```{r external-precision}
@@ -350,7 +350,7 @@ tplyr_table(adlb, TRTA) %>%
```
-If one of your by variable groups are missing in the precision data, Tplyr can default back to using auto-precision by using the option `default=auto`.
+If one of your by variable groups is missing in the precision data, **Tplyr** can default back to using auto-precision by using the option `default=auto`.
```{r external-precision2}
prec_data <- tibble::tribble(
diff --git a/vignettes/layer_templates.Rmd b/vignettes/layer_templates.Rmd
index 15952a96..3c1bd5f9 100644
--- a/vignettes/layer_templates.Rmd
+++ b/vignettes/layer_templates.Rmd
@@ -20,7 +20,7 @@ library(knitr)
load('adsl.Rdata')
```
-There are several scenarios where a layer template may be useful. Some tables, like demographics tables, may have many layers that will all essentially look the same. Categorical variables will have the same count layer settings, and continuous variables will have the same desc layer settings. A template allows a user to build those settings once per layer, then reference the template when the Tplyr table is actually built. Another scenario might be building a set of company layer templates that are built for standard tables to reduce the footprint of code across analyses. In either of these cases, the idea is the reduce the amount of redundant code necessary to create a table.
+There are several scenarios where a layer template may be useful. Some tables, like demographics tables, may have many layers that will all essentially look the same. Categorical variables will have the same count layer settings, and continuous variables will have the same desc layer settings. A template allows a user to build those settings once per layer, then reference the template when the **Tplyr** table is actually built. Another scenario might be building a set of company layer templates that are built for standard tables to reduce the footprint of code across analyses. In either of these cases, the idea is to reduce the amount of redundant code necessary to create a table.
Tplyr has already has a couple of mechanisms to reduce redundant application of formats. For example, `vignettes('tplyr_options')` shows how the options `tplyr.count_layer_default_formats`, `tplyr.desc_layer_default_formats`, and `tplyr.shift_layer_default_formats` can be used to create default format string settings. Additionally, you can set formats table wide using `set_count_layer_formats()`, `set_desc_layer_formats()`, or `set_shift_layer_formats()`. But what these functions and options _don't_ allow you to do is pre-set and reuse the settings for an entire layer, so all of the additional potential layer modifying functions are ignored. This is where layer templates come in.
@@ -38,7 +38,7 @@ new_layer_template(
)
```
-In this example, we've created a basic layer template. The template is named "example_template", and this is the name we'll use to reference the template when we want to use it. When the template is created, we start with the function `group_count(...)`. Note the use of the ellipsis (i.e. `...`). This is a required part of a layer template. Templates must start with a Tplyr layer constructor, which is one of the function `group_count()`, `group_desc()`, or `group_shift()`. The ellipsis is necessary because when the template is used, we are able to pass arguments directly into the layer constructor. For example:
+In this example, we've created a basic layer template. The template is named "example_template", and this is the name we'll use to reference the template when we want to use it. When the template is created, we start with the function `group_count(...)`. Note the use of the ellipsis (i.e. `...`). This is a required part of a layer template. Templates must start with a **Tplyr** layer constructor, which is one of the functions `group_count()`, `group_desc()`, or `group_shift()`. The ellipsis is necessary because when the template is used, we are able to pass arguments directly into the layer constructor. For example:
```{r using a template}
tplyr_table(adsl, TRT01P) %>%
@@ -51,7 +51,7 @@ tplyr_table(adsl, TRT01P) %>%
Within `use_template()`, the first parameter is the template name. After that, we supply arguments as we normally would into `group_count()`, `group_desc()`, or `group_shift()`. Additionally, note that our formats have been applied just as they would be if we used `set_format_strings()` as specified in the template. Our template was applied, the table built with all of the settings appropriately.
-An additional feature of layer templates is that they act just as any other function would in a Tplyr layer. This means that they're also extensible and can be expanded on directly within a Tplyr table. For example:
+An additional feature of layer templates is that they act just as any other function would in a **Tplyr** layer. This means that they're also extensible and can be expanded on directly within a **Tplyr** table. For example:
```{r extending a template}
tplyr_table(adsl, TRT01P) %>%
@@ -63,7 +63,7 @@ tplyr_table(adsl, TRT01P) %>%
kable()
```
-Here we show two things - first, that the we called the template without the by variable argument from the previous example. This allows a template to have some flexibility depending on the context of its usage. Furthermore, we added the additional modifier function `add_total_row()`. In this example, we took the layer as constructed by the template and then modified that layer further. This may be useful if most but not all of a layer is reusable. The reusable portions can be put in a template, and the rest added using normal Tplyr syntax.
+Here we show two things. First, we called the template without the `by` variable argument from the previous example. This allows a template to have some flexibility depending on the context of its usage. Second, we added the additional modifier function `add_total_row()`. In this example, we took the layer as constructed by the template and then modified that layer further. This may be useful if most but not all of a layer is reusable. The reusable portions can be put in a template, and the rest added using normal **Tplyr** syntax.
## Templates With Parameters
@@ -78,7 +78,7 @@ new_layer_template("example_params",
)
```
-In this example, we create a template similar to the first example. But now we add two more modifying functions, `set_order_count_method()` and `set_ordering_cols()`. Within these functions, we've supplied interchangeable parameters to the template function, which are `sort_meth` and `sort_col`. In a Tplyr layer template, these parameters are supplied using curly brackets (i.e. {}).
+In this example, we create a template similar to the first example. But now we add two more modifying functions, `set_order_count_method()` and `set_ordering_cols()`. Within these functions, we've supplied interchangeable parameters to the template function, which are `sort_meth` and `sort_col`. In a **Tplyr** layer template, these parameters are supplied using curly brackets (i.e. {}).
To specify these arguments when using the templater, we use the `use_template()` argument `add_params`. For example:
diff --git a/vignettes/metadata.Rmd b/vignettes/metadata.Rmd
index c7325882..9edf5cf4 100644
--- a/vignettes/metadata.Rmd
+++ b/vignettes/metadata.Rmd
@@ -25,9 +25,9 @@ library(knitr)
load("adsl.Rdata")
```
-Tplyr has a bit of a unique design, which might feel a bit weird as you get used to the package. The process flow of building a `tplyr_table()` object first, and then using `build()` to construct the data frame is different than programming in the tidyverse, or creating a ggplot. Why create the `tplyr_table()` object first? Why is the `tplyr_table()` object different than the resulting data frame?
+**Tplyr** has a bit of a unique design, which might feel a bit weird as you get used to the package. The process flow of building a `tplyr_table()` object first, and then using `build()` to construct the data frame is different than programming in the tidyverse, or creating a ggplot. Why create the `tplyr_table()` object first? Why is the `tplyr_table()` object different than the resulting data frame?
-The purpose of the `tplyr_table()` object is to let Tplyr do more than just summarize data. As you build the table, all of the metadata around the table being built is maintained - the target variables being summarized, the grouped variables by row and column, the filter conditions necessary applied to the table and each layer. As a user, you provide this information to create the summary. But what about after the results are produced? Summarizing data inevitably leads to new questions. Within clinical summaries, you may want to know which subjects experienced an adverse event, or why the lab summaries of a particular visit's descriptive statistics are abnormal. Normally, you'd write a query to recreate the data that lead to that particular summary. Tplyr now allows you to immediately extract the input data or metadata that created an output result, thus providing traceability from the result back to the source.
+The purpose of the `tplyr_table()` object is to let **Tplyr** do more than just summarize data. As you build the table, all of the metadata around the table being built is maintained - the target variables being summarized, the grouped variables by row and column, and the filter conditions applied to the table and each layer. As a user, you provide this information to create the summary. But what about after the results are produced? Summarizing data inevitably leads to new questions. Within clinical summaries, you may want to know which subjects experienced an adverse event, or why the lab summaries of a particular visit's descriptive statistics are abnormal. Normally, you'd write a query to recreate the data that led to that particular summary. **Tplyr** now allows you to immediately extract the input data or metadata that created an output result, thus providing traceability from the result back to the source.
## Generating the Metadata
@@ -47,7 +47,7 @@ dat <- t %>% build(metadata=TRUE)
kable(dat)
```
-To trigger the creation of metadata, the `build()` function has a new argument `metadata`. By specifying `TRUE`, the underlying metadata within Tplyr are prepared in an extractable format. This is the only action a user needs to specify for this action to take place.
+To trigger the creation of metadata, the `build()` function has a new argument `metadata`. By specifying `TRUE`, the underlying metadata within **Tplyr** are prepared in an extractable format. This is the only step a user needs to take for the metadata to be generated.
When the `metadata` argument is used, a new column will be produced in the output dataframe called `row_id`. The `row_id` variable provides a persistent reference to a row of interest, even if the output dataframe is sorted. If you review `vignette("styled-table")`, note that we expect a certain amount of post processing and styling of the built data frame from Tplyr, to let you use whatever other packages you prefer. As such, this reference ID is necessary.
@@ -60,7 +60,7 @@ get_meta_subset(t, 'c2_1', 'var1_Placebo') %>%
kable()
```
-By using the `row_id` and column, the dataframe is pulled right out for us. Notice that `USUBJID` was included by default, even though Tplyr there's no reference anywhere in the `tplyr_table()` to the variable `USUBJID`. This is because `get_meta_subset()` has an additional argument `add_cols` that allows you to specify additional columns you want included in the resulting dataframe, and has a default of USUBJID. So let's say we want additionally include the variable `SEX`.
+By using the `row_id` and column, the dataframe is pulled right out for us. Notice that `USUBJID` was included by default, even though there's no reference anywhere in the `tplyr_table()` to the variable `USUBJID`. This is because `get_meta_subset()` has an additional argument `add_cols` that allows you to specify additional columns you want included in the resulting dataframe, and has a default of `USUBJID`. So let's say we want to additionally include the variable `SEX`.
```{r add_vars}
get_meta_subset(t, 'c2_1', 'var1_Placebo', add_cols = vars(USUBJID, SEX)) %>%
@@ -89,7 +89,7 @@ To extract the dataframe in `get_meta_subset()`, the metadata of the result cell
get_meta_result(t, 'd1_2', 'var1_Xanomeline High Dose')
```
-The resulting output is a new object Tplyr called `tplyr_meta()`. This is a container of a relevent metadata for a specific result. The object itself is a list with two elements: `names` and `filters`.
+The resulting output is a new **Tplyr** object called `tplyr_meta()`. This is a container of the relevant metadata for a specific result. The object itself is a list with two elements: `names` and `filters`.
The `names` element contains quosures for each variable relevant to a specific result. This will include the target variable, the `by` variables used on the layer, the `cols` variables used on the table, and all variables included in any filter condition relevant to create the result.
diff --git a/vignettes/options.Rmd b/vignettes/options.Rmd
index 0d26b57e..529c5cd5 100644
--- a/vignettes/options.Rmd
+++ b/vignettes/options.Rmd
@@ -27,19 +27,19 @@ load("adlb.Rdata")
op <- options()
```
-One thing that we wanted to build into 'Tplyr' to make it a user friendly package is the ability to eliminate redundant code where possible. This is why there are several options available in 'Tplyr' to allow you to control things at your session level instead of each individual table, or even each individual layer.
+One thing that we wanted to build into **Tplyr** to make it a user-friendly package is the ability to eliminate redundant code where possible. This is why there are several options available in **Tplyr** to allow you to control things at your session level instead of for each individual table, or even each individual layer.
-The following are the options available in 'Tplyr' and their descriptions:
+The following are the options available in **Tplyr** and their descriptions:
```{r options_table, echo=FALSE}
suppressMessages(read_csv('tplyr_options.csv')) %>%
kable(align="cl", col.names = c('Option', 'Description'))
```
-Each of these options allows you to set these settings in one place, and every 'Tplyr' table you create will inherit your option settings as defaults. This allows your table code to be more concise, and centralizes where you need to make an update when your code has to be adjusted. This vignette is dedicated to helping you understand how to leverage each of these options properly.
+Each of these options allows you to set these settings in one place, and every **Tplyr** table you create will inherit your option settings as defaults. This allows your table code to be more concise, and centralizes where you need to make an update when your code has to be adjusted. This vignette is dedicated to helping you understand how to leverage each of these options properly.
## Default Layer Formats
-Declaring string formats and summaries that need to be performed is one of the more verbose parts of 'Tplyr'. Furthermore, this is something that will often be fairly consistent within a study, as you'll likely want to look across a consistent set of descriptive statistics, or your count/shift tables will likely require the same sort of "n (%)" formatting.
+Declaring string formats and summaries that need to be performed is one of the more verbose parts of **Tplyr**. Furthermore, this is something that will often be fairly consistent within a study, as you'll likely want to look across a consistent set of descriptive statistics, or your count/shift tables will likely require the same sort of "n (%)" formatting.
Using the format options is very similar to setting a string format. The only difference is that you need to enter the string formats as a named list instead of as separate parameters to a function call.
@@ -64,7 +64,7 @@ options(
```
-Here you can see that 'Tplyr' picks up these option changes. In the table below, we didn't use `set_format_strings()` anywhere - instead we let 'Tplyr' pick up the default formats from the options.
+Here you can see that **Tplyr** picks up these option changes. In the table below, we didn't use `set_format_strings()` anywhere - instead we let **Tplyr** pick up the default formats from the options.
```{r default_formats2}
tplyr_table(adsl, TRT01P) %>%
@@ -81,11 +81,11 @@ tplyr_table(adsl, TRT01P) %>%
One important thing to understand about how these options work in particular is the scoping.
-- 'Tplyr' options have the broadest scope, as they work across the entire session for any table.
-- Setting formats at the `tplyr_table()` level will override the 'Tplyr' options and extends to any layer of the specified type in the current table.
-- Setting formats at the `tplyr_layer()` level will always be prioritized over 'Tplyr' options and any `tplyr_table()` formats set. This has the narrowest scope and will always be used when specified.
+- **Tplyr** options have the broadest scope, as they work across the entire session for any table.
+- Setting formats at the `tplyr_table()` level will override the **Tplyr** options and extends to any layer of the specified type in the current table.
+- Setting formats at the `tplyr_layer()` level will always be prioritized over **Tplyr** options and any `tplyr_table()` formats set. This has the narrowest scope and will always be used when specified.
-To demonstrate, consider the following. The 'Tplyr' options remain set from the block above.
+To demonstrate, consider the following. The **Tplyr** options remain set from the block above.
```{r scoping1}
tplyr_table(adsl, TRT01P) %>%
@@ -111,11 +111,11 @@ In the above output:
- The first count layer for "Categorical Age Groups" uses the specified table default using `set_count_layer_formats()`
- The second count layer for "Ethnicity" uses the layer level format specified using `set_format_strings()`
-Each of the outputs ignores the 'Tpylr' option defaults.
+Each of the outputs ignores the **Tplyr** option defaults.
## Precision Cap
-'Tplyr' defaults to avoiding capping precision. We do this because capping precision should be a conscious decision, where you as the user specifically set the limit of how many decimal places are relevant to a specific result. One way to cap precision is by using the `cap` parameter within `set_format_strings()`. But perhaps you have a specific limit to what you'd like to see on any output. Here, we offer the `tplyr.precision_cap` option to set whatever cap you wish.
+**Tplyr** defaults to avoiding capping precision. We do this because capping precision should be a conscious decision, where you as the user specifically set the limit of how many decimal places are relevant to a specific result. One way to cap precision is by using the `cap` parameter within `set_format_strings()`. But perhaps you have a specific limit to what you'd like to see on any output. Here, we offer the `tplyr.precision_cap` option to set whatever cap you wish.
```{r precision_cap1}
options(tplyr.precision_cap = c('int'=2, 'dec'=2))
@@ -148,11 +148,11 @@ Both layers in the above example are summarizing the same data. The top layer is
- Integers will not be truncated - rather, the auto precision will default how much padding should be applied. If the number exceeds that number of spaces, then the length will be extended (i.e. if 2 integer places are allotted, 100 will still consume 3 places).
- The cap applies to the spaces allotted by the 'a'. If the cap is 2, 'a+1' will not exceed 3 spaces.
-The bottom layer overrides the 'Tplyr' option. Instead, integers are capped at 1 space, and decimals are capped at 0.
+The bottom layer overrides the **Tplyr** option. Instead, integers are capped at 1 space, and decimals are capped at 0.
## Custom Summaries
-Custom summaries allow you to extend the capabilities of descriptive statistics layers in 'Tplyr'. Maybe our defaults don't work how you'd like them to, or maybe you have some custom functions within your organization that you commonly would like to use. Specifying the custom summaries you wish to use in every table would prove quite tedious - therefore, the `tplyr.custom_summaries` option is a better choice.
+Custom summaries allow you to extend the capabilities of descriptive statistics layers in **Tplyr**. Maybe our defaults don't work how you'd like them to, or maybe you have some custom functions within your organization that you commonly would like to use. Specifying the custom summaries you wish to use in every table would prove quite tedious - therefore, the `tplyr.custom_summaries` option is a better choice.
```{r custom_summaries1}
options(tplyr.custom_summaries = rlang::quos(
@@ -161,9 +161,9 @@ options(tplyr.custom_summaries = rlang::quos(
```
-Note that the table code used to produce the output is the same. Now 'Tplyr' used the custom summary function for `geometric_mean` as specified in the `tplyr.custom_summaries` option. Also note the use of `rlang::quos()`. We've done our best to mask this from the user everywhere possible and make the interfaces clean and intuitive, but a great deal of 'Tplyr' is built using 'rlang' and non-standard evaluation. Within this option is one of the very few instances where a user needs to concern themselves with the use of quosures. If you'd like to learn more about non-standard evaluation and quosures, we recommend [Section IV](https://adv-r.hadley.nz/metaprogramming.html) in Advanced R.
+Note that the table code used to produce the output is the same. Now **Tplyr** used the custom summary function for `geometric_mean` as specified in the `tplyr.custom_summaries` option. Also note the use of `rlang::quos()`. We've done our best to mask this from the user everywhere possible and make the interfaces clean and intuitive, but a great deal of **Tplyr** is built using 'rlang' and non-standard evaluation. Within this option is one of the very few instances where a user needs to concern themselves with the use of quosures. If you'd like to learn more about non-standard evaluation and quosures, we recommend [Section IV](https://adv-r.hadley.nz/metaprogramming.html) in Advanced R.
-Now that geometric mean is set within the 'Tplyr' options, you can use it within your descriptive statistics layers, just like it was one of the built-in summaries.
+Now that geometric mean is set within the **Tplyr** options, you can use it within your descriptive statistics layers, just like it was one of the built-in summaries.
```{r custom_summaries2}
tplyr_table(adsl, TRT01P) %>%
@@ -195,7 +195,7 @@ options(scipen = -1) # Only require 3 decimal places
.001
```
-In 'Tplyr', we have the option `tplyr.scipen`. This is the `scipen` setting that will be used _only_ while the 'Tplyr' table is being built. This allows you to use a different `scipen` setting within 'Tplyr' than your R session. The default value we use in 'Tplyr' is 9999, which is intended to totally prevent numbers from switching to scientific notation. We want this to be a conscious decision that you make in order to prevent any unexpected outputs.
+In **Tplyr**, we have the option `tplyr.scipen`. This is the `scipen` setting that will be used _only_ while the **Tplyr** table is being built. This allows you to use a different `scipen` setting within **Tplyr** than your R session. The default value we use in **Tplyr** is 9999, which is intended to totally prevent numbers from switching to scientific notation. We want this to be a conscious decision that you make in order to prevent any unexpected outputs.
```{r reset_settings1, include=FALSE}
options(op)
@@ -218,7 +218,7 @@ suppressWarnings(build(t)) %>% # Chi-squared warnings occur with small samples
options(op)
```
-Note that the risk-difference variables above have mostly shifted to scientific notation. This is because the limit has been shifted to .1 within the 'Tplyr' build.
+Note that the risk-difference variables above have mostly shifted to scientific notation. This is because the limit has been shifted to .1 within the **Tplyr** build.
## Quantile Algorithms
@@ -263,7 +263,7 @@ tplyr_table(adsl, TRT01P) %>%
## IBM Rounding
-In certain cases users may want to match tables produced by other languages that IBM rounding. Tplyr offers the option 'tplyr.IBMRounding' to change the default rounding behavior of Tplyr tables. Review var1_4 in the tables below.
+In certain cases users may want to match tables produced by other languages that use IBM rounding. **Tplyr** offers the option `tplyr.IBMRounding` to change the default rounding behavior of **Tplyr** tables. Review var1_4 in the tables below.
Using the default R behavior
```{r rounding_1}
diff --git a/vignettes/riskdiff.Rmd b/vignettes/riskdiff.Rmd
index cf316129..296338da 100644
--- a/vignettes/riskdiff.Rmd
+++ b/vignettes/riskdiff.Rmd
@@ -25,9 +25,9 @@ load("adae.Rdata")
load("adsl.Rdata")
```
-'Tplyr' does not support, nor do we intend to support, a wide array of statistical methods. Our goal is rather to take your focus as an analyst off the mundane summaries so you can focus on the interesting analysis. That said, there are some things that are common enough that we feel that it's reasonable for us to include. So let's take a look at risk difference.
+**Tplyr** does not support, nor do we intend to support, a wide array of statistical methods. Our goal is rather to take your focus as an analyst off the mundane summaries so you can focus on the interesting analysis. That said, there are some things that are common enough that we feel that it's reasonable for us to include. So let's take a look at risk difference.
-## 'Tplyr' Implementation
+## **Tplyr** Implementation
Our current implementation of risk difference is solely built on top of the base R function `stats::prop.test()`. For any and all questions about this method, please review the `stats::prop.test()` documentation within R.
@@ -197,7 +197,7 @@ suppressWarnings(build(t)) %>%
## Getting Raw Numbers
-Just like you can get the numeric data from a 'Tplyr' layer with `get_numeric_data()`, we've also opened up the door to extract the raw numeric data from risk difference calculations as well. This is done using the function `get_stats_data()`. The function interface is almost identical to `get_numeric_data()`, except for the extra parameter of `statistic`. Although risk difference is the only statistic implemented in 'Tplyr' at the moment (outside of descriptive statistics), we understand that there are multiple methods to calculate risk difference, so we've built risk difference in a way that it could be expanded to easily add new methods in the future. And therefore, `get_stats_data()` the `statistic` parameter to allow you to differentiate in the situation where there are multiple statistical methods applied to the layer.
+Just like you can get the numeric data from a **Tplyr** layer with `get_numeric_data()`, we've also opened up the door to extract the raw numeric data from risk difference calculations as well. This is done using the function `get_stats_data()`. The function interface is almost identical to `get_numeric_data()`, except for the extra parameter of `statistic`. Although risk difference is the only statistic implemented in **Tplyr** at the moment (outside of descriptive statistics), we understand that there are multiple methods to calculate risk difference, so we've built risk difference in a way that it could be expanded to easily add new methods in the future. Therefore, `get_stats_data()` has the `statistic` parameter to allow you to differentiate in the situation where there are multiple statistical methods applied to the layer.
The output of `get_stats_data()` depends on what parameters have been used:
diff --git a/vignettes/shift.Rmd b/vignettes/shift.Rmd
index 8905c60e..05094358 100644
--- a/vignettes/shift.Rmd
+++ b/vignettes/shift.Rmd
@@ -50,7 +50,7 @@ tplyr_table(adlb, TRTA, where=PARAMCD == "CK") %>%
kable()
```
-First, let's look at the differences in the shift API. Shift layers *must* take a row and a column variable, as the layer is designed to create a box for you that explains the changes in state. The row variable will typically be your "from" variable, and the column variable will typically be your "to" variable. Behind the scenes, Tplyr breaks this down for you to properly count and present the data.
+First, let's look at the differences in the shift API. Shift layers *must* take a row and a column variable, as the layer is designed to create a box for you that explains the changes in state. The row variable will typically be your "from" variable, and the column variable will typically be your "to" variable. Behind the scenes, **Tplyr** breaks this down for you to properly count and present the data.
For the most part, the last example gets us where we want to go - but there's still some that's left to be desired. It doesn’t look like there are any 'L' values for BNRIND in the dataset so we are not getting and rows containing 'L'. Let’s see if we can fix that by dummying in the possible values.
diff --git a/vignettes/sort.Rmd b/vignettes/sort.Rmd
index 5a4dcc25..57d3a38e 100644
--- a/vignettes/sort.Rmd
+++ b/vignettes/sort.Rmd
@@ -25,7 +25,7 @@ load("adae.Rdata")
load("adlb.Rdata")
```
-At surface level - sorting a table may seem easy, and in many cases it is. But in a handful of cases it can get quite tricky, with some odd situations that need to be handled carefully. For this reason, we found it necessary to dedicate an entire vignette to just sorting and handling columns output by 'Tplyr'.
+At surface level - sorting a table may seem easy, and in many cases it is. But in a handful of cases it can get quite tricky, with some odd situations that need to be handled carefully. For this reason, we found it necessary to dedicate an entire vignette to just sorting and handling columns output by **Tplyr**.
Let's start by looking at an example.
@@ -57,7 +57,7 @@ Now let's dig in.
### Ordering Helpers
-Ordering helpers are columns added into 'Tplyr' tables to make sure that you can sort the display to your preference. In general, 'Tplyr' will create:
+Ordering helpers are columns added into **Tplyr** tables to make sure that you can sort the display to your preference. In general, **Tplyr** will create:
- One order variable to order layers
- One order variable for each by variable
@@ -100,7 +100,7 @@ For more information, it's well worth your time to familiarize yourself with the
## Sorting the Layers
-Layers are one of the fundamental building blocks of 'Tplyr'. Each layer executes independently, and at the end of a build they're bound together. The `ord_layer_index` variable allows you differentiate and sort layers after the table is built. Layers are indexed in the order in which they were added to the table using `add_layer()` or `add_layers()`. For example, let's say you wanted to reverse the order of the layers.
+Layers are one of the fundamental building blocks of **Tplyr**. Each layer executes independently, and at the end of a build they're bound together. The `ord_layer_index` variable allows you to differentiate and sort layers after the table is built. Layers are indexed in the order in which they were added to the table using `add_layer()` or `add_layers()`. For example, let's say you wanted to reverse the order of the layers.
```{r}
t %>%
@@ -121,7 +121,7 @@ These order variables will calculate based on the first applicable method below.
### Factor
-If there's no `VARN` variable in the target dataset, 'Tplyr' will then check if the variable you provided is a factor. If you're new to R, spending some time trying to understand factor variables is quite worthwhile. Let's look at example using the variable `ETHNIC` and see some of the advantages in practice.
+If there's no `VARN` variable in the target dataset, **Tplyr** will then check if the variable you provided is a factor. If you're new to R, spending some time trying to understand factor variables is quite worthwhile. Let's look at an example using the variable `ETHNIC` and see some of the advantages in practice.
```{r}
adsl$ETHNIC <- factor(adsl$ETHNIC, levels=c("HISPANIC OR LATINO", "NOT HISPANIC OR LATINO", "DUMMMY"))
@@ -136,13 +136,13 @@ tplyr_table(adsl, TRT01A) %>%
Factor variables have 'levels'. These levels are essentially what the `VARN` variables are trying to achieve - they specify the order of the different values within the associated variable. The variable we set above specifies that "HISPANIC OR LATINO" should sort first, then "NOT HISPANIC OR LATINO", and finally "DUMMY". Notice how they're not alphabetical?
-A highly advantageous aspect of using factor variables in 'Tplyr' is that factor variables can be used to insert dummy values into your table. Consider this line of code from above:
+A highly advantageous aspect of using factor variables in **Tplyr** is that factor variables can be used to insert dummy values into your table. Consider this line of code from above:
```
adsl$ETHNIC <- factor(adsl$ETHNIC, levels=c("HISPANIC OR LATINO", "NOT HISPANIC OR LATINO", "DUMMMY"))
```
-This is converting the variable `ETHNIC` to a factor, then setting the factor levels. But it doesn't _change_ any of the values in the dataset - there are no values of "dummy" within `ETHNIC` in ADSL. Yet in the output built above, you see rows for "DUMMY". By using factors, you can insert rows into your 'Tplyr' table that don't exist in the data. This is particularly helpful if you're working with data early on in a study, where certain values are expected, yet do not currently exist in the data. This will help you prepare tables that are complete even when your data are not.
+This is converting the variable `ETHNIC` to a factor, then setting the factor levels. But it doesn't _change_ any of the values in the dataset - there are no values of "dummy" within `ETHNIC` in ADSL. Yet in the output built above, you see rows for "DUMMY". By using factors, you can insert rows into your **Tplyr** table that don't exist in the data. This is particularly helpful if you're working with data early on in a study, where certain values are expected, yet do not currently exist in the data. This will help you prepare tables that are complete even when your data are not.
### VARN
@@ -154,7 +154,7 @@ adsl %>%
kable()
```
-'Tplyr' will automatically figure this out for you, and pull the `RACEN` values into the variable `ord_layer_1`.
+**Tplyr** will automatically figure this out for you, and pull the `RACEN` values into the variable `ord_layer_1`.
```{r}
tplyr_table(adsl, TRT01A) %>%
@@ -169,7 +169,7 @@ tplyr_table(adsl, TRT01A) %>%
### Alphabetical
-Lastly, If the target doesn't have a `VARN` variable in the target dataset and isn't a factor, 'Tplyr' will sort the variable alphabetically. The resulting order variable will be numeric, simply numbering each of the variable values alphabetically. Nothing fancy to it!
+Lastly, if the target doesn't have a `VARN` variable in the target dataset and isn't a factor, **Tplyr** will sort the variable alphabetically. The resulting order variable will be numeric, simply numbering each of the variable values alphabetically. Nothing fancy to it!
## Sorting Descriptive Statistic Summaries
@@ -194,16 +194,16 @@ Each of the separate "Groups" added above were indexed based on their position i
## Sorting Count Layers
-The order in which results appear on a frequency table can be deceptively complex and depends on the situation at hand. With this in mind, 'Tplyr' has 3 different methods of ordering the results of a count layer using the function `set_order_count_method()`:
+The order in which results appear on a frequency table can be deceptively complex and depends on the situation at hand. With this in mind, **Tplyr** has 3 different methods of ordering the results of a count layer using the function `set_order_count_method()`:
1. **"byfactor"** - The default method is to sort by a factor. If the input variable is not a factor, alphabetical sorting will be used.
2. **"byvarn"** - Similar to a 'by' variable, a count target can be sorted with a VARN variable existing in the target dataset.
-3. **"bycount"** - This is the most complex method. Many tables require counts to be sorted based on the counts within a particular group, like a treatment variable. 'Tplyr' can populate the ordering column based on numeric values within any results column. This requires some more granular control, for which we've created the functions `set_ordering_cols()` and `set_result_order_var()` to specify the column and numeric value on which the ordering column should be based.
+3. **"bycount"** - This is the most complex method. Many tables require counts to be sorted based on the counts within a particular group, like a treatment variable. **Tplyr** can populate the ordering column based on numeric values within any results column. This requires some more granular control, for which we've created the functions `set_ordering_cols()` and `set_result_order_var()` to specify the column and numeric value on which the ordering column should be based.
### "byfactor" and "byvarn"
-"byfactor" is the default ordering method of results for count layers. Both "byfactor" and "byvarn" behave exactly like the order variables associated with `by` variables in a 'Tplyr' table. For "byvarn", you must set the sort method using `set_order_count_method()`.
+"byfactor" is the default ordering method of results for count layers. Both "byfactor" and "byvarn" behave exactly like the order variables associated with `by` variables in a **Tplyr** table. For "byvarn", you must set the sort method using `set_order_count_method()`.
```{r}
adsl$AGEGR1 <- factor(adsl$AGEGR1, c("<65", "65-80", ">80"))
@@ -310,7 +310,7 @@ tplyr_table(adae, TRTA) %>%
In a layer that uses nesting, we need one more order variable - as we're now concerned with the sorting of both the outside and inside variable. Counts are being summarized for both - so we need to know how both should be sorted. Additionally, we need to make sure that, in this case, the adverse events within a body system stay within the rows of that body system.
-These result variables will always be the last two order variables output by 'Tplyr'. In the above example, `ord_layer_1` is for `AEBODSYS` and `ord_layer_2` is for `AEDECOD`. Note that `ord_layer_2` has `Inf` where `row_label1` and `row_label2` are both equal. This is the row that summarizes the `AEBODSYS` counts. By default, 'Tplyr' is set to assume that you will use **descending** sort on the order variable associated with the inside count variable (i.e. `AEDECOD`). This is because in nested count layer you will often want to sort by descending occurrence of the inside target variable. If you'd like to use ascending sorting instead, we offer the function `set_outer_sort_position()`.
+These result variables will always be the last two order variables output by **Tplyr**. In the above example, `ord_layer_1` is for `AEBODSYS` and `ord_layer_2` is for `AEDECOD`. Note that `ord_layer_2` has `Inf` where `row_label1` and `row_label2` are both equal. This is the row that summarizes the `AEBODSYS` counts. By default, **Tplyr** is set to assume that you will use **descending** sort on the order variable associated with the inside count variable (i.e. `AEDECOD`). This is because in a nested count layer you will often want to sort by descending occurrence of the inside target variable. If you'd like to use ascending sorting instead, we offer the function `set_outer_sort_position()`.
```{r}
tplyr_table(adae, TRTA) %>%
diff --git a/vignettes/styled-table.Rmd b/vignettes/styled-table.Rmd
index b1435ea8..f4e26aeb 100644
--- a/vignettes/styled-table.Rmd
+++ b/vignettes/styled-table.Rmd
@@ -24,11 +24,11 @@ library(kableExtra)
load("adsl.Rdata")
```
-In the other vignettes we talk about how to get the most out of 'Tplyr' when it comes to preparing your data. The last step we need to cover is how to get from the data output by 'Tplyr' to a presentation ready table.
+In the other vignettes we talk about how to get the most out of **Tplyr** when it comes to preparing your data. The last step we need to cover is how to get from the data output by **Tplyr** to a presentation ready table.
There are a few things left to do after a table is built. These steps will vary based on what package you're using for presentation - but within this vignette we will demonstrate how to use ['huxtable'](https://hughjonesd.github.io/huxtable/) to prepare your table and ['pharmaRTF'](https://github.com/atorus-research/pharmaRTF) to write the output.
-After a 'Tplyr' table is built, you will likely have to:
+After a **Tplyr** table is built, you will likely have to:
- Sort the table however you wish using the provided order variables
- Drop the order variables once sorted
@@ -82,8 +82,8 @@ dat %>%
In the block above, we assembled the count and descriptive statistic summaries one by one. But notice that I did some pre-processing on the dataset. There are some important considerations here:
-- 'Tplyr' does **not** do any data cleaning. We summarize and prepare the data that you enter. If you're following CDISC standards properly, this shouldn't be a concern - because ADaM data should already be formatted to be presentation ready. 'Tplyr' works under this assumption, and we won't do any re-coding or casing changes. In this example, the original `SEX` values were "M" and "F" - so I switched them to be "Male" and "Female" instead.
-- The second pre-processing step does something interesting. If you recall from `vignette("sort")`, factor variables input to 'Tplyr' will use the factor order for the resulting order variable. Another particularly useful advantage of this is dummying values. The adsl dataset only contains the races "WHITE", "BLACK OR AFRICAN AMERICAN", and "AMERICAN INDIAN OR ALASK NATIVE". If you set factor levels prior to entering the data into 'Tplyr', the values will be dummied for you. This is particularly advantageous when a study is early on and data may be sparse. Your output can display complete values and the presentation will be consistent as data come in.
+- **Tplyr** does **not** do any data cleaning. We summarize and prepare the data that you enter. If you're following CDISC standards properly, this shouldn't be a concern - because ADaM data should already be formatted to be presentation ready. **Tplyr** works under this assumption, and we won't do any re-coding or casing changes. In this example, the original `SEX` values were "M" and "F" - so I switched them to be "Male" and "Female" instead.
+- The second pre-processing step does something interesting. If you recall from `vignette("sort")`, factor variables input to **Tplyr** will use the factor order for the resulting order variable. Another particularly useful advantage of this is dummying values. The adsl dataset only contains the races "WHITE", "BLACK OR AFRICAN AMERICAN", and "AMERICAN INDIAN OR ALASKA NATIVE". If you set factor levels prior to entering the data into **Tplyr**, the values will be dummied for you. This is particularly advantageous when a study is early on and data may be sparse. Your output can display complete values and the presentation will be consistent as data come in.
## Sorting, Column Ordering, Column Headers, and Clean-up
@@ -105,18 +105,18 @@ dat %>%
Now you can see things coming together. In this block, we:
- Sorted the data by the layer, by the row labels (which in this case are just text strings we defined), and by the results. Review `vignette("sort")` to understand how each layer handles sorting in more detail.
-- Used the `apply_row_masks()` function. Essentially, **after** your data are sorted, this function will look at all of your row_label variables and drop any repeating values. For packages like 'huxtable', this eases the process of making your table presentation ready, so you don't need to do any cell merging once the 'huxtable' table is created. Additionally, here we set the `row_breaks` option to `TRUE`. This will insert a blank row between each of your layers, which helps improve the presentation depending on your output. It's important to note that the input dataset **must** still have the `ord_layer_index` variable attached in order for the blank rows to be added. Sorting should be done prior, and column reordering/dropping may be done after.
+- Used the `apply_row_masks()` function. Essentially, **after** your data are sorted, this function will look at all of your row_label variables and drop any repeating values. For packages like **huxtable**, this eases the process of making your table presentation ready, so you don't need to do any cell merging once the **huxtable** table is created. Additionally, here we set the `row_breaks` option to `TRUE`. This will insert a blank row between each of your layers, which helps improve the presentation depending on your output. It's important to note that the input dataset **must** still have the `ord_layer_index` variable attached in order for the blank rows to be added. Sorting should be done prior, and column reordering/dropping may be done after.
- Re-ordered the columns and dropped off the order columns. For more information about how to get the most out of `dplyr::select()`, you can look into 'tidyselect' [here](https://tidyselect.r-lib.org/reference/select_helpers.html). This is where the `tidyselect::starts_with()` comes from.
-- Added the column headers. In a huxtable, your column headers are basically just the top rows of your data frame. The 'Tplyr' function `add_column_headers()` does this for you by letting you just enter in a string to define your headers. But there's more to it - you can also create nested headers by nesting text within curly brackets ({}), and notice that we have the treatment groups within the two stars? This actually allows you to take the header N values that 'Tplyr' calculates for you, and use it within the column headers. As you can see in the first row of the output, the text shows the (N=XX) values populated with the proper header_n counts.
+- Added the column headers. In a **huxtable**, your column headers are basically just the top rows of your data frame. The **Tplyr** function `add_column_headers()` does this for you by letting you just enter in a string to define your headers. But there's more to it - you can also create nested headers by nesting text within curly brackets ({}), and notice that we have the treatment groups within the two stars? This actually allows you to take the header N values that **Tplyr** calculates for you, and use it within the column headers. As you can see in the first row of the output, the text shows the (N=XX) values populated with the proper header_n counts.
## Table Styling
-There are a lot of options of where to go next. The ['gt'](https://gt.rstudio.com/) package is always a good choice, and we've been using ['kableExtra'](https://haozhu233.github.io/kableExtra/) throughout these vignettes. At the moment, with the tools we've made available in 'Tplyr', if you're aiming to create RTF outputs (which is still a common requirement in within pharma companies), ['huxtable'](https://hughjonesd.github.io/huxtable/) and our package ['pharmaRTF'](https://github.com/atorus-research/pharmaRTF) will get you where you need to go.
+There are a lot of options of where to go next. The [`gt`](https://gt.rstudio.com/) package is always a good choice, and we've been using [**kableExtra**](https://haozhu233.github.io/kableExtra/) throughout these vignettes. At the moment, with the tools we've made available in **Tplyr**, if you're aiming to create RTF outputs (which is still a common requirement within pharma companies), [`huxtable`](https://hughjonesd.github.io/huxtable/) and our package [`pharmaRTF`](https://github.com/atorus-research/pharmaRTF) will get you where you need to go.
-_(Note: We plan to extend 'pharmaRTF' to support 'GT' when it has better RTF support)_
+_(Note: We plan to extend **pharmaRTF** to support **gt** when it has better RTF support)_
-Alright - so the table is ready. Let's prepare the 'huxtable' table.
+Alright - so the table is ready. Let's prepare the `huxtable` table.
```{r huxtable}
# Make the table
@@ -134,7 +134,7 @@ ht
# Output to File
-So now this is starting to look a lot more like what we're going for! The table styling is coming together. The last step is to get it into a final output document. So here we'll jump into 'pharmaRTF'
+So now this is starting to look a lot more like what we're going for! The table styling is coming together. The last step is to get it into a final output document. So here we'll jump into **pharmaRTF**.
```{r rtf_output}
doc <- pharmaRTF::rtf_doc(ht) %>%
@@ -151,7 +151,7 @@ doc <- pharmaRTF::rtf_doc(ht) %>%
pharmaRTF::set_column_header_buffer(top=1)
```
-This may look a little messy, but 'pharmaRTF' syntax can be abbreviated by standardizing your process. Check out [this vignette](https://atorus-research.github.io/tf_from_file.html) for instructions on how to read titles and footnotes into 'pharmaRTF' from a file.
+This may look a little messy, but **pharmaRTF** syntax can be abbreviated by standardizing your process. Check out [this vignette](https://atorus-research.github.io/tf_from_file.html) for instructions on how to read titles and footnotes into **pharmaRTF** from a file.
Our document is now created, all the titles and footnotes are added, and settings are good to go. Last step is to write it out.
@@ -165,4 +165,4 @@ And here we have it - Our table is styled and ready to go!
knitr::include_graphics("styled_example.png")
```
-If you'd like to learn more about how to use 'huxtable', be sure to check out the [website](https://hughjonesd.github.io/huxtable/). For use specifically with 'pharmaRTF', we prepared a vignette of tips and tricks [here](https://atorus-research.github.io/huxtable_tips.html).
+If you'd like to learn more about how to use `huxtable`, be sure to check out the [website](https://hughjonesd.github.io/huxtable/). For use specifically with `pharmaRTF`, we prepared a vignette of tips and tricks [here](https://atorus-research.github.io/huxtable_tips.html).
diff --git a/vignettes/table.Rmd b/vignettes/table.Rmd
index 3d90c233..0d4ab526 100644
--- a/vignettes/table.Rmd
+++ b/vignettes/table.Rmd
@@ -25,7 +25,7 @@ load("adae.Rdata")
load("adsl.Rdata")
```
-Most of the work in creating a 'Tplyr' table is at the layer level, but there are a few overarching properties that are worth spending some time discussing. One of the things that we wanted to make sure we did in 'Tplyr' is allow you to eliminate redundant code wherever possible. Adding some processing to the `tplyr_table()` level allows us to do that. Furthermore, some settings simply need to be applied table wide.
+Most of the work in creating a **Tplyr** table is at the layer level, but there are a few overarching properties that are worth spending some time discussing. One of the things that we wanted to make sure we did in **Tplyr** is allow you to eliminate redundant code wherever possible. Adding some processing to the `tplyr_table()` level allows us to do that. Furthermore, some settings simply need to be applied table wide.
## Table Parameters
@@ -87,9 +87,9 @@ Note how in the above example, there are two new columns added to the data - `va
## Population Data
-A last and very important aspect of table level properties in 'Tplyr' is the addition of a population dataset. In CDISC standards, datasets like `adae` only contain adverse events when they occur. This means that if a subject did not experience an adverse event, or did not experience an adverse event within the criteria that you're subsetting for, they don't appear in the dataset. When you're looking at the proportion of subject who experienced an adverse event compared to the total number of subjects in that cohort, `adae` itself leaves you no way to calculate that total - as the subjects won't exist in the data.
+A last and very important aspect of table level properties in **Tplyr** is the addition of a population dataset. In CDISC standards, datasets like `adae` only contain adverse events when they occur. This means that if a subject did not experience an adverse event, or did not experience an adverse event within the criteria that you're subsetting for, they don't appear in the dataset. When you're looking at the proportion of subjects who experienced an adverse event compared to the total number of subjects in that cohort, `adae` itself leaves you no way to calculate that total - as the subjects won't exist in the data.
-'Tplyr' allows you to provide a separate population dataset to overcome this. Furthermore, you are also able to provide a separate population dataset `where` parameter and a population treatment variable named `pop_treat_var`, as variable names may differ between the datasets.
+**Tplyr** allows you to provide a separate population dataset to overcome this. Furthermore, you are also able to provide a separate population dataset `where` parameter and a population treatment variable named `pop_treat_var`, as variable names may differ between the datasets.
```{r pop_data1}
t <- tplyr_table(adae, TRTA, where = AEREL != "NONE") %>%
@@ -108,19 +108,19 @@ t %>%
In the above example, `AEREL` doesn’t exist in `adsl`, therefore we used `set_pop_where()` to remove the filter criteria on the population data. Setting the population dataset where parameter to `TRUE` removes any filter applied by the population data. If `set_pop_where()` is not set for the population data, it will default to the `where` parameter used in `tplyr_table()`. The same logic applies to the population treatment variable. `TRTA` does not exist in `adsl` either, so we used `set_pop_treat_var()` to change it to the appropriate variable in `adsl`.
-Note the percentage values in the summary above. By setting the population data, 'Tplyr' now knew to use those values when calculating the percentages for the distinct counts of subjects who experienced the summarized adverse events. Furthermore, with the population data provided, 'Tplyr' is able to calculate your header N's properly:
+Note the percentage values in the summary above. By setting the population data, **Tplyr** now knew to use those values when calculating the percentages for the distinct counts of subjects who experienced the summarized adverse events. Furthermore, with the population data provided, **Tplyr** is able to calculate your header N's properly:
```{r pop_data2}
header_n(t) %>%
kable()
```
-With the table level settings under control, now you're ready to learn more about what 'Tplyr' has to offer in each layer.
+With the table level settings under control, now you're ready to learn more about what **Tplyr** has to offer in each layer.
-- Learn more about descriptive statistics layers in `vignettes("desc")`
+- Learn more about descriptive statistics layers in `vignette("desc")`
- Learn more about count and shift layers in `vignette("count")`
- Learn more about shift layers in `vignette("shift")`
-- Learn more about calculating risk differences in `vignettes("riskdiff")`
-- Learn more about sorting 'Tplyr' tables in `vignettes("sort")`
-- Learn more about using 'Tplyr' options in `vignettes("options")`
-- And finally, learn more about producing and outputting styled tables using 'Tplyr' in `vignettes("styled-table")`
+- Learn more about calculating risk differences in `vignette("riskdiff")`
+- Learn more about sorting **Tplyr** tables in `vignette("sort")`
+- Learn more about using **Tplyr** options in `vignette("options")`
+- And finally, learn more about producing and outputting styled tables using **Tplyr** in `vignette("styled-table")`