Skip to content

Commit

Permalink
Merge pull request #164 from atorus-research/gh_issue_62
Browse files Browse the repository at this point in the history
Move data from vignettes to package data
  • Loading branch information
mstackhouse committed Dec 18, 2023
2 parents bb2dc13 + 7cf5e60 commit 5e1dcf2
Show file tree
Hide file tree
Showing 45 changed files with 492 additions and 234 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
Expand Up @@ -22,3 +22,4 @@
^Makefile$
^Jenkinsfile$
^rsconnect$
^data-raw$
1 change: 1 addition & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -62,3 +62,4 @@ VignetteBuilder: knitr
RoxygenNote: 7.2.3
RdMacros: lifecycle
Config/testthat/edition: 3
LazyData: true
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ export(collapse_row_labels)
export(f_str)
export(get_by)
export(get_count_layer_formats)
export(get_data_labels)
export(get_desc_layer_formats)
export(get_layer_template)
export(get_layer_templates)
Expand Down
67 changes: 67 additions & 0 deletions R/data.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
#' ADSL Data
#'
#' A subset of the PHUSE Test Data Factory ADSL data set.
#'
#' @format A data.frame with 254 rows and 49 columns.
#'
#' @seealso [get_data_labels()]
#'
#' @source https://github.com/phuse-org/TestDataFactory
#'
"tplyr_adsl"


#' ADAE Data
#'
#' A subset of the PHUSE Test Data Factory ADAE data set.
#'
#' @format A data.frame with 276 rows and 55 columns.
#'
#' @seealso [get_data_labels()]
#'
#' @source https://github.com/phuse-org/TestDataFactory
#'
"tplyr_adae"

#' ADAS Data
#'
#' A subset of the PHUSE Test Data Factory ADAS data set.
#'
#' @format A data.frame with 1,040 rows and 40 columns.
#'
#' @seealso [get_data_labels()]
#'
#' @source https://github.com/phuse-org/TestDataFactory
#'
"tplyr_adas"

#' ADLB Data
#'
#' A subset of the PHUSE Test Data Factory ADLB data set.
#'
#' @format A data.frame with 311 rows and 46 columns.
#'
#' @seealso [get_data_labels()]
#'
#' @source https://github.com/phuse-org/TestDataFactory
#'
"tplyr_adlb"


#' Get Data Labels
#'
#' Get labels for data sets included in Tplyr.
#'
#' @param data A Tplyr data set.
#'
#' @return A data.frame with columns `name` and `label` containing the names and labels of each column.
#'
#' @export
get_data_labels <- function(data) {
map_dfr(
names(data),
function(name) {
list(name = name, label = attr(data[[name]], "label"))
}
)
}
8 changes: 2 additions & 6 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,6 @@ library(tidyverse)
library(magrittr)
library(Tplyr)
library(knitr)
load("vignettes/adae.Rdata")
load("vignettes/adsl.Rdata")
```

# *Tplyr* <img src="man/figures/logo.png" align="right" alt="" width="120" />
Expand Down Expand Up @@ -76,11 +74,10 @@ When you look at this table, you can begin breaking this output down into smalle
So we have one table, with 6 summaries (7 including the next page, not shown) - but only 2 different approaches to summaries being performed.
In the same way that [dplyr](https://dplyr.tidyverse.org/) is a grammar of data manipulation, **Tplyr** aims to be a grammar of data summary. The goal of **Tplyr** is to allow you to program a summary table like you see it on the page, by breaking a larger problem into smaller 'layers', and combining them together like you see on the page.

Enough talking - let's see some code. In these examples, we will be using data from the [PHUSE Test Data Factory]( https://advance.phuse.global/display/WEL/Test+Dataset+Factory) based on the [original pilot project submission package](https://github.com/atorus-research/CDISC_pilot_replication). Note: You can see our replication of the CDISC pilot using the PHUSE Test Data Factory data [here](https://github.com/atorus-research/CDISC_pilot_replication).
Enough talking - let's see some code. In these examples, we will be using data from the [PHUSE Test Data Factory]( https://advance.phuse.global/display/WEL/Test+Dataset+Factory) based on the [original pilot project submission package](https://github.com/atorus-research/CDISC_pilot_replication). We've packaged some subsets of that data into **Tplyr**, which you can use to replicate our examples and run our vignette code yourself. Note: You can see our replication of the CDISC pilot using the PHUSE Test Data Factory data [here](https://github.com/atorus-research/CDISC_pilot_replication).

```{r initial_demo}
tplyr_table(adsl, TRT01P, where = SAFFL == "Y") %>%
tplyr_table(tplyr_adsl, TRT01P, where = SAFFL == "Y") %>%
add_layer(
group_desc(AGE, by = "Age (years)")
) %>%
Expand All @@ -89,7 +86,6 @@ tplyr_table(adsl, TRT01P, where = SAFFL == "Y") %>%
) %>%
build() %>%
kable()
```

## *Tplyr* is Qualified
Expand Down
85 changes: 43 additions & 42 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@

<!-- README.md is generated from README.Rmd. Please edit that file -->

# Tplyr <img src="man/figures/logo.png" align="right" alt="" width="120" />
# *Tplyr* <img src="man/figures/logo.png" align="right" alt="" width="120" />

<!-- badges: start -->

Expand Down Expand Up @@ -42,7 +42,7 @@ install.packages("Tplyr")
devtools::install_github("https://github.com/atorus-research/Tplyr.git", ref="devel")
```

# What is Tplyr?
# What is *Tplyr*?

[dplyr](https://dplyr.tidyverse.org/) from tidyverse is a grammar of
data manipulation. So what does that allow you to do? It gives you, as a
Expand All @@ -58,10 +58,10 @@ pharmaceutical industry, a great deal of the data presented in the
outputs we create are very similar. For the most part, most of these
tables can be broken down into a few categories:

- Counting for event based variables or categories
- Shifting, which is just counting a change in state with a ‘from’ and
a ‘to’
- Generating descriptive statistics around some continuous variable.
- Counting for event based variables or categories
- Shifting, which is just counting a change in state with a ‘from’ and a
‘to’
- Generating descriptive statistics around some continuous variable.

For many of the tables that go into a clinical submission, the tables
are made up of a combination of these approaches. Consider a
Expand All @@ -81,15 +81,15 @@ into smaller, redundant, components. These components can be viewed as
layers. The boxes in the image above represent how you can begin to
conceptualize this.

- First we have Sex, which is made up of n (%) counts.
- Next we have Age as a continuous variable, where we have a number of
descriptive statistics, including n, mean, standard deviation,
median, quartile 1, quartile 3, min, max, and missing values.
- After that we have age, but broken into categories - so this is once
again n (%) values.
- Race - more counting,
- Ethnicity - more counting
- Weight - and we’re back to descriptive statistics.
- First we have Sex, which is made up of n (%) counts.
- Next we have Age as a continuous variable, where we have a number of
descriptive statistics, including n, mean, standard deviation, median,
quartile 1, quartile 3, min, max, and missing values.
- After that we have age, but broken into categories - so this is once
again n (%) values.
- Race - more counting,
- Ethnicity - more counting
- Weight - and we’re back to descriptive statistics.

So we have one table, with 6 summaries (7 including the next page, not
shown) - but only 2 different approaches to summaries being performed.
Expand All @@ -104,13 +104,14 @@ using data from the [PHUSE Test Data
Factory](https://advance.phuse.global/display/WEL/Test+Dataset+Factory)
based on the [original pilot project submission
package](https://github.com/atorus-research/CDISC_pilot_replication).
Note: You can see our replication of the CDISC pilot using the PHUSE
Test Data Factory data
We’ve packaged some subsets of that data into **Tplyr**, which you can
use to replicate our examples and run our vignette code yourself. Note:
You can see our replication of the CDISC pilot using the PHUSE Test Data
Factory data
[here](https://github.com/atorus-research/CDISC_pilot_replication).

``` r

tplyr_table(adsl, TRT01P, where = SAFFL == "Y") %>%
tplyr_table(tplyr_adsl, TRT01P, where = SAFFL == "Y") %>%
add_layer(
group_desc(AGE, by = "Age (years)")
) %>%
Expand All @@ -133,7 +134,7 @@ tplyr_table(adsl, TRT01P, where = SAFFL == "Y") %>%
| Age Categories n (%) | \>80 | 30 ( 34.9%) | 18 ( 21.4%) | 29 ( 34.5%) | 2 | 1 | 2 |
| Age Categories n (%) | 65-80 | 42 ( 48.8%) | 55 ( 65.5%) | 47 ( 56.0%) | 2 | 1 | 3 |

## Tplyr is Qualified
## *Tplyr* is Qualified

We understand how important documentation and testing is within the
pharmaceutical world. This is why outside of unit testing **Tplyr**
Expand All @@ -153,38 +154,38 @@ this report.

Here are some of the high level benefits of using **Tplyr**:

- Easy construction of table data using an intuitive syntax
- Smart string formatting for your numbers that’s easily specified by
the user
- A great deal of flexibility in what is performed and how it’s
presented, without specifying hundreds of parameters
- Easy construction of table data using an intuitive syntax
- Smart string formatting for your numbers that’s easily specified by
the user
- A great deal of flexibility in what is performed and how it’s
presented, without specifying hundreds of parameters

# Where to go from here?

There’s quite a bit more to learn! And we’ve prepared a number of other
vignettes to help you get what you need out of **Tplyr**.

- The best place to start is with our Getting Started vignette at
`vignette("Tplyr")`
- Learn more about table level settings in `vignette("table")`
- Learn more about descriptive statistics layers in `vignette("desc")`
- Learn more about count layers in `vignette("count")`
- Learn more about shift layers in `vignette("shift")`
- Learn more about percentages in `vignette("denom")`
- Learn more about calculating risk differences in
`vignette("riskdiff")`
- Learn more about sorting **Tplyr** tables in `vignette("sort")`
- Learn more about using **Tplyr** options in `vignette("options")`
- And finally, learn more about producing and outputting styled tables
using **Tplyr** in `vignette("styled-table")`
- The best place to start is with our Getting Started vignette at
`vignette("Tplyr")`
- Learn more about table level settings in `vignette("table")`
- Learn more about descriptive statistics layers in `vignette("desc")`
- Learn more about count layers in `vignette("count")`
- Learn more about shift layers in `vignette("shift")`
- Learn more about percentages in `vignette("denom")`
- Learn more about calculating risk differences in
`vignette("riskdiff")`
- Learn more about sorting **Tplyr** tables in `vignette("sort")`
- Learn more about using **Tplyr** options in `vignette("options")`
- And finally, learn more about producing and outputting styled tables
using **Tplyr** in `vignette("styled-table")`

In the **Tplyr** version 1.0.0, we’ve packed a number of new features
in. For deeper dives on the largest new additions:

- Learn about **Tplyr’s** traceability metadata in
`vignette("metadata")` and about how it can be extended in
`vignette("custom-metadata")`
- Learn about layer templates in `vignette("layer_templates")`
- Learn about **Tplyr**’s traceability metadata in
`vignette("metadata")` and about how it can be extended in
`vignette("custom-metadata")`
- Learn about layer templates in `vignette("layer_templates")`

# References

Expand Down
15 changes: 12 additions & 3 deletions _pkgdown.yml
Original file line number Diff line number Diff line change
Expand Up @@ -105,12 +105,13 @@ reference:
- title: Post-pocessing
desc: Post-pocessing functions
- contents:
- str_indent_wrap
- apply_row_masks
- apply_conditional_format
- apply_formats
- apply_row_masks
- collapse_row_labels
- str_extract_fmt_group
- str_extract_num
- apply_formats
- str_indent_wrap
- title: Helper functions
desc: General helper functions
- contents:
Expand All @@ -122,6 +123,14 @@ reference:
- get_where.tplyr_layer
- Tplyr
- get_tplyr_regex
- title: Data
desc: Tplyr Built-in Datasets
- contents:
- tplyr_adae
- tplyr_adas
- tplyr_adlb
- tplyr_adsl
- get_data_labels

articles:
- title: Table Basics
Expand Down
3 changes: 3 additions & 0 deletions data-raw/DATASET.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
## code to prepare `DATASET` dataset goes here

usethis::use_data(DATASET, overwrite = TRUE)
6 changes: 6 additions & 0 deletions data-raw/adae.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# note: adae.Rdata was copied over from vignettes/adsl.Rdata
# this is a copy of the PHUSE Test Data Factory data, trimmed down for size

load("data-raw/adae.Rdata")
tplyr_adae <- adae
usethis::use_data(tplyr_adae, overwrite = TRUE)
File renamed without changes.
6 changes: 6 additions & 0 deletions data-raw/adas.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# note: adlb.Rdata was copied over from vignettes/adsl.Rdata
# this is a copy of the PHUSE Test Data Factory data, trimmed down for size

load("data-raw/adas.Rdata")
tplyr_adas <- adas
usethis::use_data(tplyr_adas, overwrite = TRUE)
File renamed without changes.
6 changes: 6 additions & 0 deletions data-raw/adlb.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# note: adlb.Rdata was copied over from vignettes/adsl.Rdata
# this is a copy of the PHUSE Test Data Factory data, trimmed down for size

load("data-raw/adlb.Rdata")
tplyr_adlb <- adlb
usethis::use_data(tplyr_adlb, overwrite = TRUE)
File renamed without changes.
6 changes: 6 additions & 0 deletions data-raw/adsl.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# note: adsl.Rdata was copied over from vignettes/adsl.Rdata
# this is a copy of the PHUSE Test Data Factory data, trimmed down for size

load("data-raw/adsl.Rdata")
tplyr_adsl <- adsl
usethis::use_data(tplyr_adsl, overwrite = TRUE)
File renamed without changes.
Binary file added data/tplyr_adae.rda
Binary file not shown.
Binary file added data/tplyr_adas.rda
Binary file not shown.
Binary file added data/tplyr_adlb.rda
Binary file not shown.
Binary file added data/tplyr_adsl.rda
Binary file not shown.
22 changes: 11 additions & 11 deletions man/collapse_row_labels.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

17 changes: 17 additions & 0 deletions man/get_data_labels.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

22 changes: 22 additions & 0 deletions man/tplyr_adae.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit 5e1dcf2

Please sign in to comment.