Merge branch 'devel' into 7-function-documentation

atorus-research · Jun 15, 2023 · 191c885
2 parents 5630317 + 4692171
Showing 33 changed files with 950 additions and 160 deletions.
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -22,3 +22,5 @@
 ^advs\.xpt$
 ^advs_Define-Excel-Spec_match_admiral\.xlsx
 ^cran-comments\.md$
+^example_data_specs$
+
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -81,3 +81,5 @@ Suggests:
  metacore
 Config/testthat/edition: 3
 VignetteBuilder: knitr
+Depends: 
+ R (>= 3.5)
diff --git a/NEWS.md b/NEWS.md
@@ -2,21 +2,26 @@
 
 ## New Features and Bug Fixes
 
-* Fixed an issue where `xportr_type` would overwrite column labels, widths, and "sas.formats"
-* Fixed messaging of `xportr_order`to give better visability of the number of variables being reordered.
-* Add new argument to `xportr_write` to allow users to specify how xpt validation checks are handled.
+* Fixed an issue where `xportr_type()` would overwrite column labels, widths, and "sas.formats"
+* Fixed messaging of `xportr_order()` to give better visibility of the number of variables being reordered.
+* Added a new argument to `xportr_write()` to allow users to specify how xpt validation checks are handled.
 * Fixed bug where character_types were case sensitive. They are now case insensitive.
-* Updated `xportr_type` to make type coercion more explicit. 
+* Updated `xportr_type()` to make type coercion more explicit. 
 * `xpt_validate` updated to accept iso8601 date formats. (#76)
 * Added function `xportr_metadata()` to explicitly set metadata at the start of a pipeline (#44)
 * Metadata order columns are now coerced to numeric by default in `xportr_order()` to prevent character sorting (#149)
 * Message is shown on `xportr_*` functions when the metadata being used has multiple variables with the same name in the same domain (#128)
+* Fixed an issue with `xportr_type()` where `DT` and `DTM` variables with a format specified in the metadata (e.g. `date9.`, `datetime20.`) were being converted to numeric, which caused a 10-year difference when the file was read back with `read_xpt()`. SAS counts dates from 1960-01-01, whereas R, like Unix, counts from 1970-01-01.
 
 ## Documentation
 
 * Moved `{pkgdown}` site to bootswatch. Enabled search and linked slack icon (#122).
+* Added a Deep Dive vignette showcasing functions and quality-of-life utilities for processing `xpt` files (#84)
+* Spruced up the Get Started vignette. Messages are now displayed, and the vignette links to the Deep Dive vignette (#150)
 
-## Deprecation and Breaking Changes
+
+## Deprecation
+and Breaking Changes
 
 * The `metacore` argument has been renamed to `metadata` in the following six xportr functions: `xportr_df_label()`, `xportr_format()`, `xportr_label()`, `xportr_length()`, `xportr_order()`, and `xportr_type()`. Please update your code to use the new `metadata` argument in place of `metacore`.
 

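The 10-year shift described in the `xportr_type()` entry above comes from the two date origins. A minimal base R sketch, assuming only that SAS numeric dates count days from 1960-01-01 and R `Date` values count days from 1970-01-01; the variable names are illustrative and none of this is xportr code:

```r
# Illustrative only: compare the SAS and R day counts for one calendar date.
d <- as.Date("2023-06-15")

r_days   <- as.numeric(d)                            # days since 1970-01-01 (R origin)
sas_days <- as.numeric(d - as.Date("1960-01-01"))    # days since 1960-01-01 (SAS origin)

# The two counts differ by the number of days between the origins.
offset_days <- sas_days - r_days
offset_days  # 3653

# Reading an R day count as if it were a SAS day count lands about ten years early.
misread <- as.Date(r_days, origin = "1960-01-01")
misread
```

The offset is 3653 rather than 3650 because 1960, 1964, and 1968 were leap years.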
diff --git a/R/data.R b/R/data.R
@@ -0,0 +1,84 @@
+#' Analysis Dataset Subject Level
+#'
+#' An example dataset containing subject level data
+#'
+#' @format ## `adsl`
+#' A data frame with 254 rows and 48 columns:
+#' \describe{
+#' \item{STUDYID}{Study Identifier}
+#' \item{USUBJID}{Unique Subject Identifier}
+#' \item{SUBJID}{Subject Identifier for the Study}
+#' \item{SITEID}{Study Site Identifier}
+#' \item{SITEGR1}{Pooled Site Group 1}
+#' \item{ARM}{Description of Planned Arm}
+#' \item{TRT01P}{Planned Treatment for Period 01}
+#' \item{TRT01PN}{Planned Treatment for Period 01 (N)}
+#' \item{TRT01A}{Actual Treatment for Period 01}
+#' \item{TRT01AN}{Actual Treatment for Period 01 (N)}
+#' \item{TRTSDT}{Date of First Exposure to Treatment}
+#' \item{TRTEDT}{Date of Last Exposure to Treatment}
+#' \item{TRTDUR}{Duration of Treatment (days)}
+#' \item{AVGDD}{Avg Daily Dose (as planned)}
+#' \item{CUMDOSE}{Cumulative Dose (as planned)}
+#' \item{AGE}{Age}
+#' \item{AGEGR1}{Pooled Age Group 1}
+#' \item{AGEGR1N}{Pooled Age Group 1 (N)}
+#' \item{AGEU}{Age Units}
+#' \item{RACE}{Race}
+#' \item{RACEN}{Race (N)}
+#' \item{SEX}{Sex}
+#' \item{ETHNIC}{Ethnicity}
+#' \item{SAFFL}{Safety Population Flag}
+#' \item{ITTFL}{Intent-To-Treat Population Flag}
+#' \item{EFFFL}{Efficacy Population Flag}
+#' \item{COMP8FL}{Completers of Week 8 Population Flag}
+#' \item{COMP16FL}{Completers of Week 16 Population Flag}
+#' \item{COMP24FL}{Completers of Week 24 Population Flag}
+#' \item{DISCONFL}{Did the Subject Discontinue the Study}
+#' \item{DSRAEFL}{Discontinued due to AE}
+#' \item{DTHFL}{Subject Died}
+#' \item{BMIBL}{Baseline BMI (kg/m^2)}
+#' \item{BMIBLGR1}{Pooled Baseline BMI Group 1}
+#' \item{HEIGHTBL}{Baseline Height (cm)}
+#' \item{WEIGHTBL}{Baseline Weight (kg)}
+#' \item{EDUCLVL}{Years of Education}
+#' \item{DISONSDT}{Date of Onset of Disease}
+#' \item{DURDIS}{Duration of Disease (Months)}
+#' \item{DURDSGR1}{Pooled Disease Duration Group 1}
+#' \item{VISIT1DT}{Date of Visit 1}
+#' \item{RFSTDTC}{Subject Reference Start Date/Time}
+#' \item{RFENDTC}{Subject Reference End Date/Time}
+#' \item{VISNUMEN}{End of Trt Visit (Vis 12 or Early Term.)}
+#' \item{RFENDT}{Date of Discontinuation/Completion}
+#' \item{DCDECOD}{Standardized Disposition Term}
+#' \item{DCREASCD}{Reason for Discontinuation}
+#' \item{MMSETOT}{MMSE Total}
+#' }
+"adsl"
+
+#' Example Dataset Specification
+#'
+#' @format ## `var_spec`
+#' A data frame with 216 rows and 19 columns:
+#' \describe{
+#' \item{Order}{Order of variable}
+#' \item{Dataset}{Dataset}
+#' \item{Variable}{Variable}
+#' \item{Label}{Variable Label}
+#' \item{Data Type}{Data Type}
+#' \item{Length}{Variable Length}
+#' \item{Significant Digits}{Significant Digits}
+#' \item{Format}{Variable Format}
+#' \item{Mandatory}{Mandatory Variable Flag}
+#' \item{Assigned Value}{Variable Assigned Value}
+#' \item{Codelist}{Variable Codelist}
+#' \item{Common}{Common Variable Flag}
+#' \item{Origin}{Variable Origin}
+#' \item{Pages}{Pages}
+#' \item{Method}{Variable Method}
+#' \item{Predecessor}{Variable Predecessor}
+#' \item{Role}{Variable Role}
+#' \item{Comment}{Comment}
+#' \item{Developer Notes}{Developer Notes}
+#' }
+"var_spec"
diff --git a/R/messages.R b/R/messages.R
@@ -91,7 +91,7 @@ type_log <- function(meta_ordered, type_mismatch_ind, verbose) {
 
 #' Utility for Lengths
 #'
-#' @param miss_vars Variables missing from metatdata
+#' @param miss_vars Variables missing from metadata
 #' @param verbose Provides additional messaging for user
 #'
 #' @return Output to Console

diff --git a/R/metadata.R b/R/metadata.R
@@ -16,6 +16,7 @@
 #' dataset = "test",
 #' variable = c("Subj", "Param", "Val", "NotUsed"),
 #' type = c("numeric", "character", "numeric", "character"),
+#' format = NA,
 #' order = c(1, 3, 4, 2)
 #' )
 #'

diff --git a/R/type.R b/R/type.R
@@ -54,7 +54,8 @@
 #' metadata <- data.frame(
 #' dataset = "test",
 #' variable = c("Subj", "Param", "Val", "NotUsed"),
-#' type = c("numeric", "character", "numeric", "character")
+#' type = c("numeric", "character", "numeric", "character"),
+#' format = NA
 #' )
 #'
 #' .df <- data.frame(
@@ -84,6 +85,7 @@ xportr_type <- function(.df,
  type_name <- getOption("xportr.type_name")
  characterTypes <- c(getOption("xportr.character_types"), "_character")
  numericTypes <- c(getOption("xportr.numeric_types"), "_numeric")
+ format_name <- getOption("xportr.format_name")
 
  ## Common section to detect domain from argument or pipes
 
@@ -106,8 +108,9 @@ xportr_type <- function(.df,
  metadata <- metadata %>%
  filter(!!sym(domain_name) == domain)
  }
- metadata <- metadata %>%
- select(!!sym(variable_name), !!sym(type_name))
+
+ metacore <- metadata %>%
+ select(!!sym(variable_name), !!sym(type_name), !!sym(format_name))
 
  # Common check for multiple variables name
  check_multiple_var_specs(metadata, variable_name)
@@ -125,9 +128,13 @@ xportr_type <- function(.df,
  # _character is used here as a mask of character, in case someone doesn't
  # want 'character' coerced to character
  type.x = if_else(type.x %in% characterTypes, "_character", type.x),
- type.x = if_else(type.x %in% numericTypes, "_numeric", type.x),
+ type.x = if_else(type.x %in% numericTypes | (grepl("DT$|DTM$|TM$", variable) & !is.na(format)),
+ "_numeric",
+ type.x
+ ),
+ type.y = if_else(is.na(type.y), type.x, type.y),
  type.y = tolower(type.y),
- type.y = if_else(type.y %in% characterTypes, "_character", type.y),
+ type.y = if_else(type.y %in% characterTypes | (grepl("DTC$", variable) & is.na(format)), "_character", type.y),
  type.y = if_else(type.y %in% numericTypes, "_numeric", type.y)
  )
 
@@ -138,7 +145,6 @@ xportr_type <- function(.df,
  type_mismatch_ind <- which(meta_ordered$type.x != meta_ordered$type.y)
  type_log(meta_ordered, type_mismatch_ind, verbose)
 
-
  # Check if variable types match
  is_correct <- sapply(meta_ordered[["type.x"]] == meta_ordered[["type.y"]], isTRUE)
  # Use the original variable iff metadata is missing that variable
@@ -161,6 +167,5 @@ xportr_type <- function(.df,
  }
  }, is_correct
  )
-
  .df
 }
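The masking rule added to `xportr_type()` in the hunks above can be tried in isolation. A minimal base R sketch, assuming a made-up set of variable names and formats; the regular expressions are copied from the diff, and everything else (including the names `fmt`, `date_as_numeric`, and `dtc_as_character`) is hypothetical:

```r
# Hypothetical metadata columns, for illustration only.
variable <- c("TRTSDT", "ASTDTM", "ASTTM", "RFSTDTC", "AGE")
fmt      <- c("date9.", "datetime20.", NA, NA, NA)

# Date/time variables that carry a format are masked as numeric.
date_as_numeric <- grepl("DT$|DTM$|TM$", variable) & !is.na(fmt)

# --DTC variables without a format are masked as character.
dtc_as_character <- grepl("DTC$", variable) & is.na(fmt)

data.frame(variable, format = fmt, date_as_numeric, dtc_as_character)
```

In this sketch `ASTTM` matches the suffix pattern but has no format, so neither rule applies and it would fall back to its declared type.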
diff --git a/R/xportr-package.R b/R/xportr-package.R
@@ -116,7 +116,8 @@
 globalVariables(c(
  "abbr_parsed", "abbr_stem", "adj_orig", "adj_parsed", "col_pos", "dict_varname",
  "lower_original_varname", "my_minlength", "num_st_ind", "original_varname",
- "renamed_n", "renamed_var", "use_bundle", "viable_start", "type.x", "type.y"
+ "renamed_n", "renamed_var", "use_bundle", "viable_start", "type.x", "type.y",
+ "variable"
 ))
 
 # The following block is used by usethis to automatically manage

diff --git a/README.Rmd b/README.Rmd
@@ -82,7 +82,7 @@ data sets (≤ 200)
 - Coerces variables to only numeric or character types
 - Display format support for numeric float and date/time values
 - Variables names are ≤ 8 characters.
-- Variable labels are ≤ 200 characters. 
+- Variable labels are ≤ 40 characters. 
 - Data set labels are ≤ 40 characters.
 - Presence of non-ASCII characters in Variable Names, Labels or data set labels.
 
@@ -103,7 +103,7 @@ To do this we will need to do the following:
  - Apply a dataset label 
  - Write out a version 5 xpt file
 
-All of which can be done using a well-defined specification file and the `xportr` package! 
+All of which can be done using a well-defined specification file and the `{xportr}` package! 
 
 First we will start with our `ADSL` dataset created in R. This example `ADSL` dataset is taken from the [`{admiral}`](https://pharmaverse.github.io/admiral/index.html) package. The script that generates this `ADSL` dataset can be created by using this command `admiral::use_ad_template("adsl")`. This `ADSL` dataset has 306 observations and 48 variables.
 
@@ -125,7 +125,19 @@ var_spec <- readxl::read_xlsx(spec_path, sheet = "Variables") %>%
  rlang::set_names(tolower)
 ```
 
-Each `xportr_` function has been written in a way to take in a part of the specification file and apply that piece to the dataset. 
+Each `xportr_` function has been written in a way to take in a part of the specification file and apply that piece to the dataset. Setting `verbose = "warn"` will send the appropriate warning messages to the console. We have suppressed the warnings here for the sake of brevity.
+
+```{r, warning = FALSE, message=FALSE, eval=TRUE}
+adsl %>%
+  xportr_type(var_spec, "ADSL", verbose = "warn") %>%
+  xportr_length(var_spec, "ADSL", verbose = "warn") %>%
+  xportr_label(var_spec, "ADSL", verbose = "warn") %>%
+  xportr_order(var_spec, "ADSL", verbose = "warn") %>%
+  xportr_format(var_spec, "ADSL", verbose = "warn") %>%
+  xportr_write("adsl.xpt", label = "Subject-Level Analysis Dataset")
+```
+
+The `xportr_metadata()` function can reduce duplication by setting the variable specification and domain explicitly at the top of a pipeline. If you would like to use the `verbose` argument, you will need to set it in each function call.
 
 ```{r, message=FALSE, eval=FALSE}
 adsl %>%

diff --git a/README.md b/README.md
@@ -76,7 +76,7 @@ to any validators or data reviewers.
 - Coerces variables to only numeric or character types
 - Display format support for numeric float and date/time values
 - Variables names are ≤ 8 characters.
-- Variable labels are ≤ 200 characters.
+- Variable labels are ≤ 40 characters.
 - Data set labels are ≤ 40 characters.
 - Presence of non-ASCII characters in Variable Names, Labels or data set
  labels.
@@ -99,7 +99,7 @@ To do this we will need to do the following:
 - Write out a version 5 xpt file
 
 All of which can be done using a well-defined specification file and the
-`xportr` package!
+`{xportr}` package!
 
 First we will start with our `ADSL` dataset created in R. This example
 `ADSL` dataset is taken from the
@@ -131,7 +131,24 @@ var_spec <- readxl::read_xlsx(spec_path, sheet = "Variables") %>%
 ```
 
 Each `xportr_` function has been written in a way to take in a part of
-the specification file and apply that piece to the dataset.
+the specification file and apply that piece to the dataset. Setting
+`verbose = "warn"` will send the appropriate warning messages to the
+console. We have suppressed the warnings here for the sake of brevity.
+
+``` r
+adsl %>%
+  xportr_type(var_spec, "ADSL", verbose = "warn") %>%
+  xportr_length(var_spec, "ADSL", verbose = "warn") %>%
+  xportr_label(var_spec, "ADSL", verbose = "warn") %>%
+  xportr_order(var_spec, "ADSL", verbose = "warn") %>%
+  xportr_format(var_spec, "ADSL", verbose = "warn") %>%
+  xportr_write("adsl.xpt", label = "Subject-Level Analysis Dataset")
+```
+
+The `xportr_metadata()` function can reduce duplication by setting the
+variable specification and domain explicitly at the top of a pipeline.
+If you would like to use the `verbose` argument, you will need to set it
+in each function call.
 
 ``` r
 adsl %>%

diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -5,7 +5,7 @@ template:
  params:
  bootswatch: sandstone
 search:
- exclude: ['news/index.html']
+ exclude: ["news/index.html"]
 news:
  cran_dates: true
 
@@ -18,39 +18,41 @@ navbar:
  href: https://pharmaverse.slack.com/archives/C030EB2M4GM
  aria-label: slack
 
-
 reference:
-- title: The six core xportr functions
-- contents:
- - xportr_type
- - xportr_length
- - xportr_label
- - xportr_write
- - xportr_format
- - xportr_order
-
-- title: xportr helper functions
-- contents:
- - label_log
- - length_log
- - type_log
- - var_names_log
- - var_ord_msg
- - xportr_logger
- - xportr_df_label
- - xportr_metadata
-
-- title: xportr
- navbar: ~
- contents:
- - xportr
-
-- title: internal
- contents:
- - cli_theme_tests
- - expect_attr_width
- - minimal_metadata
- - minimal_table
-
-
-
+ - title: The six core xportr functions
+ - contents:
+ - xportr_type
+ - xportr_length
+ - xportr_label
+ - xportr_write
+ - xportr_format
+ - xportr_order
+
+ - title: xportr helper functions
+ - contents:
+ - label_log
+ - length_log
+ - type_log
+ - var_names_log
+ - var_ord_msg
+ - xportr_logger
+ - xportr_df_label
+ - xportr_metadata
+
+ - title: xportr example datasets and specification files
+ - contents:
+ - adsl
+ - var_spec
+
+ - title: internal
+ contents:
+ - cli_theme_tests
+ - expect_attr_width
+ - minimal_metadata
+ - minimal_table
+
+articles:
+ - title: ~
+ navbar: ~
+ contents:
+ - deepdive
diff --git a/data/adsl.rda b/data/adsl.rda
diff --git a/data/var_spec.rda b/data/var_spec.rda
diff --git a/example_data_specs/TDF_ADaM - Pilot 3 Team updated.xlsx b/example_data_specs/TDF_ADaM - Pilot 3 Team updated.xlsx
diff --git a/example_data_specs/TDF_ADaM_Pilot3.xlsx b/example_data_specs/TDF_ADaM_Pilot3.xlsx
diff --git a/example_data_specs/adadas.xpt b/example_data_specs/adadas.xpt
diff --git a/example_data_specs/adae.xpt b/example_data_specs/adae.xpt
diff --git a/example_data_specs/adlbc.xpt b/example_data_specs/adlbc.xpt
diff --git a/example_data_specs/adtte.xpt b/example_data_specs/adtte.xpt
diff --git a/example_data_specs/readme.md b/example_data_specs/readme.md
@@ -0,0 +1 @@
+Data taken from Pilot 3 Submission Study: https://github.com/RConsortium/submissions-pilot3-adam