diff --git a/cran-comments.md b/cran-comments.md index dc9a0c23..55ff521f 100644 --- a/cran-comments.md +++ b/cran-comments.md @@ -1,4 +1,6 @@ -## xportr 0.1.0 +## xportr 0.1.0 Submission 2 + +Per comments from Gregor Seyer, the options setting in the xportr.Rmd was removed. A grep was also run to check for other instances. Check Results: diff --git a/docs/404.html b/docs/404.html index c4a94e4f..398e0a4f 100644 --- a/docs/404.html +++ b/docs/404.html @@ -1,75 +1,35 @@ - - -
- + + + + -We will demonstrate the 6 main functions within xportr
:
The demo will make use of a small ADSL
data set that is
+a part of the {admiral}
+package. The script that generates this ADSL
dataset can be
+created by using this command
+admiral::use_ad_template("adsl")
.
The ADSL
has the following features:
The demo will make use of a small adsl data set that is included in this package. Other vignettes within this package make use of the CDISC Pilot Study Data.
-The adsl has the following features:
+To create a fully compliant v5 xpt ADSL
dataset that
+was developed using R, we will need to apply the 6 main functions within
+the xportr
package:
xportr_type()
xportr_length()
xportr_order()
xportr_format()
xportr_label()
xportr_write()
-library(haven)
-library(dplyr)
-library(labelled)
-library(xportr)
-
+# Loading packages
+library(dplyr)
+library(labelled)
+library(xportr)
+library(admiral)
-adsl <- haven::read_sas( system.file("extdata", "adsl.sas7bdat", package="xportr"))
NOTE: Dataset can be created by using this command
+admiral::use_ad_template("adsl")
.
In order to make use of the functions within xportr
you will need to create two R data frame objects that contain your data set specifications. For our examples, we have referenced specifications files contained in the package. Here we have called those two objects var_spec and data_spec. Please note, the change of variable names for each. You will most likely need to do some pre-processing of your spec sheets after loading in the spec files for them to work appropriately with the xportr
functions.
In order to make use of the functions within xportr
you
+will need to create an R data frame that contains your specification
+file. You will most likely need to do some pre-processing of your spec
+sheets after loading in the spec files for them to work appropriately
+with the xportr
functions. Please see our example spec
+sheets in
+system.file(paste0("specs/", "ADaM_admiral_spec.xlsx"), package = "xportr")
+to see how xportr
expects the specification sheets to be laid out.
-var_spec <- readxl::read_xlsx(
- system.file("specs", "ADaM_spec.xlsx", package="xportr"), sheet = "Variables") %>%
- dplyr::rename(type = "Data Type") %>%
- rlang::set_names(tolower)
-
-data_spec <- readxl::read_xlsx(
- system.file("specs", "ADaM_spec.xlsx", package="xportr"), sheet = "Datasets") %>%
- rlang::set_names(tolower) %>%
- dplyr::rename(label = "description")
var_spec <- readxl::read_xlsx(
+ system.file(paste0("specs/", "ADaM_admiral_spec.xlsx"), package = "xportr"), sheet = "Variables") %>%
+ dplyr::rename(type = "Data Type") %>%
+ rlang::set_names(tolower)
+
The spec file within this package contains 34 of the most CDISC ADaM data sets. Below is a quick snapshot of the specification file pertaining to the adsl data set, which we will make use of in the 5 xportr
functions below.
Below is a quick snapshot of the specification file pertaining to the
+ADSL
data set, which we will make use of in the 6
+xportr
function calls below. Take note of the order, label,
+type, length and format columns.
In order to be compliant with transport v5 specifications an xpt
file can only have two data types: character and numeric/dbl. Currently the adsl data set has chr, dbl, time and date.
In order to be compliant with transport v5 specifications an
+xpt
file can only have two data types: character and
+numeric/dbl. Currently the ADSL
data set has chr, dbl,
+time, factor and date.
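To make the two-type restriction concrete, here is a minimal base R sketch of what coercing columns to character or numeric amounts to. It is an illustration only, not xportr's implementation; the toy `df` and `spec` objects and their values are hypothetical.

```r
# Toy data frame with the kinds of columns mentioned above (hypothetical values).
df <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1023"),
  AGE     = c(63, 64),
  TRTSDT  = as.Date(c("2014-01-02", "2012-08-05")),
  AGEGR1  = factor(c("18-64", "18-64"), levels = c("<18", "18-64", ">=65")),
  stringsAsFactors = FALSE
)

# Hypothetical mini-spec: the target type for each variable.
spec <- data.frame(
  variable = c("USUBJID", "AGE", "TRTSDT", "AGEGR1"),
  type     = c("character", "numeric", "numeric", "character"),
  stringsAsFactors = FALSE
)

# Coerce every column to the type named in the spec.
for (i in seq_len(nrow(spec))) {
  v <- spec$variable[i]
  df[[v]] <- if (spec$type[i] == "character") {
    as.character(df[[v]])
  } else {
    as.numeric(df[[v]])
  }
}

# Every column is now either "character" or "numeric".
sapply(df, class)
```

After coercion a `Date` column holds the number of days since 1970-01-01, which is why date variables appear as plain numbers once they have been converted.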
look_for(adsl, details = TRUE)
- pos variable label col_type values
- 1 STUDYID — chr range: mid987650 - mid987650
- 2 SITEID — dbl range: 214356 - 214356
- 3 USUBJID — chr range: 987650.000001 - 987650.000024
- 4 SUBJID — dbl range: 1 - 24
- 5 COUNTRY — chr range: USA - USA
- 6 ACOUNTRY — chr range: UNITED STATES - UNITED STATES
- 7 AGE — dbl range: 27 - 62
- 8 AGEU — chr range: YEARS - YEARS
- 9 SEX — chr range: F - M
- 10 RACE — chr range: ASIAN - WHITE
- 11 RACEN — dbl range: 1 - 2
- 12 WEIGHTBL — dbl range: 56 - 88
- 13 TRT01A — chr range: Active - Placebo
- 14 TRT01AN — dbl range: 1 - 2
- 15 SAFFL — chr range: Y - Y
- 16 SCRDT — date range: 2018-04-08 - 2018-04-18
- 17 RANDDT — date range: 2018-04-08 - 2018-04-18
- 18 TRTSDT — date range: 2018-04-08 - 2018-04-18
- 19 TRTSTM — time range: 38820 - 74700
- 20 TRTEDT — date range: 2018-04-16 - 2018-05-01
- 21 TRTETM — time range: 33240 - 54000
- 22 BRTHDT — date range: 1956-06-30 - 1991-06-30
- 23 BRTHDTC — dbl range: 1956 - 1991
Using xport_type
and the supplied specification file, we can coerce the variables in the adsl set to be either numeric or character. A message is given if variables were not coerced, this is due to the variables not being in the specification file.
Using xportr_type
and the supplied specification file, we
+can coerce the variables in the ADSL
set to be
+either numeric or character.
# .df <- adsl %>% mutate(`1BIGLong_Var` = "test")
-adsl_type <- xportr_type(.df = adsl, metacore = var_spec, domain = "ADSL", verbose = "message")
- Variable type(s) in `.df` don't match metadata: `STUDYID`, `SITEID`, `USUBJID`, `SUBJID`, `COUNTRY`, `AGE`, `AGEU`, `SEX`, `RACE`, `WEIGHTBL`, `TRT01A`, `TRT01AN`, `SAFFL`, `RANDDT`, `TRTSDT`, and `TRTEDT`
-
- ── Variable type mismatches found. ──
-
- ✓ 16 variables coerced
pos variable label col_type values
- 1 STUDYID — chr range: mid987650 - mid987650
- 2 SITEID — chr range: 214356 - 214356
- 3 USUBJID — chr range: 987650.000001 - 987650.000024
- 4 SUBJID — chr range: 1 - 9
- 5 COUNTRY — chr range: USA - USA
- 6 ACOUNTRY — chr range: UNITED STATES - UNITED STATES
- 7 AGE — dbl range: 27 - 62
- 8 AGEU — chr range: YEARS - YEARS
- 9 SEX — chr range: F - M
- 10 RACE — chr range: ASIAN - WHITE
- 11 RACEN — dbl range: 1 - 2
- 12 WEIGHTBL — dbl range: 56 - 88
- 13 TRT01A — chr range: Active - Placebo
- 14 TRT01AN — dbl range: 1 - 2
- 15 SAFFL — chr range: Y - Y
- 16 SCRDT — dbl range: 17629 - 17639
- 17 RANDDT — dbl range: 17629 - 17639
- 18 TRTSDT — dbl range: 17629 - 17639
- 19 TRTSTM — dbl range: 38820 - 74700
- 20 TRTEDT — dbl range: 17637 - 17652
- 21 TRTETM — dbl range: 33240 - 54000
- 22 BRTHDT — dbl range: -4933 - 7850
- 23 BRTHDTC — dbl range: 1956 - 1991
+
+adsl_type <- xportr_type(adsl, var_spec, domain = "ADSL", verbose = "message")
Now all appropriate types have been applied to the dataset as
+seen below.
look_for(adsl_type, details = TRUE)
+ pos variable label col_type values
+ 1 STUDYID — chr range: CDISCPILOT01 - CDISCPILOT01
+ 2 USUBJID — chr range: 01-701-1015 - 01-718-1427
+ 3 SUBJID — chr range: 1001 - 1448
+ 4 RFSTDTC — chr range: 2012-07-09 - 2014-09-02
+ 5 RFENDTC — chr range: 2012-09-01 - 2015-03-05
+ 6 RFXSTDTC — chr range: 2012-07-09 - 2014-09-02
+ 7 RFXENDTC — chr range: 2012-08-28 - 2015-03-05
+ 8 RFICDTC — chr range:
+ 9 RFPENDTC — chr range: 2012-08-13 - 2015-03-05T14:40
+ 10 DTHDTC — chr range: 2013-01-14 - 2014-11-01
+ 11 DTHFL — chr range: Y - Y
+ 12 SITEID — chr range: 701 - 718
+ 13 AGE — dbl range: 50 - 89
+ 14 AGEU — chr range: YEARS - YEARS
+ 15 SEX — chr range: F - M
+ 16 RACE — chr range: AMERICAN INDIAN OR ALASKA NATIVE - WHITE
+ 17 ETHNIC — chr range: HISPANIC OR LATINO - NOT HISPANIC OR LAT~
+ 18 ARMCD — chr range: Pbo - Xan_Lo
+ 19 ARM — chr range: Placebo - Xanomeline Low Dose
+ 20 ACTARMCD — chr range: Pbo - Xan_Lo
+ 21 ACTARM — chr range: Placebo - Xanomeline Low Dose
+ 22 COUNTRY — chr range: USA - USA
+ 23 DMDTC — chr range: 2012-07-06 - 2014-08-29
+ 24 DMDY — dbl range: -37 - -2
+ 25 TRT01P — chr range: Placebo - Xanomeline Low Dose
+ 26 TRT01A — chr range: Placebo - Xanomeline Low Dose
+ 27 TRTSDTM — dbl range: 1341792000 - 1409616000
+ 28 TRTEDTM — dbl range: 1346198399 - 1425599999
+ 29 TRTSDT — dbl range: 15530 - 16315
+ 30 TRTEDT — dbl range: 15580 - 16499
+ 31 TRTDURD — dbl range: 1 - 212
+ 32 SCRFDT — dbl range: 15565 - 16181
+ 33 EOSDT — dbl range: 15584 - 16499
+ 34 EOSSTT — chr range: COMPLETED - DISCONTINUED
+ 35 FRVDT — dbl range: 15754 - 16389
+ 36 DTHDT — dbl range: 15719 - 16375
+ 37 DTHDTF — chr range:
+ 38 DTHADY — dbl range: 12 - 175
+ 39 LDDTHELD — dbl range: 0 - 2
+ 40 LSTALVDT — dbl range: 15584 - 16499
+ 41 AGEGR1 — chr range: >=65 - 18-64
+ 42 SAFFL — chr range: Y - Y
+ 43 RACEGR1 — chr range: Non-white - White
+ 44 REGION1 — chr range: NA - NA
+ 45 LDDTHGR1 — chr range: <= 30 - <= 30
+ 46 DTH30FL — chr range: Y - Y
+ 47 DTHA30FL — chr range:
+ 48 DTHB30FL — chr range: Y - Y
Next we can apply the lengths from a variable level specification file to the data frame. xportr_length
will identify variables that are missing from your specification file. The function will also alert you to how many lengths have been applied successfully. Before we apply the lengths lets verify that no lengths have been applied to the original dataframe.
capture.output(str(adsl, give.head=TRUE)) %>%
- as_tibble() %>%
- head(n=7)
- # A tibble: 7 × 1
- value
- <chr>
- 1 "tibble [24 × 23] (S3: tbl_df/tbl/data.frame)"
- 2 " $ STUDYID : chr [1:24] \"mid987650\" \"mid987650\" \"mid987650\" \"mid98765…
- 3 " ..- attr(*, \"format.sas\")= chr \"$\""
- 4 " $ SITEID : num [1:24] 214356 214356 214356 214356 214356 ..."
- 5 " ..- attr(*, \"format.sas\")= chr \"BEST\""
- 6 " $ USUBJID : chr [1:24] \"987650.000001\" \"987650.000002\" \"987650.000003\…
- 7 " ..- attr(*, \"format.sas\")= chr \"$\""
No lengths have been applied to the variables as seen in the printout for the first 3 variables. Let’s now use xportr_length
to apply our lengths from the specification file.
Next we can apply the lengths from a variable level specification
+file to the data frame. xportr_length
will identify
+variables that are missing from your specification file. The function
+will also alert you to how many lengths have been applied successfully.
+Before we apply the lengths, let's verify that no lengths have been
+applied to the original dataframe.
adsl_length <- adsl %>% xportr_length(var_spec, "ADSL", "message")
-
- ── Variable lengths missing from metadata. ──
-
- ✓ 7 lengths resolved
- Variable(s) present in `.df` but doesn't exist in `metacore`.
- x Problem with `ACOUNTRY`, `RACEN`, `SCRDT`, `TRTSTM`, `TRTETM`, `BRTHDT`, and `BRTHDTC`
+str(adsl)
tibble [306 × 48] (S3: tbl_df/tbl/data.frame)
+ $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ...
+ ..- attr(*, "label")= chr "Study Identifier"
+ $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ...
+ ..- attr(*, "label")= chr "Unique Subject Identifier"
+ $ SUBJID : chr [1:306] "1015" "1023" "1028" "1033" ...
+ ..- attr(*, "label")= chr "Subject Identifier for the Study"
+ $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
+ ..- attr(*, "label")= chr "Subject Reference Start Date/Time"
+ $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ...
+ ..- attr(*, "label")= chr "Subject Reference End Date/Time"
+ $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
+ ..- attr(*, "label")= chr "Date/Time of First Study Treatment"
+ $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ...
+ ..- attr(*, "label")= chr "Date/Time of Last Study Treatment"
+ $ RFICDTC : chr [1:306] NA NA NA NA ...
+ ..- attr(*, "label")= chr "Date/Time of Informed Consent"
+ $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ...
+ ..- attr(*, "label")= chr "Date/Time of End of Participation"
+ $ DTHDTC : chr [1:306] NA NA NA NA ...
+ ..- attr(*, "label")= chr "Date/Time of Death"
+ $ DTHFL : chr [1:306] NA NA NA NA ...
+ ..- attr(*, "label")= chr "Subject Death Flag"
+ $ SITEID : chr [1:306] "701" "701" "701" "701" ...
+ ..- attr(*, "label")= chr "Study Site Identifier"
+ $ AGE : num [1:306] 63 64 71 74 77 85 59 68 81 84 ...
+ ..- attr(*, "label")= chr "Age"
+ $ AGEU : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ...
+ ..- attr(*, "label")= chr "Age Units"
+ $ SEX : chr [1:306] "F" "M" "M" "M" ...
+ ..- attr(*, "label")= chr "Sex"
+ $ RACE : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ...
+ ..- attr(*, "label")= chr "Race"
+ $ ETHNIC : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ...
+ ..- attr(*, "label")= chr "Ethnicity"
+ $ ARMCD : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
+ ..- attr(*, "label")= chr "Planned Arm Code"
+ $ ARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
+ ..- attr(*, "label")= chr "Description of Planned Arm"
+ $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
+ ..- attr(*, "label")= chr "Actual Arm Code"
+ $ ACTARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
+ ..- attr(*, "label")= chr "Description of Actual Arm"
+ $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ...
+ ..- attr(*, "label")= chr "Country"
+ $ DMDTC : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ...
+ ..- attr(*, "label")= chr "Date/Time of Collection"
+ $ DMDY : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ...
+ ..- attr(*, "label")= chr "Study Day of Collection"
+ $ TRT01P : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
+ ..- attr(*, "label")= chr "Description of Planned Arm"
+ $ TRT01A : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
+ ..- attr(*, "label")= chr "Description of Actual Arm"
+ $ TRTSDTM : iso_dtm[1:306], format: "2014-01-02" "2012-08-05" ...
+ $ TRTEDTM : iso_dtm[1:306], format: "2014-07-02 23:59:59" "2012-09-01 23:59:59" ...
+ $ TRTSDT : Date[1:306], format: "2014-01-02" "2012-08-05" ...
+ $ TRTEDT : Date[1:306], format: "2014-07-02" "2012-09-01" ...
+ $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ...
+ $ SCRFDT : Date[1:306], format: NA NA ...
+ $ EOSDT : Date[1:306], format: "2014-07-02" "2012-09-02" ...
+ $ EOSSTT : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ...
+ $ FRVDT : Date[1:306], format: NA "2013-02-18" ...
+ $ DTHDT : Date[1:306], format: NA NA ...
+ $ DTHDTF : chr [1:306] NA NA NA NA ...
+ $ DTHADY : num [1:306] NA NA NA NA NA NA NA NA NA NA ...
+ $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ...
+ $ LSTALVDT: Date[1:306], format: "2014-07-02" "2012-09-02" ...
+ $ AGEGR1 : Factor w/ 3 levels "<18","18-64",..: 2 2 3 3 3 3 2 3 3 3 ...
+ $ SAFFL : chr [1:306] "Y" "Y" "Y" "Y" ...
+ $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ...
+ $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ...
+ $ LDDTHGR1: chr [1:306] NA NA NA NA ...
+ $ DTH30FL : chr [1:306] NA NA NA NA ...
+ $ DTHA30FL: chr [1:306] NA NA NA NA ...
+ $ DTHB30FL: chr [1:306] NA NA NA NA ...
capture.output(str(adsl_length, give.head=TRUE)) %>%
- as_tibble() %>%
- head(n=7)
- # A tibble: 7 × 1
- value
- <chr>
- 1 "tibble [24 × 23] (S3: tbl_df/tbl/data.frame)"
- 2 " $ STUDYID : chr [1:24] \"mid987650\" \"mid987650\" \"mid987650\" \"mid98765…
- 3 " ..- attr(*, \"format.sas\")= chr \"$\""
- 4 " ..- attr(*, \"SASlength\")= num 21"
- 5 " $ SITEID : num [1:24] 214356 214356 214356 214356 214356 ..."
- 6 " ..- attr(*, \"format.sas\")= chr \"BEST\""
- 7 " ..- attr(*, \"SASlength\")= num 5"
No lengths have been applied to the variables as seen in the printout
+- the lengths would be in the attr
part of each variable.
+Let’s now use xportr_length
to apply our lengths from the
+specification file.
+adsl_length <- adsl %>% xportr_length(var_spec, domain = "ADSL", "message")
Lengths have been successfully applied as viewed for the first 3 variables.
+
+str(adsl_length)
tibble [306 × 48] (S3: tbl_df/tbl/data.frame)
+ $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ...
+ ..- attr(*, "label")= chr "Study Identifier"
+ ..- attr(*, "width")= num 21
+ $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ...
+ ..- attr(*, "label")= chr "Unique Subject Identifier"
+ ..- attr(*, "width")= num 30
+ $ SUBJID : chr [1:306] "1015" "1023" "1028" "1033" ...
+ ..- attr(*, "label")= chr "Subject Identifier for the Study"
+ ..- attr(*, "width")= num 8
+ $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
+ ..- attr(*, "label")= chr "Subject Reference Start Date/Time"
+ ..- attr(*, "width")= num 19
+ $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ...
+ ..- attr(*, "label")= chr "Subject Reference End Date/Time"
+ ..- attr(*, "width")= num 19
+ $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
+ ..- attr(*, "label")= chr "Date/Time of First Study Treatment"
+ ..- attr(*, "width")= num 19
+ $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ...
+ ..- attr(*, "label")= chr "Date/Time of Last Study Treatment"
+ ..- attr(*, "width")= num 19
+ $ RFICDTC : chr [1:306] NA NA NA NA ...
+ ..- attr(*, "label")= chr "Date/Time of Informed Consent"
+ ..- attr(*, "width")= num 19
+ $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ...
+ ..- attr(*, "label")= chr "Date/Time of End of Participation"
+ ..- attr(*, "width")= num 19
+ $ DTHDTC : chr [1:306] NA NA NA NA ...
+ ..- attr(*, "label")= chr "Date/Time of Death"
+ ..- attr(*, "width")= num 19
+ $ DTHFL : chr [1:306] NA NA NA NA ...
+ ..- attr(*, "label")= chr "Subject Death Flag"
+ ..- attr(*, "width")= num 2
+ $ SITEID : chr [1:306] "701" "701" "701" "701" ...
+ ..- attr(*, "label")= chr "Study Site Identifier"
+ ..- attr(*, "width")= num 5
+ $ AGE : num [1:306] 63 64 71 74 77 85 59 68 81 84 ...
+ ..- attr(*, "label")= chr "Age"
+ ..- attr(*, "width")= num 8
+ $ AGEU : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ...
+ ..- attr(*, "label")= chr "Age Units"
+ ..- attr(*, "width")= num 10
+ $ SEX : chr [1:306] "F" "M" "M" "M" ...
+ ..- attr(*, "label")= chr "Sex"
+ ..- attr(*, "width")= num 1
+ $ RACE : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ...
+ ..- attr(*, "label")= chr "Race"
+ ..- attr(*, "width")= num 60
+ $ ETHNIC : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ...
+ ..- attr(*, "label")= chr "Ethnicity"
+ ..- attr(*, "width")= num 100
+ $ ARMCD : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
+ ..- attr(*, "label")= chr "Planned Arm Code"
+ ..- attr(*, "width")= num 20
+ $ ARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
+ ..- attr(*, "label")= chr "Description of Planned Arm"
+ ..- attr(*, "width")= num 200
+ $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
+ ..- attr(*, "label")= chr "Actual Arm Code"
+ ..- attr(*, "width")= num 20
+ $ ACTARM : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
+ ..- attr(*, "label")= chr "Description of Actual Arm"
+ ..- attr(*, "width")= num 200
+ $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ...
+ ..- attr(*, "label")= chr "Country"
+ ..- attr(*, "width")= num 3
+ $ DMDTC : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ...
+ ..- attr(*, "label")= chr "Date/Time of Collection"
+ ..- attr(*, "width")= num 19
+ $ DMDY : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ...
+ ..- attr(*, "label")= chr "Study Day of Collection"
+ ..- attr(*, "width")= num 8
+ $ TRT01P : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
+ ..- attr(*, "label")= chr "Description of Planned Arm"
+ ..- attr(*, "width")= num 40
+ $ TRT01A : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
+ ..- attr(*, "label")= chr "Description of Actual Arm"
+ ..- attr(*, "width")= num 40
+ $ TRTSDTM : iso_dtm[1:306], format: "2014-01-02" "2012-08-05" ...
+ $ TRTEDTM : iso_dtm[1:306], format: "2014-07-02 23:59:59" "2012-09-01 23:59:59" ...
+ $ TRTSDT : Date[1:306], format: "2014-01-02" "2012-08-05" ...
+ $ TRTEDT : Date[1:306], format: "2014-07-02" "2012-09-01" ...
+ $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ...
+ ..- attr(*, "width")= num 8
+ $ SCRFDT : Date[1:306], format: NA NA ...
+ $ EOSDT : Date[1:306], format: "2014-07-02" "2012-09-02" ...
+ $ EOSSTT : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ...
+ ..- attr(*, "width")= num 200
+ $ FRVDT : Date[1:306], format: NA "2013-02-18" ...
+ $ DTHDT : Date[1:306], format: NA NA ...
+ $ DTHDTF : chr [1:306] NA NA NA NA ...
+ ..- attr(*, "width")= num 2
+ $ DTHADY : num [1:306] NA NA NA NA NA NA NA NA NA NA ...
+ ..- attr(*, "width")= num 8
+ $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ...
+ ..- attr(*, "width")= num 8
+ $ LSTALVDT: Date[1:306], format: "2014-07-02" "2012-09-02" ...
+ $ AGEGR1 : Factor w/ 3 levels "<18","18-64",..: 2 2 3 3 3 3 2 3 3 3 ...
+ ..- attr(*, "width")= num 20
+ $ SAFFL : chr [1:306] "Y" "Y" "Y" "Y" ...
+ ..- attr(*, "width")= num 2
+ $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ...
+ ..- attr(*, "width")= num 200
+ $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ...
+ ..- attr(*, "width")= num 80
+ $ LDDTHGR1: chr [1:306] NA NA NA NA ...
+ ..- attr(*, "width")= num 200
+ $ DTH30FL : chr [1:306] NA NA NA NA ...
+ ..- attr(*, "width")= num 200
+ $ DTHA30FL: chr [1:306] NA NA NA NA ...
+ ..- attr(*, "width")= num 200
+ $ DTHB30FL: chr [1:306] NA NA NA NA ...
+ ..- attr(*, "width")= num 200
+ - attr(*, "_xportr.df_arg_")= chr "ADSL"
+Note the additional attr(*, "width")=
after each
+variable with the width. These have been directly applied from the
+specification file that we loaded above!
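The mechanism behind that printout can be sketched in base R: a length from the spec is stored as a `width` attribute on the matching column. This is a rough illustration under stated assumptions, not xportr's own code; the toy `df` and `spec` objects are hypothetical, with lengths borrowed from the output above.

```r
# Toy data and a hypothetical mini-spec holding one length per variable.
df <- data.frame(
  STUDYID = c("CDISCPILOT01", "CDISCPILOT01"),
  AGE     = c(63, 64),
  stringsAsFactors = FALSE
)
spec <- data.frame(
  variable = c("STUDYID", "AGE"),
  length   = c(21, 8),
  stringsAsFactors = FALSE
)

# Attach each length as a "width" attribute on the matching column.
for (i in seq_len(nrow(spec))) {
  v <- spec$variable[i]
  if (v %in% names(df)) {
    attr(df[[v]], "width") <- spec$length[i]
  }
}

# The attribute is now listed under each column by str().
str(df)
attr(df$STUDYID, "width")
```

Because the length lives in an attribute, the data values themselves are untouched; only the metadata carried by each column changes.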
Please observe that our adsl data does not have any labels associated with it. A CDISC compliant data set needs to have each variable with a variable label.
-look_for(adsl, details = FALSE)
- pos variable label
- 1 STUDYID —
- 2 SITEID —
- 3 USUBJID —
- 4 SUBJID —
- 5 COUNTRY —
- 6 ACOUNTRY —
- 7 AGE —
- 8 AGEU —
- 9 SEX —
- 10 RACE —
- 11 RACEN —
- 12 WEIGHTBL —
- 13 TRT01A —
- 14 TRT01AN —
- 15 SAFFL —
- 16 SCRDT —
- 17 RANDDT —
- 18 TRTSDT —
- 19 TRTSTM —
- 20 TRTEDT —
- 21 TRTETM —
- 22 BRTHDT —
- 23 BRTHDTC —
Using the xport_label
function we can take the specifications file and label all the variables available. xportr_label
will produce a warning message if you the variable in the data set is not in the specification file.
adsl_update <- adsl %>% xportr_label(var_spec, "ADSL", "message")
-
- ── Variable labels missing from metadata. ──
-
- ✓ 7 labels skipped
- Variable(s) present in `.df` but doesn't exist in `metacore`.
- x Problem with `ACOUNTRY`, `RACEN`, `SCRDT`, `TRTSTM`, `TRTETM`, `BRTHDT`, and `BRTHDTC`
- Warning: Length of variable label must be 40 characters or less.
- x Problem with `BMIBLG1N` and `EXACSTDD`.
look_for(adsl_update, details = FALSE)
- pos variable label
- 1 STUDYID Study Identifier
- 2 SITEID Study Site Identifier
- 3 USUBJID Unique Subject Identifier
- 4 SUBJID Subject Identifier for the Study
- 5 COUNTRY Country
- 6 ACOUNTRY
- 7 AGE Age
- 8 AGEU Age Units
- 9 SEX Sex
- 10 RACE Race
- 11 RACEN
- 12 WEIGHTBL Baseline Weight
- 13 TRT01A Actual Treatment for Period 01
- 14 TRT01AN Actual Treatment for Period 01 (N)
- 15 SAFFL Safety Population Flag
- 16 SCRDT
- 17 RANDDT Date of Randomization
- 18 TRTSDT Date of First Exposure to Treatment
- 19 TRTSTM
- 20 TRTEDT Date of Last Exposure to Treatment
- 21 TRTETM
- 22 BRTHDT
- 23 BRTHDTC
Please note that the order of the ADSL
variables (see
+above) does not match the order column of the specification file. We can quickly
+remedy this with a call to xportr_order()
. Note that the
+variable SITEID
has been moved as well as many others to
+match the specification file order column.
+adsl_order <- xportr_order(adsl, var_spec, domain = "ADSL", verbose = "message")
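As a rough picture of what reordering by a spec involves, the base R sketch below sorts columns by a spec's order column. It is not xportr's implementation: the toy `df` and `spec` are hypothetical, and placing variables that are absent from the spec at the end is an assumption made for this example.

```r
# Toy data whose columns are out of order (hypothetical values).
df <- data.frame(
  SITEID  = "701",
  USUBJID = "01-701-1015",
  STUDYID = "CDISCPILOT01",
  EXTRA   = 1,
  stringsAsFactors = FALSE
)

# Hypothetical mini-spec with an order column.
spec <- data.frame(
  variable = c("USUBJID", "STUDYID", "SITEID"),
  order    = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# Variables in spec order, restricted to those present in the data.
ordered_vars <- spec$variable[order(spec$order)]
in_spec      <- intersect(ordered_vars, names(df))

# Assumption for this sketch: variables not in the spec go last.
not_in_spec  <- setdiff(names(df), ordered_vars)

df_ordered <- df[, c(in_spec, not_in_spec), drop = FALSE]
names(df_ordered)
```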
An appropriate data set label must be supplied as well. Currently, the adsl data set has the label ADSL, but it needs the label Subject-Level Analysis Dataset to be compliant with most clinical data set validator application. Here we make use of the data_spec object to supply the apropriate label for the adsl data set.
-adsl_df_lbl <- adsl %>% xportr_df_label(data_spec, "ADSL")
-adsl %>% xportr_varnames("message")
-
- Variable Name Validation passed! No renaming necessary.
-
-
- ── 0 of 23 (0%) variables were renamed ──
-
- # A tibble: 24 × 23
- STUDYID SITEID USUBJID SUBJID COUNTRY ACOUNTRY AGE AGEU SEX RACE RACEN
- <chr> <dbl> <chr> <dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
- 1 mid9876… 214356 987650… 1 USA UNITED … 35 YEARS M ASIAN 2
- 2 mid9876… 214356 987650… 2 USA UNITED … 62 YEARS M WHITE 1
- 3 mid9876… 214356 987650… 3 USA UNITED … 27 YEARS F ASIAN 2
- 4 mid9876… 214356 987650… 4 USA UNITED … 42 YEARS M ASIAN 2
- 5 mid9876… 214356 987650… 5 USA UNITED … 59 YEARS F WHITE 1
- 6 mid9876… 214356 987650… 6 USA UNITED … 28 YEARS M WHITE 1
- 7 mid9876… 214356 987650… 7 USA UNITED … 31 YEARS F ASIAN 2
- 8 mid9876… 214356 987650… 8 USA UNITED … 43 YEARS F WHITE 1
- 9 mid9876… 214356 987650… 9 USA UNITED … 35 YEARS M ASIAN 2
- 10 mid9876… 214356 987650… 10 USA UNITED … 62 YEARS M WHITE 1
- # … with 14 more rows, and 12 more variables: WEIGHTBL <dbl>, TRT01A <chr>,
- # TRT01AN <dbl>, SAFFL <chr>, SCRDT <date>, RANDDT <date>, TRTSDT <date>,
- # TRTSTM <time>, TRTEDT <date>, TRTETM <time>, BRTHDT <date>, BRTHDTC <dbl>
-attr(adsl, "label")
+
+xportr_format()
+
+Now we apply formats to the dataset. These will typically be
+DATE9.
, DATETIME20.
 or TIME5.
, but
+many others can be used. Notice that 8 Date/Time variables are missing a
+format in our ADSL
dataset. Here we just take a peek at a
+few TRT
variables, which have a NULL
+format.
+
+attr(adsl$TRTSDT, "format.sas")
+ NULL
+attr(adsl$TRTEDT, "format.sas")
+ NULL
+attr(adsl$TRTSDTM, "format.sas")
+ NULL
+attr(adsl$TRTEDTM, "format.sas")
NULL
-
-
+Using our xportr_format()
we apply our formats.
+
+adsl_fmt <- adsl %>% xportr_format(var_spec, domain = "ADSL", "message")
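The `format.sas` attribute inspected above is ordinary R metadata, so its effect can be sketched in base R. The snippet is a hedged illustration rather than xportr's code; the `trtsdt` vector is a hypothetical stand-in for a date column.

```r
# Toy date vector standing in for a column such as TRTSDT.
trtsdt <- as.Date(c("2014-01-02", "2012-08-05"))

# No SAS format is attached yet, so the attribute is NULL.
is.null(attr(trtsdt, "format.sas"))

# Attach a SAS date format name as metadata; the values are unchanged.
attr(trtsdt, "format.sas") <- "DATE9."
attr(trtsdt, "format.sas")
```

The vector is still a `Date` afterwards; the format name only tells downstream tools how the values should be displayed.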
+
There are several constraints placed upon variable naming. According to modern CDISC Implementation guides and the latest FDA conformance guide, all ADaM variable names must be:
-no more than 8 characters in length
start with a letter (not an underscore)
be comprised of only uppercase letters (A-Z), numerals (0-9)
free of non-ASCII text, symbols, or underscores. Note: Underscores are permitted only for legacy studies started on or before Dec 17th, 2016
For strings containing variable names xportr_tidy_rename()
was designed to identify and rename variables into compliance. Thus, you can use this function to rename the spec file variables or column names on the fly. Below, we use it’s default options to tidy names for submission compliance.
renamed_var_spec <- var_spec %>%
- filter(dataset == "ADSL") %>%
- mutate(tidy_variable = xportr_tidy_rename(variable)
- ) %>%
- select(order, dataset, variable, tidy_variable, tidyselect::everything())
-
- The following variable name validation checks failed:
- * Cannot contain any lowercase characters Variables `POPyFL`, `REGIONy`, `COVARy`, `STRATy`, `STRATyN`, `RACEGRy`, `RACEGRyN`, `RFICyDT`, `TRxxAGy`, `TRxxAGyN`, `EXNPxxM`, `EXNPxxMC`, `FENOBLGy`, `EOSBLGy`, `ICSENTGx`, and `OCSENTGx`.
-
- ── 16 of 124 (12.9%) variables were renamed ──
-
- Var 20: 'POPyFL' was renamed to 'POPYFL'
- Var 58: 'REGIONy' was renamed to 'REGIONY'
- Var 64: 'COVARy' was renamed to 'COVARY'
- Var 104: 'STRATy' was renamed to 'STRATY'
- Var 105: 'STRATyN' was renamed to 'STRATYN'
- Var 110: 'RACEGRy' was renamed to 'RACEGRY'
- Var 111: 'RACEGRyN' was renamed to 'RACEGRYN'
- Var 112: 'RFICyDT' was renamed to 'RFICYDT'
- Var 113: 'TRxxAGy' was renamed to 'TRXXAGY'
- Var 114: 'TRxxAGyN' was renamed to 'TRXXAGYN'
- Var 118: 'EXNPxxM' was renamed to 'EXNPXXM'
- Var 119: 'EXNPxxMC' was renamed to 'EXNPXXMC'
- Var 120: 'FENOBLGy' was renamed to 'FENOBLGY'
- Var 121: 'EOSBLGy' was renamed to 'EOSBLGY'
- Var 122: 'ICSENTGx' was renamed to 'ICSENTGX'
- Var 124: 'OCSENTGx' was renamed to 'OCSENTGX'
-
- All renamed variables passed validation.
Note the above messages detail the rule(s) and variable names that were out of compliance, the number of renamed variables, and even the old and newly tidied names. The function uses a step-wise renaming algorithm to maintain the original variable names characteristics as much as possible. Below is the view of the the old variable names juxtaposed the new.
+Please observe that our ADSL
dataset is missing many
+variable labels. Sometimes these labels can be lost while using R’s
+functions. However, a CDISC-compliant data set needs a label on
+every variable.
look_for(adsl, details = FALSE)
+ pos variable label
+ 1 STUDYID Study Identifier
+ 2 USUBJID Unique Subject Identifier
+ 3 SUBJID Subject Identifier for the Study
+ 4 RFSTDTC Subject Reference Start Date/Time
+ 5 RFENDTC Subject Reference End Date/Time
+ 6 RFXSTDTC Date/Time of First Study Treatment
+ 7 RFXENDTC Date/Time of Last Study Treatment
+ 8 RFICDTC Date/Time of Informed Consent
+ 9 RFPENDTC Date/Time of End of Participation
+ 10 DTHDTC Date/Time of Death
+ 11 DTHFL Subject Death Flag
+ 12 SITEID Study Site Identifier
+ 13 AGE Age
+ 14 AGEU Age Units
+ 15 SEX Sex
+ 16 RACE Race
+ 17 ETHNIC Ethnicity
+ 18 ARMCD Planned Arm Code
+ 19 ARM Description of Planned Arm
+ 20 ACTARMCD Actual Arm Code
+ 21 ACTARM Description of Actual Arm
+ 22 COUNTRY Country
+ 23 DMDTC Date/Time of Collection
+ 24 DMDY Study Day of Collection
+ 25 TRT01P Description of Planned Arm
+ 26 TRT01A Description of Actual Arm
+ 27 TRTSDTM —
+ 28 TRTEDTM —
+ 29 TRTSDT —
+ 30 TRTEDT —
+ 31 TRTDURD —
+ 32 SCRFDT —
+ 33 EOSDT —
+ 34 EOSSTT —
+ 35 FRVDT —
+ 36 DTHDT —
+ 37 DTHDTF —
+ 38 DTHADY —
+ 39 LDDTHELD —
+ 40 LSTALVDT —
+ 41 AGEGR1 —
+ 42 SAFFL —
+ 43 RACEGR1 —
+ 44 REGION1 —
+ 45 LDDTHGR1 —
+ 46 DTH30FL —
+ 47 DTHA30FL —
+ 48 DTHB30FL —
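Filling in the missing labels comes down to setting a `label` attribute on each column from the spec. The base R sketch below shows that idea together with a check for the 40-character label limit; it is an illustration under assumptions, not xportr's implementation, and the toy `df` and `spec` objects are hypothetical.

```r
# Toy data with no labels attached (hypothetical values).
df <- data.frame(
  TRTSDT = as.Date("2014-01-02"),
  SAFFL  = "Y",
  stringsAsFactors = FALSE
)

# Hypothetical mini-spec with one label per variable.
spec <- data.frame(
  variable = c("TRTSDT", "SAFFL"),
  label    = c("Date of First Exposure to Treatment", "Safety Population Flag"),
  stringsAsFactors = FALSE
)

# Flag labels longer than 40 characters before applying them.
too_long <- spec$variable[nchar(spec$label) > 40]

# Attach each label as a "label" attribute on the matching column.
for (i in seq_len(nrow(spec))) {
  v <- spec$variable[i]
  if (v %in% names(df)) {
    attr(df[[v]], "label") <- spec$label[i]
  }
}

attr(df$TRTSDT, "label")
too_long
```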
The function xportr_varnames()
takes xportr_tidy_rename
a step further to help users change the data.frame columns directly. It also is slightly less flexible from a customization perspective, attempting to follow the submission constraints precisely where as xportr_tidy_rename
can be used for a wider breadth of renaming applications. Executing the code below, we create a new data.frame called adsl_renamed
, but the variable names are already compliant. The identical
function below shows us that nothing has changed.
adsl_renamed <- adsl %>% xportr_varnames()
-
- Variable Name Validation passed! No renaming necessary.
-
- ── 0 of 23 (0%) variables were renamed ──
-
-identical(adsl, adsl_renamed)
- [1] TRUE
In the interest of showcasing xportr_varnames
capabilities, we create a fictional data.frame adxx
riddled with non-compliant variable names below. In fact, adxx
only has one valid variable name: “STUDYID”. Calling the function shows this data.frame’s variables violate all four compliance rules and proceeds to rename all but “STUDYID”.
varnames <- c("", "STUDYID", "studyid", "subject id", "1c. ENT", "1b. Eyes",
- "1d. Lungs", "1e. Heart", "year number", "1a. Skin_Desc")
-adxx <- data.frame(matrix(0, ncol = 10, nrow = 3))
-colnames(adxx) <- varnames
-
-xportr_varnames(adxx) # default behavior
-
- The following variable name validation checks failed:
- * Must be 8 characters or less: Variables `subject id`, `1d. Lungs`, `1e. Heart`, `year number`, and `1a. Skin_Desc`.
- * Must start with a letter: Variables `1c. ENT`, `1b. Eyes`, `1d. Lungs`, `1e. Heart`, and `1a. Skin_Desc`.
- * Cannot contain any non-ASCII, symbol or underscore characters: Variables `subject id`, `1c. ENT`, `1b. Eyes`, `1d. Lungs`, `1e. Heart`, `year number`, and `1a. Skin_Desc`.
- * Cannot contain any lowercase characters Variables ``, `studyid`, `subject id`, `1c. ENT`, `1b. Eyes`, `1d. Lungs`, `1e. Heart`, `year number`, and `1a. Skin_Desc`.
-
- ── 9 of 10 (90%) variables were renamed ──
-
- Var 1: '' was renamed to 'V1'
- Var 3: 'studyid' was renamed to 'STUDYID2'
- Var 4: 'subject id' was renamed to 'SUBJECTD'
- Var 5: '1c. ENT' was renamed to 'ENT1C'
- Var 6: '1b. Eyes' was renamed to 'EYES1B'
- Var 7: '1d. Lungs' was renamed to 'LUNGS1D'
- Var 8: '1e. Heart' was renamed to 'HEART1E'
- Var 9: 'year number' was renamed to 'YEARNUMB'
- Var 10: '1a. Skin_Desc' was renamed to 'SKNDSC1A'
-
- All renamed variables passed validation.
- V1 STUDYID STUDYID2 SUBJECTD ENT1C EYES1B LUNGS1D HEART1E YEARNUMB SKNDSC1A
- 1 0 0 0 0 0 0 0 0 0 0
- 2 0 0 0 0 0 0 0 0 0 0
- 3 0 0 0 0 0 0 0 0 0 0
You may have observed in the previous output that, for every variable name starting with a number, the numeric prefix was bundled and relocated to the end of the term. For example, “1d. Lungs” became “LUNGS1D”. To keep the numeric prefix at the start (the left-hand side), users can supply a letter [A-Z] to place in front of it. Below we set letter_for_num_prefix
to “x” and relo_2_end
to FALSE
.
In addition, if your organization holds an ontology of controlled terms, you can map non-compliant variable names to any desired result using the dict_dat
argument. It essentially provides “find and replace” functionality for any instance of the term(s) found in the data. In the example below, we want “subject id” to be portrayed as “SUBJID”. Thus, we created a data.frame called my_dictionary
that maps that relationship.
my_dictionary <- data.frame(original_varname = "subject id", dict_varname = "SUBJID")
-xportr_varnames(adxx,
- relo_2_end = FALSE,
- letter_for_num_prefix = "x",
- dict_dat = my_dictionary) # 'SUBJID' used
-
- The following variable name validation checks failed:
- * Must be 8 characters or less: Variables `subject id`, `1d. Lungs`, `1e. Heart`, `year number`, and `1a. Skin_Desc`.
- * Must start with a letter: Variables `1c. ENT`, `1b. Eyes`, `1d. Lungs`, `1e. Heart`, and `1a. Skin_Desc`.
- * Cannot contain any non-ASCII, symbol or underscore characters: Variables `subject id`, `1c. ENT`, `1b. Eyes`, `1d. Lungs`, `1e. Heart`, `year number`, and `1a. Skin_Desc`.
- * Cannot contain any lowercase characters Variables ``, `studyid`, `subject id`, `1c. ENT`, `1b. Eyes`, `1d. Lungs`, `1e. Heart`, `year number`, and `1a. Skin_Desc`.
-
- ── 9 of 10 (90%) variables were renamed ──
-
- Var 1: '' was renamed to 'V1'
- Var 3: 'studyid' was renamed to 'STUDYID2'
- Var 4: 'subject id' was renamed to 'SUBJID'
- Var 5: '1c. ENT' was renamed to 'X1CENT'
- Var 6: '1b. Eyes' was renamed to 'X1BEYES'
- Var 7: '1d. Lungs' was renamed to 'X1DLUNGS'
- Var 8: '1e. Heart' was renamed to 'X1EHEART'
- Var 9: 'year number' was renamed to 'YEARNUMB'
- Var 10: '1a. Skin_Desc' was renamed to 'X1SKNDSC'
-
- All renamed variables passed validation.
- V1 STUDYID STUDYID2 SUBJID X1CENT X1BEYES X1DLUNGS X1EHEART YEARNUMB X1SKNDSC
- 1 0 0 0 0 0 0 0 0 0 0
- 2 0 0 0 0 0 0 0 0 0 0
- 3 0 0 0 0 0 0 0 0 0 0
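The four validation checks reported in the output above can be approximated with a few regular expressions. The following is a minimal sketch in base R, not xportr's own implementation: the helper name `check_varnames()` and its column names are hypothetical, and edge cases (such as the empty name) may be classified differently than in the package output.

```r
# Illustrative only: approximates the four reported checks with base R.
# `check_varnames()` is a hypothetical helper, not part of xportr.
check_varnames <- function(x) {
  data.frame(
    name         = x,
    max_8_chars  = nchar(x) <= 8,                        # 8 characters or less
    starts_alpha = grepl("^[A-Za-z]", x, perl = TRUE),   # must start with a letter
    alnum_only   = !grepl("[^A-Za-z0-9]", x, perl = TRUE), # no symbols, spaces or underscores
    no_lowercase = x == toupper(x),                      # no lowercase characters
    stringsAsFactors = FALSE
  )
}

varnames <- c("", "STUDYID", "studyid", "subject id", "1c. ENT")
checks <- check_varnames(varnames)

# A name is compliant only if it passes all four checks.
checks$compliant <- checks$max_8_chars & checks$starts_alpha &
  checks$alnum_only & checks$no_lowercase
checks
```

Running a check like this before renaming gives a quick view of which rule each name violates, which can help when deciding whether a dictionary mapping is worth adding.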
Please review the documentation using ?xportr_tidy_rename
OR ?xportr_varnames
to learn how the abbreviation algorithm works and to further customize the renaming of your variable names using a range of additional arguments.
Using the xportr_label
function we can take the
+specifications file and label all the variables available.
+xportr_label
will produce a warning message if a
+variable in the data set is not in the specification file.
+adsl_update <- adsl %>% xportr_label(var_spec, domain = "ADSL", "message")
look_for(adsl_update, details = FALSE)
+ pos variable label
+ 1 STUDYID Study Identifier
+ 2 USUBJID Unique Subject Identifier
+ 3 SUBJID Subject Identifier for the Study
+ 4 RFSTDTC Subject Reference Start Date/Time
+ 5 RFENDTC Subject Reference End Date/Time
+ 6 RFXSTDTC Date/Time of First Study Treatment
+ 7 RFXENDTC Date/Time of Last Study Treatment
+ 8 RFICDTC Date/Time of Informed Consent
+ 9 RFPENDTC Date/Time of End of Participation
+ 10 DTHDTC Date / Time of Death
+ 11 DTHFL Subject Death Flag
+ 12 SITEID Study Site Identifier
+ 13 AGE Age
+ 14 AGEU Age Units
+ 15 SEX Sex
+ 16 RACE Race
+ 17 ETHNIC Ethnicity
+ 18 ARMCD Planned Arm Code
+ 19 ARM Description of Planned Arm
+ 20 ACTARMCD Actual Arm Code
+ 21 ACTARM Description of Actual Arm
+ 22 COUNTRY Country
+ 23 DMDTC Date/Time of Collection
+ 24 DMDY Study Day of Collection
+ 25 TRT01P Planned Treatment for Period 01
+ 26 TRT01A Actual Treatment for Period 01
+ 27 TRTSDTM Datetime of First Exposure to Treatment
+ 28 TRTEDTM Datetime of Last Exposure to Treatment
+ 29 TRTSDT Date of First Exposure to Treatment
+ 30 TRTEDT Date of Last Exposure to Treatment
+ 31 TRTDURD Total Duration of Trt (days)
+ 32 SCRFDT Screen Failure Date
+ 33 EOSDT End of Study Date
+ 34 EOSSTT End of Study Status
+ 35 FRVDT Final Retrievel Visit Date
+ 36 DTHDT Death Date
+ 37 DTHDTF Date of Death Imputation Flag
+ 38 DTHADY Relative Day of Death
+ 39 LDDTHELD Elapsed Days from Last Dose to Death
+ 40 LSTALVDT Date Last Known Alive
+ 41 AGEGR1 Pooled Age Group 1
+ 42 SAFFL Safety Population Flag
+ 43 RACEGR1 Pooled Race Group 1
+ 44 REGION1 Geographic Region 1
+ 45 LDDTHGR1 Last Does to Death Group
+ 46 DTH30FL Under 30 Group
+ 47 DTHA30FL Over 30 Group
+ 48 DTHB30FL Over 30 plus 30 days Group
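To make the idea of "labelling from a specification" concrete, here is a minimal base R sketch of the underlying mechanism: each label is stored as a `"label"` attribute on its column, which is what `look_for()` displays. This is not how `xportr_label()` is implemented; the toy data frame `adxx`, the toy `spec`, and its column names are assumptions made for illustration.

```r
# Toy data and toy spec; names are hypothetical and for illustration only.
adxx <- data.frame(STUDYID = "XYZ", AGE = 42, SEX = "F", stringsAsFactors = FALSE)
spec <- data.frame(
  variable = c("STUDYID", "AGE"),
  label    = c("Study Identifier", "Age"),
  stringsAsFactors = FALSE
)

# Attach each label from the spec as a "label" attribute on the matching column.
for (i in seq_len(nrow(spec))) {
  attr(adxx[[spec$variable[i]]], "label") <- spec$label[i]
}

# Variables present in the data but missing from the spec.
unlabelled <- setdiff(names(adxx), spec$variable)
if (length(unlabelled) > 0) {
  message("Not in specification: ", paste(unlabelled, collapse = ", "))
}

# Version 5 transport files limit variable labels to 40 characters.
too_long <- spec$variable[nchar(spec$label) > 40]
```

Here `SEX` is reported because it has no entry in the toy spec, mirroring the kind of message described above.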
Finally, we arrive at exporting the R data frame object as a xpt file with the function xportr_write
. The xpt file will be written directly to your current working directory. To make it more interesting, we have put together all five function with the magrittr pipe. A user can now coerce, apply length, variable labels, data set label and write out their final xpt file in one pipe! Appropriate warnings and messages will be supplied to a user for any potential issues before sending off to standard clinical data set validator application or data reviewers.
-
-adsl %>%
- xportr_type(var_spec, "ADSL", "message") %>%
- xportr_length(var_spec, "ADSL", "message") %>%
- xportr_label(var_spec, "ADSL", "message") %>%
- xportr_df_label(data_spec, "ADSL") %>%
- xportr_varnames("message") %>%
- xportr_write("adsl.xpt")
Optionally, leave out xportr_varnames and instead use tidy_varnames = TRUE
in xportr_write() to accept the recommended defaults of xportr_tidy_rename
.
-# No xportr_varnames()!
-adsl %>%
- xportr_type(var_spec, "ADSL", "message") %>%
- xportr_length(var_spec, "ADSL", "message") %>%
- xportr_label(var_spec, "ADSL", "message") %>%
- xportr_df_label(data_spec, "ADSL") %>%
- xportr_write("adsl.xpt", tidy_varnames = TRUE)
Finally, we arrive at exporting the R data frame object as an xpt file
+with the function xportr_write()
. The xpt file will be
+written directly to your current working directory. To make it more
+interesting, we have put together all six functions with the magrittr
+pipe, %>%
. A user can now apply types, length, variable
+labels, formats, data set label and write out their final xpt file in
+one pipe! Appropriate warnings and messages will be printed to the
+console for any potential issues before the file is sent to a standard
+clinical data set validator application or to data reviewers.
+adsl %>%
+ xportr_type(var_spec, "ADSL", "message") %>%
+ xportr_length(var_spec, "ADSL", "message") %>%
+ xportr_label(var_spec, "ADSL", "message") %>%
+ xportr_order(var_spec, "ADSL", "message") %>%
+ xportr_format(var_spec, "ADSL", "message") %>%
+ xportr_write("adsl.xpt", label = "Subject-Level Analysis Dataset")
That’s it! We now have an xpt file created in R with all appropriate
+types, lengths, labels, ordering and formats from our specification
+file.
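A quick way to confirm that an exported file is readable is to load it back into R. The sketch below assumes the {haven} package is installed and uses `haven::write_xpt()` as a stand-in for the pipeline above, so that it runs without the specification file; the toy data frame and the temporary file path are illustrative only.

```r
library(haven)

# Toy data set written to a temporary location (not the working directory).
df <- data.frame(
  STUDYID = c("XYZ", "XYZ"),
  AGE     = c(42, 57),
  stringsAsFactors = FALSE
)
path <- file.path(tempdir(), "adxx.xpt")

# Write a version 5 transport file, then read it back for inspection.
write_xpt(df, path, version = 5)
check <- read_xpt(path)

names(check)
nrow(check)
```

The same `read_xpt()` call can be pointed at `"adsl.xpt"` after running the pipeline to inspect the exported variables.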
+As always, we welcome your feedback. If you spot a bug, would like to
+see a new feature, or if any documentation is unclear - submit an issue
+on xportr’s
+GitHub page.
Developed by Eli Miller, Vignesh Thanikachalam, Ben Straub, Ross Didenko.
DESCRIPTION
+ Vignesh Thanikachalam. Author, maintainer. -
-Ben Straub. Author. -
-Ross Didenko. Author. -
-Atorus/GSK JPT. Copyright holder. -
-Miller E, Thanikachalam V, Straub B, Didenko R (2022).
+xportr: Utilities to Output CDISC SDTM/ADaM XPT Files.
+R package version 0.1.0, https://github.com/atorus-research/xportr.
+@Manual{,
+  title = {xportr: Utilities to Output CDISC SDTM/ADaM XPT Files},
+  author = {Eli Miller and Vignesh Thanikachalam and Ben Straub and Ross Didenko},
+  year = {2022},
+  note = {R package version 0.1.0},
+  url = {https://github.com/atorus-research/xportr},
+}
Welcome to xportr
! We have designed xportr
to help get your xpt files ready for transport either to a clinical data set validator application or to a regulatory agency This package has the functionality to associate all metadata information to a local R data frame, perform data set level validation checks and convert into a transport v5 file(xpt).
As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or if any documentation is unclear - submit an issue on xportr’s Github page.
-Welcome to xportr
! We have designed xportr
to help get your xpt files ready for transport either to a clinical data set validator application or to a regulatory agency. This package has the functionality to associate metadata information with a local R data frame, perform data set level validation checks and convert it into a transport v5 file (xpt).
As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or if any documentation is unclear - submit an issue on xportr’s GitHub page.
+
-devtools::install_github("https://github.com/atorus-research/xportr.git")
devtools::install_github("https://github.com/atorus-research/xportr.git", ref = "main")
xportr
is designed for clinical programmers to create CDISC compliant xpt files- ADaM or SDTM. Essentially, this package has two big components to it - writing xpt files with well-defined metadata and checking compliance of the data sets. The first set of tools are designed to allow a clinical programmer to build a CDISC compliant xpt file directly from R. The second set of tools are to perform checks on your data sets before you send them off to any validators or data reviewers.
xportr
is designed for clinical programmers to create CDISC compliant xpt files, either ADaM or SDTM. Essentially, this package has two big components to it
The first set of tools is designed to allow a clinical programmer to build a CDISC compliant xpt file directly from R. The second set of tools performs checks on your data sets before you send them off to any validators or data reviewers.
NOTE: Each check has associated messages and warnings.
-The first example involves an ADSL data set in the .sas7bdat
format with associated specification in the .xlsx
format.
Objective: Create a fully compliant v5 xpt ADSL
dataset that was developed using R.
To do this we will need to do the following:
+All of which can be done using a well-defined specification file and the xportr
package!
First we will start with our ADSL
dataset created in R. This example ADSL
dataset is taken from the {admiral}
package. The script that generates this ADSL
dataset can be created by using this command admiral::use_ad_template("adsl")
. This ADSL
dataset has 306 observations and 48 variables.
-adsl <- haven::read_sas("inst/extdata/adsl.sas7bdat")
+library(dplyr)
+library(admiral)
+library(xportr)
-var_spec <- readxl::read_xlsx("inst/specs/ADaM_spec.xlsx", sheet = "Variables") %>%
- dplyr::rename(type = "Data Type") %>%
- rlang::set_names(tolower)
-
-data_spec <- readxl::read_xlsx("inst/specs/ADaM_spec.xlsx", sheet = "Datasets") %>%
- rlang::set_names(tolower) %>%
- dplyr::rename(label = "description")
-
-adsl %>%
- xportr_type(var_spec, "ADSL", "message") %>%
- xportr_length(var_spec, "ADSL", "message") %>%
- xportr_label(var_spec, "ADSL", "message") %>%
- xportr_df_label(data_spec, "ADSL") %>%
- xportr_write("adsl.xpt")
Please check out the Get Started for more information.
-We are in talks with other Pharma companies involved with the {pharmaverse}
to enhance this package to play well with other downstream and upstream packages.
We have created a dummy specification file called ADaM_admiral_spec.xlsx
found in the specs
folder of this package. You can use system.file(paste0("specs/", "ADaM_admiral_spec.xlsx"), package = "xportr")
to access this file.
+spec_path <- system.file(paste0("specs/", "ADaM_admiral_spec.xlsx"), package = "xportr")
+
+var_spec <- readxl::read_xlsx(spec_path, sheet = "Variables") %>%
+ dplyr::rename(type = "Data Type") %>%
+ rlang::set_names(tolower)
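The pre-processing above renames one header and lower-cases the rest. The effect can be seen on a toy stand-in for the "Variables" sheet, sketched here in base R. The headers and values in `raw_spec` are assumptions for illustration, not the contents of `ADaM_admiral_spec.xlsx`.

```r
# Toy stand-in for the "Variables" sheet; headers and values are hypothetical.
raw_spec <- data.frame(
  Dataset     = c("ADSL", "ADSL"),
  Variable    = c("STUDYID", "AGE"),
  Label       = c("Study Identifier", "Age"),
  `Data Type` = c("text", "integer"),
  Length      = c(12, 8),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Same two steps as above: rename "Data Type" to "type", then lower-case all names.
var_spec_toy <- raw_spec
names(var_spec_toy)[names(var_spec_toy) == "Data Type"] <- "type"
names(var_spec_toy) <- tolower(names(var_spec_toy))

names(var_spec_toy)

# Rows that apply to a single domain.
subset(var_spec_toy, dataset == "ADSL")
```

Inspecting `names()` after pre-processing is a cheap way to confirm that a spec sheet has the lower-case column names the later steps rely on.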
Each xportr_
function has been written in a way to take in a part of the specification file and apply that piece to the dataset.
+adsl %>%
+ xportr_type(var_spec, "ADSL") %>%
+ xportr_length(var_spec, "ADSL") %>%
+ xportr_label(var_spec, "ADSL") %>%
+ xportr_order(var_spec, "ADSL") %>%
+ xportr_format(var_spec, "ADSL") %>%
+ xportr_write("adsl.xpt", label = "Subject-Level Analysis Dataset")
That’s it! We now have an xpt file created in R with all appropriate types, lengths, labels, ordering and formats. Please check out the Get Started page for more information and a detailed walkthrough of each xportr_
function.
We are in talks with other Pharma companies involved with the {pharmaverse}
to enhance this package to play well with other downstream and upstream packages.