Commit

docs: #84 finsihing core function examples
bms63 committed Jun 7, 2023
1 parent e03ca10 commit 6c373e7
Showing 1 changed file with 56 additions and 18 deletions.
74 changes: 56 additions & 18 deletions vignettes/deepdive.Rmd
@@ -42,11 +42,11 @@ local({

# Introduction

This vignette will explore in detail all the possibilities of the `{xportr}` package for applying information from a metadata object to a data set using the core `{xportr}` functions.
This vignette will explore in detail all the possibilities of the `{xportr}` package for applying information from a metadata object to an R-created dataset using the core `{xportr}` functions.

We will also explore the following:

* What goes in a Submission to a Health Authority?
* What goes in a Submission to a Health Authority, and what role does `{xportr}` play in that Submission?
* What is `{xportr}` validating behind the scenes?
* Breakdown of `{xportr}` and an ADaM dataset specification file.
* Using `options()` and `xportr_metadata()` to enhance your `{xportr}` experience.
@@ -78,9 +78,9 @@ As both Data Packages need compliant `xpt` files, we feel that `{xportr}` can pl

## What is `{xportr}` validating in these Data Packages?

The `xpt` Version 5 files form the backbone of any successful Submission and are governed by quite a lot of rules and suggested guidelines. As you are preparing your packages for submission, the suite of `{xportr}` functions and `xportr_write()` help to check that your datasets are submission compliant. The package checks many of the latest rules laid out in the [Study Data Technical Conformance Guide](https://www.fda.gov/regulatory-information/search-fda-guidance-documents/study-data-technical-conformance-guide-technical-specifications-document), but please note that it is not yet an exhaustive list of checks. We envision that users are also submitting their `xpts` and metadata to additional validation software.
The `xpt` Version 5 files form the backbone of any successful Submission and are governed by quite a lot of rules and suggested guidelines. As you are preparing your data packages for submission, the suite of `{xportr}` functions and `xportr_write()` help to check that your datasets are submission compliant. The package checks many of the latest rules laid out in the [Study Data Technical Conformance Guide](https://www.fda.gov/regulatory-information/search-fda-guidance-documents/study-data-technical-conformance-guide-technical-specifications-document), but please note that it is not yet an exhaustive list of checks. We envision that users are also submitting their `xpts` and metadata to additional validation software.

Each of the core `{xportr}` functions for applying labels, types, formats, order and lengths provides feedback to users on submission compliance. However, a final check is implemented when `xportr_write()` is called to create the `xpt`. `xportr_write()` calls [`xpt_validate()`](https://github.com/atorus-research/xportr/blob/231e959b84aa0f1e71113c85332de33a827e650a/R/utils-xportr.R#L174), which is a behind-the-scenes, non-exported function not available to users that does a final check for compliance. At the time of `{xportr} v0.3` we are checking the following when a user writes out an `xpt` file:
Each of the core `{xportr}` functions for applying labels, types, formats, order and lengths provides feedback to users on submission compliance. However, a final check is implemented when `xportr_write()` is called to create the `xpt`. `xportr_write()` calls [`xpt_validate()`](https://github.com/atorus-research/xportr/blob/231e959b84aa0f1e71113c85332de33a827e650a/R/utils-xportr.R#L174), which is a behind-the-scenes, non-exported function not available to users that does a final check for compliance. At the time of `{xportr} v0.3.0` we are checking the following when a user writes out an `xpt` file:

<img src="xpt_validate.png" alt="validate" style="width:800px;"/>
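
The authoritative list of checks is the one in the image above. Purely as an illustration of this kind of validation, the base R sketch below tests three well-known limits of the Version 5 transport format: variable names of at most 8 characters, labels of at most 40 characters, and character values of at most 200 bytes. The helper `sketch_xpt_checks()` and the toy data are hypothetical and are not part of `{xportr}`; this is not how `xpt_validate()` is implemented.

```r
# Hypothetical helper (not part of {xportr}): collects violations of three
# xpt Version 5 limits. Labels are assumed to live in a "label" attribute.
sketch_xpt_checks <- function(df) {
  issues <- character(0)
  for (nm in names(df)) {
    col <- df[[nm]]
    lbl <- attr(col, "label")
    if (nchar(nm) > 8) {
      issues <- c(issues, sprintf("Variable name `%s` exceeds 8 characters.", nm))
    }
    if (!is.null(lbl) && nchar(lbl) > 40) {
      issues <- c(issues, sprintf("Label for `%s` exceeds 40 characters.", nm))
    }
    if (is.character(col) && any(nchar(col, type = "bytes") > 200, na.rm = TRUE)) {
      issues <- c(issues, sprintf("Values in `%s` exceed 200 bytes.", nm))
    }
  }
  issues
}

# Toy data: the second variable name is deliberately too long.
toy <- data.frame(
  USUBJID = c("01-001", "01-002"),
  TREATMENT_START = c(1, 2),
  stringsAsFactors = FALSE
)
attr(toy$USUBJID, "label") <- "Unique Subject Identifier"

xpt_issues <- sketch_xpt_checks(toy)
xpt_issues
```

Returning a character vector of issues, rather than stopping at the first one, lets the caller decide whether to warn or stop.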

@@ -168,11 +168,11 @@ adsl %>%

For the next six sections, we are going to explore the Warning and Error messages generated by the `{xportr}` core functions. To better explore these, we will manipulate either the ADaM dataset or the specification file to showcase the ability of the `{xportr}` functions to detect issues.

**NOTE:** We have made the ADSL, `adsl`, and Spec, `var_spec`, available in this package. Users can find additional datasets and specification files on our [repo](https://github.com/atorus-research/xportr) in the `example_data_specs` folder. This is to keep the package to a minimum size.
**NOTE:** We have made the ADSL, `adsl`, and Specification File, `var_spec`, available in this package. Users can find additional datasets and specification files on our [repo](https://github.com/atorus-research/xportr) in the `example_data_specs` folder. This is to keep the package to a minimum size.

### Setting up our metadata object

First, let's read in the specification file and call it `var_spec`. We will also do some slight manipulation of the column names by converting them to lower case and changing `Data Type` to `type`. You can also use `options()` for this step. The `var_spec` object has five dataset specification files stacked on top of each other. We will make use of the `ADSL` section.
First, let's read in the specification file and call it `var_spec`. Note that we are not using `options()` here. We will do some slight manipulation of the column names by converting them to lower case and changing `Data Type` to `type`. You can also use `options()` for this step. The `var_spec` object has five dataset specification files stacked on top of each other. We will make use of the `ADSL` section. You can use the Search field above the dataset column to subset the specification file for `ADSL`.
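
For comparison, the `options()` route mentioned here might look like the sketch below. The option names `xportr.variable_name` and `xportr.type_name` and the column name `Variable` are assumptions on our part; please confirm them against the documentation of your installed `{xportr}` version. Setting an option only uses base R, so the snippet runs without the package loaded.

```r
# Assumed option names: verify against your installed {xportr} version.
# The idea is to tell {xportr} which spec columns to use instead of renaming them.
old_opts <- options(
  xportr.variable_name = "Variable", # assumed name of the variable column in the spec
  xportr.type_name = "Data Type"     # the spec's original type column name
)

# Read the option back to confirm what the functions would see.
type_col <- getOption("xportr.type_name")
type_col

# Restore the previous options so the rest of the session is unaffected.
options(old_opts)
```

This vignette takes the other route and renames the columns directly.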

```{r}
var_spec <- var_spec %>%
@@ -190,7 +190,8 @@ columns2hide <- c(
datatable(
var_spec,
rownames = FALSE,
extensions = "Buttons", options = list(
filter = 'top',
options = list(
dom = "Bfrtip",
columnDefs = list(list(visible = FALSE, targets = columns2hide))
)
@@ -199,7 +200,7 @@ datatable(

## `xportr_type()`

We are going to explore the type column in the metadata object. A submission to a Health Authority should only have character and numeric types in the data. In the `ADSL` data we will have several columns that are in the Date type: `TRTSDT`, `TRTEDT`, `DISONSDT`, `VISIT1DT` and `RFENDT` and we will change one variable type to a factor variable for educational purposes.
We are going to explore the type column in the metadata object. A submission to a Health Authority should only have character and numeric types in the data. In the `ADSL` data we have several columns that are in the Date type: `TRTSDT`, `TRTEDT`, `DISONSDT`, `VISIT1DT` and `RFENDT`. We will change one variable type to a [factor variable](https://forcats.tidyverse.org/), which is a common data structure in R.

```{r}
adsl_fct <- adsl %>%
@@ -224,21 +225,21 @@ adsl_type_glimpse <- adsl_type %>%
select(STUDYID, TRTSDT, TRTEDT, DISONSDT, VISIT1DT, RFENDT)
```

Success! As we can see below, the `xportr_type()` function applied the types from the metadata object to the columns shown, converting them all to the proper type.
Success! As we can see below, the `xportr_type()` function applied the types from the metadata object to the columns shown, converting them all to the proper type. The functions in `{xportr}` also report this coercion to the user in the console, as seen above.

```{r, echo = TRUE}
glimpse(adsl_type_glimpse)
```
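
To make the idea of coercion concrete, here is a minimal base R sketch on toy data. It is not the package's implementation: the `target` vector is a hypothetical stand-in for the `type` column of a metadata object, and only the two target types discussed above are handled.

```r
# Toy data with a Date column and a factor column, neither of which is a
# plain character or numeric type.
toy <- data.frame(
  TRTSDT = as.Date(c("2014-01-02", "2014-03-15")),
  SEX = factor(c("F", "M"))
)

# Hypothetical target types, standing in for a metadata `type` column.
target <- c(TRTSDT = "numeric", SEX = "character")

coerced <- toy
for (nm in names(target)) {
  coerced[[nm]] <- switch(target[[nm]],
    numeric = as.numeric(toy[[nm]]),     # a Date becomes days since 1970-01-01
    character = as.character(toy[[nm]])  # a factor becomes its level text
  )
}

str(coerced)
```

Note that a factor should go through `as.character()` when text is wanted, because `as.numeric()` on a factor returns its internal integer codes.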

Note that `xportr_type(verbose = "warn")` was set, so the function has provided feedback, which would show up in the console, on which variables were converted as a warning message. However, you can set `verbose = 'stop'` so that the types are not applied when the data does not match what is in the specification file.
Note that `xportr_type(verbose = "warn")` was set, so the function has provided feedback, which would show up in the console, on which variables were converted as a warning message. However, you can set `verbose = 'stop'` so that the types are not applied when the data does not match what is in the specification file. Using `verbose = 'stop'` will instantly stop the processing of this function and not create the object.

```{r, echo = TRUE, error = TRUE}
adsl_type <- xportr_type(.df = adsl, metadata = var_spec, domain = "ADSL", verbose = "stop")
```
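
The difference between the verbosity levels can be sketched with a small dispatcher in base R. The helper `report_issue()` is hypothetical and is not the package's internal logging code. The text above uses `"warn"` and `"stop"`; the `"none"` and `"message"` levels are included here on the assumption that the package also accepts them.

```r
# Hypothetical helper: routes one finding to the requested verbosity level.
report_issue <- function(msg, verbose = c("none", "message", "warn", "stop")) {
  verbose <- match.arg(verbose)
  switch(verbose,
    none = invisible(NULL),            # stay silent
    message = message(msg),            # informational, processing continues
    warn = warning(msg, call. = FALSE), # warning, processing continues
    stop = stop(msg, call. = FALSE)     # error, processing halts here
  )
}

# With "warn" the caller can carry on; with "stop" nothing after the call runs.
warn_result <- tryCatch(
  report_issue("Variable type mismatch.", "warn"),
  warning = function(w) "warned"
)
stop_result <- tryCatch(
  report_issue("Variable type mismatch.", "stop"),
  error = function(e) "stopped"
)
```

Because `stop()` raises an error, an assignment such as `adsl_type <- ...` never completes, which is why the object is not created.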

## `xportr_length()`

Next we will use `xportr_length()` to apply the length column of the _metadata object_ to `ADSL` dataset.
Next we will use `xportr_length()` to apply the length column of the _metadata object_ to the `ADSL` dataset. Using the `str()` function, we have displayed all the variables with their attributes. You can see that each variable has a label, but there is no information on the lengths of the variables.

```{r, max.height='300px', attr.output='.numberLines', echo = FALSE}
str(adsl)
@@ -251,10 +252,16 @@ TODO: There is no message to users about how many lengths were applied to the da
adsl_length <- xportr_length(.df = adsl, metadata = var_spec, domain = "ADSL", verbose = "warn")
```

Using the `xportr_length()` function with `verbose = 'warn'`, we can apply the length column to all the columns in the dataset. The function detects that two variables, `TRTDUR` and `DCREASCD`, are missing from the metadata file. Note that these variables are spelled slightly differently in the dataset and the metadata, which is a great catch!

Using the `str()` function, you can see below that the `xportr_length()` function successfully applied the lengths from the metadata to the variables in the dataset.


```{r, max.height='300px', attr.output='.numberLines', echo = FALSE}
str(adsl_length)
```
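
The two behaviours described above, flagging variables that are absent from the metadata and attaching a length to the rest, can be sketched in base R. The toy data, the misspelled metadata entry `TRTDURD`, and the use of a `width` attribute to hold the length are all assumptions for illustration, not the package's implementation.

```r
toy <- data.frame(
  USUBJID = c("01-001", "01-002"),
  TRTDUR = c(10, 12),
  stringsAsFactors = FALSE
)

# Hypothetical metadata: TRTDUR is misspelled, so it will not match.
meta <- data.frame(
  variable = c("USUBJID", "TRTDURD"),
  length = c(11, 8),
  stringsAsFactors = FALSE
)

# Variables in the data that have no entry in the metadata.
missing_vars <- setdiff(names(toy), meta$variable)

# Attach the length to every variable that does match (assumed attribute name).
for (nm in intersect(names(toy), meta$variable)) {
  attr(toy[[nm]], "width") <- meta$length[meta$variable == nm]
}

missing_vars
str(toy)
```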

Just like we did for `xportr_type()`, setting `verbose = 'stop'` immediately stops R from processing the lengths. Here the function detects the missing variables and will not apply any lengths to the dataset until corrective action is applied.

```{r, echo = TRUE, error = TRUE}
adsl_length <- xportr_length(.df = adsl, metadata = var_spec, domain = "ADSL", verbose = "stop")
@@ -263,26 +270,53 @@ adsl_length <- xportr_length(.df = adsl, metadata = var_spec, domain = "ADSL", v

## `xportr_label()`

TODO: Incorrect label applied, but label still applied along with 48 other labels. We should give user feedback on the labels still being applied.
As you are creating your dataset in R, you will often find that R removes the labels of your variables. Using `xportr_label()`, you can easily re-apply all your labels to your variables in one quick action.

For this example, we are going to manipulate both the metadata and the `ADSL` dataset:

TODO: Incorrect label applied, none and message still give warning when I have asked it not to do that.
* The metadata will have a label for the variable `TRTSDT` that is longer than 40 characters.
* The `ADSL` dataset will have all its labels stripped from it.

TODO: Weird characters in outputs.
Remember in the length example, the labels were on the original dataset as seen in the `str()` output.

```{r, echo = TRUE}
var_spec_lbl <- var_spec %>%
mutate(label = if_else(variable == "TRTSDT",
"Length of variable label must be 40 characters or less", label
))
adsl_lbl <- xportr_label(adsl, var_spec_lbl, "ADSL", verbose = "warn")
adsl_lbl <- adsl
adsl_lbl[] <- lapply(adsl_lbl, function(x) { attributes(x) <- NULL; x })
```
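
One detail worth knowing about this approach: `attributes(x) <- NULL` removes every attribute of a column, including its class, so a Date or factor column becomes a bare number. If only the labels should be dropped, a narrower variant is sketched below on hypothetical toy data.

```r
toy <- data.frame(TRTSDT = as.Date("2014-01-02"))
attr(toy$TRTSDT, "label") <- "Date of First Exposure to Treatment"

# Strip everything, as in the chunk above: the Date class is lost too.
all_stripped <- toy
all_stripped[] <- lapply(all_stripped, function(x) {
  attributes(x) <- NULL
  x
})

# Strip only the label: the Date class survives.
label_stripped <- toy
label_stripped[] <- lapply(label_stripped, function(x) {
  attr(x, "label") <- NULL
  x
})

class(all_stripped$TRTSDT)
class(label_stripped$TRTSDT)
```

For a demonstration of re-applying labels either variant works; the narrower one keeps the column types unchanged.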

## `xportr_order()`
We have successfully removed all the labels.

TODO: I think there is something wrong with `xportr_order` as it is reordering the entire dataframe to something I don't fully understand.
```{r, max.height='300px', attr.output='.numberLines', echo = FALSE}
str(adsl_lbl)
```

Using `xportr_label()`, we will apply all the labels from our metadata to the dataset. Please note again that we are using `verbose = 'warn'`, and the same two variables, `TRTDUR` and `DCREASCD`, are reported as missing from the metadata file. An additional message reports that the `TRTSDT` label is longer than 40 characters.

```{r}
adsl_lbl <- xportr_label(.df = adsl_lbl, metadata = var_spec_lbl, domain = "ADSL", verbose = "warn")
```

Success! All labels that are present in both the metadata and the dataset have been applied. However, please note that the `TRTSDT` variable now carries a label longer than 40 characters.

```{r, max.height='300px', attr.output='.numberLines', echo = FALSE}
str(adsl_lbl)
```

Just like we did for the other functions, setting `verbose = 'stop'` immediately stops R from processing the labels. Here the function detects the missing variables and the label longer than 40 characters, and will not apply any labels to the dataset until corrective action is taken.

```{r, echo = TRUE, error = TRUE}
adsl_label <- xportr_label(.df = adsl_lbl, metadata = var_spec_lbl, domain = "ADSL", verbose = "stop")
```
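
Over-long labels can also be spotted in the metadata before calling any `{xportr}` function. The sketch below is a plain base R check on a hypothetical two-row metadata frame; the 40-character limit is the one discussed above.

```r
# Hypothetical metadata with one label that breaks the 40-character limit.
meta <- data.frame(
  variable = c("STUDYID", "TRTSDT"),
  label = c(
    "Study Identifier",
    "Length of variable label must be 40 characters or less"
  ),
  stringsAsFactors = FALSE
)

# Variables whose label is longer than 40 characters.
too_long <- meta$variable[nchar(meta$label) > 40]
too_long
```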


## `xportr_order()`

TODO: What about a check for a non-numeric value in the ordering column? I put an X in there and it did not care.

```{r}
library(dplyr)
@@ -301,6 +335,10 @@ adsl_ord <- xportr_order(.df = adsl, metadata = var_spec, domain = "ADSL", verbo
glimpse(adsl_ord)
```
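
As a reference point for reasoning about the reordering, here is a minimal base R sketch of one plausible ordering rule: columns listed in the metadata come first, sorted by the order column, and columns not listed are moved to the end. This rule is our assumption about the intended behaviour, not a description of the package's code. The sketch also includes the kind of non-numeric check on the order column that the notes in this section ask about.

```r
toy <- data.frame(AGE = 1, EXTRA = 2, STUDYID = 3, USUBJID = 4)

# Hypothetical metadata; the order column is stored as text, as it might be
# after reading a spreadsheet.
meta <- data.frame(
  variable = c("STUDYID", "USUBJID", "AGE"),
  order = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Refuse to continue if any order value cannot be read as a number.
ord <- suppressWarnings(as.numeric(meta$order))
if (anyNA(ord)) {
  stop("Non-numeric value found in the order column.", call. = FALSE)
}

# Metadata variables in order, restricted to those present in the data.
in_spec <- meta$variable[order(ord)]
in_spec <- in_spec[in_spec %in% names(toy)]

# Variables absent from the metadata go to the end.
reordered <- toy[, c(in_spec, setdiff(names(toy), in_spec)), drop = FALSE]
names(reordered)
```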

TODO: I think there is something wrong with `xportr_order` as it is reordering the entire dataframe to something I don't fully understand.

TODO: What about a check for a non-numeric value in the ordering column? I put an X in there and it did not care.

## `xportr_format()`

TODO: No warning issue for incorrect format type. I put in a "DATA" format and it applied the format even though it is not a valid one.