diff --git a/docs/404.html b/docs/404.html
index 3231886..9853c7d 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -32,7 +32,7 @@
 madshapR
- 1.0.3.0003
+ 1.0.3
diff --git a/docs/articles/index.html b/docs/articles/index.html
index e85433d..09d16c5 100644
--- a/docs/articles/index.html
+++ b/docs/articles/index.html
@@ -17,7 +17,7 @@
 madshapR
- 1.0.3.0003
+ 1.0.3
diff --git a/docs/articles/madshapR-vignette.html b/docs/articles/madshapR-vignette.html
index b80e0ea..8a1e80a 100644
--- a/docs/articles/madshapR-vignette.html
+++ b/docs/articles/madshapR-vignette.html
@@ -33,7 +33,7 @@
 madshapR
- 1.0.3.0003
+ 1.0.3
@@ -91,14 +91,14 @@

madshapR-vignette

madshapR

-

The goal of madshapR is to provide functions to support rigorous
-processes in data cleaning, evaluation, and documentation across
-datasets from different studies based on Maelstrom Research guidelines.
-The package includes the core functions to evaluate and format the main
-inputs that define the process, diagnose errors, and summarize and
-evaluate datasets and their associated data dictionaries. The main
-outputs are clean datasets and associated metadata, and tabular and
-visual summary reports.

+

The madshapR package provides functions for efficient data cleaning,
+evaluation, and documentation across different datasets. It was
+developed to support work at Maelstrom Research and includes functions
+to evaluate and summarize datasets and their associated data
+dictionaries, identify potential issues in content and structure, and
+prepare datasets and metadata for further processing. The key outputs
+provided by the functions are formatted datasets, standardized metadata,
+and tabular and visual summary reports.

Get started @@ -107,15 +107,18 @@

Get startedInstall the package

-# To update the R package in your R environment you may first need to remove 
-# it, and use the exit command quit() to terminate the current R session.
-
-# To install the R package:
+# To install madshapR:
 install.packages('madshapR')
-library(madshapR) 
 
-#if you need help with the package, please use:
-madshapR_help()
+library(madshapR)
+# If you need help with the package, please use:
+madshapR_help()
+
+# Downloadable templates are available here
+madshapR_templates()
+
+# Demo files are available here, along with an online demonstration process
+madshapR_DEMO
diff --git a/docs/authors.html b/docs/authors.html
index db044fd..5f5841d 100644
--- a/docs/authors.html
+++ b/docs/authors.html
@@ -17,7 +17,7 @@
 madshapR
- 1.0.3.0003
+ 1.0.3
@@ -66,19 +66,35 @@

Authors

Maelstrom-research group. Copyright holder, funder.

-
  • -

    Alexandre Trottier. Contributor. -

    -
  • -
  • -

    Tina Wey. Contributor. -

    -
  • -
  • -

    Samuel El Bouzaïdi Tiali. Contributor. -

    -
  • + +
    +
    +

    Acknowledgment

    +
    +
    +We extend our heartfelt appreciation to all the members of the Maelstrom Research
    +team and our valued partners who have played an integral role in making this project
    +possible. Special thanks go to:
    +
    +
    +
    +Thank you for your outstanding contributions.
    +
    +
    +
    + +

    Citation

    @@ -89,13 +105,13 @@

    Citation

    Fabre G (2023). madshapR: Support Technical Processes Following 'Maelstrom Research' Standards.
    -R package version 1.0.3.0003, https://github.com/maelstrom-research/madshapR.
    +R package version 1.0.3, https://github.com/maelstrom-research/madshapR.

    @Manual{,
       title = {madshapR: Support Technical Processes Following 'Maelstrom Research' Standards},
       author = {Guillaume Fabre},
       year = {2023},
    -  note = {R package version 1.0.3.0003},
    +  note = {R package version 1.0.3},
       url = {https://github.com/maelstrom-research/madshapR},
     }
    diff --git a/docs/index.html b/docs/index.html
    index 5400787..c7bd345 100644
    --- a/docs/index.html
    +++ b/docs/index.html
    @@ -41,7 +41,7 @@
     madshapR
    - 1.0.3.0003
    + 1.0.3
    @@ -91,7 +91,7 @@
    -

    The goal of madshapR is to provide functions to support rigorous processes in data cleaning, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the process, diagnose errors, and summarize and evaluate datasets and their associated data dictionaries. The main outputs are clean datasets and associated metadata, and tabular and visual summary reports.

    +

    The madshapR package provides functions for efficient data cleaning, evaluation, and documentation across different datasets. It was developed to support work at Maelstrom Research and includes functions to evaluate and summarize datasets and their associated data dictionaries, identify potential issues in content and structure, and prepare datasets and metadata for further processing. The key outputs provided by the functions are formatted datasets, standardized metadata, and tabular and visual summary reports.

    Get started @@ -100,15 +100,18 @@

    Get startedInstall the package

    -# To update the R package in your R environment you may first need to remove it, 
    -# and use the exit command quit() to terminate the current R session.
    -
    -# To install the R package:
    +# To install madshapR:
     install.packages('madshapR')
    -library(madshapR) 
     
    +library(madshapR)
     # If you need help with the package, please use:
    -madshapR_help()
    +madshapR_help()
    +
    +# Downloadable templates are available here
    +madshapR_templates()
    +
    +# Demo files are available here, along with an online demonstration process
    +madshapR_DEMO
    @@ -144,7 +147,6 @@

    Developers

    diff --git a/docs/news/index.html b/docs/news/index.html
    index 1bd6ee7..2ba6a58 100644
    --- a/docs/news/index.html
    +++ b/docs/news/index.html
    @@ -17,7 +17,7 @@
     madshapR
    - 1.0.3.0003
    + 1.0.3
    @@ -61,13 +61,13 @@

    Changelog

    Bug fixes and improvements

    -

    Some of the tests were made with another package (Rmonize) which as ‘madshapR’ as a dependence.

    +

    Some of the tests were made with another package (Rmonize), which has “madshapR” as a dependency.

    Enhance reports

    New functions

    -

    Deprecated functions

    @@ -163,7 +164,7 @@

    Unit tests a

    check_data_dict_categories(), check_data_dict_missing_categories(), check_data_dict_taxonomy(), check_data_dict_variables(), check_data_dict_valueType(), check_dataset_categories(), check_dataset_valueType(), check_dataset_variables(), check_name_standards()

    -

    Summarise information in dataset and data dictionaries

    +

    Summarize information in dataset and data dictionaries

    These helper functions evaluate the content of a dataset and/or data dictionary to extract summary statistics and elements such as missing values, NA, category names, etc. This information is stored in a tibble that can be used to summarize inputs.

    dataset_preprocess(), summary_variables(), summary_variables_categorical(), summary_variables_date(), summary_variables_numeric(), summary_variables_text()

    diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
    index 1cf2f00..75aaad2 100644
    --- a/docs/pkgdown.yml
    +++ b/docs/pkgdown.yml
    @@ -3,5 +3,5 @@ pkgdown: 2.0.7
     pkgdown_sha: ~
     articles:
       madshapR-vignette: madshapR-vignette.html
    -last_built: 2023-12-05T16:45Z
    +last_built: 2023-12-14T18:15Z
    diff --git a/docs/reference/as_category.html b/docs/reference/as_category.html
    new file mode 100644
    index 0000000..5c9cc77
    --- /dev/null
    +++ b/docs/reference/as_category.html
    @@ -0,0 +1,147 @@
    +
    +Validate and coerce any object as a categorical variable. — as_category • madshapR
    +
    +
    +
    + + + +
    +
    + + +
    +

    [Experimental]
    +Converts a vector object to a categorical object, typically a column in a
    +data frame. The categories come from non-missing values present in the
    +object and are added to an associated data dictionary (when present).

    +
    + +
    +
    as_category(x)
    +
    + +
    +

    Arguments

    +
    x
    +

    A vector object to be coerced to categorical.

    + +
    +
    +

    Value

    + + +

    A vector with class haven_labelled.

    +
    +
    +

    See also

    + +
    + +
    +

    Examples

    +
    {
    +
    +library(dplyr)
    +mtcars <- tibble(mtcars)
    +as_category(mtcars[['cyl']])
    +
    +head(mtcars %>% mutate(cyl = as_category(cyl)))
    +
    +
    +}
    +#> 
    +#> Attaching package: 'dplyr'
    +#> The following objects are masked from 'package:stats':
    +#> 
    +#>     filter, lag
    +#> The following objects are masked from 'package:base':
    +#> 
    +#>     intersect, setdiff, setequal, union
    +#> # A tibble: 6 × 11
    +#>     mpg cyl        disp    hp  drat    wt  qsec    vs    am  gear  carb
    +#>   <dbl> <dbl+lbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    +#> 1  21   6 [6]       160   110  3.9   2.62  16.5     0     1     4     4
    +#> 2  21   6 [6]       160   110  3.9   2.88  17.0     0     1     4     4
    +#> 3  22.8 4 [4]       108    93  3.85  2.32  18.6     1     1     4     1
    +#> 4  21.4 6 [6]       258   110  3.08  3.22  19.4     1     0     3     1
    +#> 5  18.7 8 [8]       360   175  3.15  3.44  17.0     0     0     3     2
    +#> 6  18.1 6 [6]       225   105  2.76  3.46  20.2     1     0     3     1
    +
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/as_data_dict.html b/docs/reference/as_data_dict.html index 9b1a7d0..3db4738 100644 --- a/docs/reference/as_data_dict.html +++ b/docs/reference/as_data_dict.html @@ -1,8 +1,8 @@ -Validate and coerce any object as data dictionary — as_data_dict • madshapRValidate and coerce any object as a data dictionary — as_data_dict • madshapR @@ -20,7 +20,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    @@ -56,16 +56,16 @@
    -

    Validates the input object as a valid data dictionary and coerces it with -the appropriate madshapR::class attribute. This function mainly helps -validate input within other functions of the package but could be used to -check if an object is valid for use in a function.

    +

    Checks if an object is a valid data dictionary and returns it with the +appropriate madshapR::class attribute. This function mainly helps validate +inputs within other functions of the package but could be used to check if +an object is valid for use in a function.

    @@ -75,14 +75,14 @@

    Validate and coerce any object as data dictionary

    Arguments

    object
    -

    A potential valid data dictionary to be coerced.

    +

    A potential data dictionary object to be coerced.

    Value

    -

    A list of data frame(s) with Rmonize::class 'data_dict'.

    +

    A list of data frame(s) with madshapR::class 'data_dict'.

    Details

    diff --git a/docs/reference/as_data_dict_mlstr.html b/docs/reference/as_data_dict_mlstr.html index 934815f..fb64f0a 100644 --- a/docs/reference/as_data_dict_mlstr.html +++ b/docs/reference/as_data_dict_mlstr.html @@ -1,5 +1,5 @@ -Validate and coerce an object to an Opal data dictionary format — as_data_dict_mlstr • madshapRValidate and coerce any object as an Opal data dictionary format — as_data_dict_mlstr • madshapR madshapR - 1.0.3.0003 + 1.0.3
    @@ -57,7 +57,7 @@
    @@ -81,10 +81,9 @@

    Arguments

    as_data_dict
    -

    Whether the output data dictionary has a simple
    -data dictionary structure or not (meaning has a Maelstrom data dictionary
    -structure, compatible with Maelstrom Research ecosystem, including Opal).
    -FALSE by default.

    +

    Whether the input data dictionary should be coerced
    +without the specific format restrictions required for compatibility
    +with other Maelstrom Research software. FALSE by default.

    name_standard
    @@ -97,7 +96,7 @@

    Arguments

    Value

    -

    A list of data frame(s) with Rmonize::class 'data_dict_mlstr'.

    +

    A list of data frame(s) with madshapR::class 'data_dict_mlstr'.

    Details

    @@ -111,7 +110,7 @@

    Details

    variable and name.

    The object may be specifically formatted to be compatible with additional Maelstrom Research software, -in particular Opal environments.

    +in particular Opal environments.

    See also

    diff --git a/docs/reference/as_data_dict_shape.html b/docs/reference/as_data_dict_shape.html index 3d6f420..9939900 100644 --- a/docs/reference/as_data_dict_shape.html +++ b/docs/reference/as_data_dict_shape.html @@ -1,5 +1,5 @@ -Validate and coerce an object to a workable data dictionary structure — as_data_dict_shape • madshapRValidate and coerce any object as a workable data dictionary structure — as_data_dict_shape • madshapRValidate and coerce an object to dataset format — as_dataset • madshapRValidate and coerce any object as a dataset — as_dataset • madshapR @@ -20,7 +20,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    @@ -56,16 +56,16 @@
    -

    Confirms that the input object is a valid dataset and returns it as a dataset -with the appropriate madshapR::class attribute. This function mainly helps -validate inputs within other functions of the package but could be used to -check if a dataset is valid.

    +

    Checks if an object is a valid dataset and returns it with the appropriate +madshapR::class attribute. This function mainly helps validate inputs +within other functions of the package but could be used separately to check +if a dataset is valid.

    @@ -75,28 +75,27 @@

    Validate and coerce an object to dataset format

    Arguments

    object
    -

    A potential dataset to be coerced.

    +

    A potential dataset object to be coerced.

    col_id
    -

    A character string specifying the name(s) of the column(s)
    -which refer to key identifier of the dataset. The column(s) can be named
    -or indicated by position.

    +

    An optional character string specifying the name(s) or
    +position(s) of the column(s) used as identifiers.

    Value

    -

    A list of data frame(s) with Rmonize::class 'dataset'.

    +

    A data frame with madshapR::class 'dataset'.

    Details

    A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable
    -data dictionary will be generated as needed within relevant functions. An
    -identifier variable(s) for indexing can be specified by the user.
    +data dictionary will be generated as needed within relevant functions.
    +Identifier variable(s) for indexing can be specified by the user.
     The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.

    @@ -107,165 +106,31 @@

    Examples

    {
     
     # use madshapR_DEMO provided by the package
    +library(dplyr)
     
    -###### Example 1: a dataset can have an id column(s) which is specified as
    -# an attribute. 
    +###### Example 1: A dataset can have an id column specified as an attribute. 
     dataset <- as_dataset(madshapR_DEMO$dataset_MELBOURNE, col_id = "id")
    +glimpse(dataset)
    +
    +###### Example 2: Any data frame can be a dataset by definition.
    +glimpse(as_dataset(iris, col_id = "Species"))
     
    -###### Example 2: any data frame can be a dataset by definition.
    -as_dataset(iris, col_id = "Species")
     }
    -#>        Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    -#> 1       setosa          5.1         3.5          1.4         0.2
    -#> 2       setosa          4.9         3.0          1.4         0.2
    -#> 3       setosa          4.7         3.2          1.3         0.2
    -#> 4       setosa          4.6         3.1          1.5         0.2
    -#> 5       setosa          5.0         3.6          1.4         0.2
    -#> 6       setosa          5.4         3.9          1.7         0.4
    -#> 7       setosa          4.6         3.4          1.4         0.3
    -#> 8       setosa          5.0         3.4          1.5         0.2
    -#> 9       setosa          4.4         2.9          1.4         0.2
    -#> 10      setosa          4.9         3.1          1.5         0.1
    -#> 11      setosa          5.4         3.7          1.5         0.2
    -#> 12      setosa          4.8         3.4          1.6         0.2
    -#> 13      setosa          4.8         3.0          1.4         0.1
    -#> 14      setosa          4.3         3.0          1.1         0.1
    -#> 15      setosa          5.8         4.0          1.2         0.2
    -#> 16      setosa          5.7         4.4          1.5         0.4
    -#> 17      setosa          5.4         3.9          1.3         0.4
    -#> 18      setosa          5.1         3.5          1.4         0.3
    -#> 19      setosa          5.7         3.8          1.7         0.3
    -#> 20      setosa          5.1         3.8          1.5         0.3
    -#> 21      setosa          5.4         3.4          1.7         0.2
    -#> 22      setosa          5.1         3.7          1.5         0.4
    -#> 23      setosa          4.6         3.6          1.0         0.2
    -#> 24      setosa          5.1         3.3          1.7         0.5
    -#> 25      setosa          4.8         3.4          1.9         0.2
    -#> 26      setosa          5.0         3.0          1.6         0.2
    -#> 27      setosa          5.0         3.4          1.6         0.4
    -#> 28      setosa          5.2         3.5          1.5         0.2
    -#> 29      setosa          5.2         3.4          1.4         0.2
    -#> 30      setosa          4.7         3.2          1.6         0.2
    -#> 31      setosa          4.8         3.1          1.6         0.2
    -#> 32      setosa          5.4         3.4          1.5         0.4
    -#> 33      setosa          5.2         4.1          1.5         0.1
    -#> 34      setosa          5.5         4.2          1.4         0.2
    -#> 35      setosa          4.9         3.1          1.5         0.2
    -#> 36      setosa          5.0         3.2          1.2         0.2
    -#> 37      setosa          5.5         3.5          1.3         0.2
    -#> 38      setosa          4.9         3.6          1.4         0.1
    -#> 39      setosa          4.4         3.0          1.3         0.2
    -#> 40      setosa          5.1         3.4          1.5         0.2
    -#> 41      setosa          5.0         3.5          1.3         0.3
    -#> 42      setosa          4.5         2.3          1.3         0.3
    -#> 43      setosa          4.4         3.2          1.3         0.2
    -#> 44      setosa          5.0         3.5          1.6         0.6
    -#> 45      setosa          5.1         3.8          1.9         0.4
    -#> 46      setosa          4.8         3.0          1.4         0.3
    -#> 47      setosa          5.1         3.8          1.6         0.2
    -#> 48      setosa          4.6         3.2          1.4         0.2
    -#> 49      setosa          5.3         3.7          1.5         0.2
    -#> 50      setosa          5.0         3.3          1.4         0.2
    -#> 51  versicolor          7.0         3.2          4.7         1.4
    -#> 52  versicolor          6.4         3.2          4.5         1.5
    -#> 53  versicolor          6.9         3.1          4.9         1.5
    -#> 54  versicolor          5.5         2.3          4.0         1.3
    -#> 55  versicolor          6.5         2.8          4.6         1.5
    -#> 56  versicolor          5.7         2.8          4.5         1.3
    -#> 57  versicolor          6.3         3.3          4.7         1.6
    -#> 58  versicolor          4.9         2.4          3.3         1.0
    -#> 59  versicolor          6.6         2.9          4.6         1.3
    -#> 60  versicolor          5.2         2.7          3.9         1.4
    -#> 61  versicolor          5.0         2.0          3.5         1.0
    -#> 62  versicolor          5.9         3.0          4.2         1.5
    -#> 63  versicolor          6.0         2.2          4.0         1.0
    -#> 64  versicolor          6.1         2.9          4.7         1.4
    -#> 65  versicolor          5.6         2.9          3.6         1.3
    -#> 66  versicolor          6.7         3.1          4.4         1.4
    -#> 67  versicolor          5.6         3.0          4.5         1.5
    -#> 68  versicolor          5.8         2.7          4.1         1.0
    -#> 69  versicolor          6.2         2.2          4.5         1.5
    -#> 70  versicolor          5.6         2.5          3.9         1.1
    -#> 71  versicolor          5.9         3.2          4.8         1.8
    -#> 72  versicolor          6.1         2.8          4.0         1.3
    -#> 73  versicolor          6.3         2.5          4.9         1.5
    -#> 74  versicolor          6.1         2.8          4.7         1.2
    -#> 75  versicolor          6.4         2.9          4.3         1.3
    -#> 76  versicolor          6.6         3.0          4.4         1.4
    -#> 77  versicolor          6.8         2.8          4.8         1.4
    -#> 78  versicolor          6.7         3.0          5.0         1.7
    -#> 79  versicolor          6.0         2.9          4.5         1.5
    -#> 80  versicolor          5.7         2.6          3.5         1.0
    -#> 81  versicolor          5.5         2.4          3.8         1.1
    -#> 82  versicolor          5.5         2.4          3.7         1.0
    -#> 83  versicolor          5.8         2.7          3.9         1.2
    -#> 84  versicolor          6.0         2.7          5.1         1.6
    -#> 85  versicolor          5.4         3.0          4.5         1.5
    -#> 86  versicolor          6.0         3.4          4.5         1.6
    -#> 87  versicolor          6.7         3.1          4.7         1.5
    -#> 88  versicolor          6.3         2.3          4.4         1.3
    -#> 89  versicolor          5.6         3.0          4.1         1.3
    -#> 90  versicolor          5.5         2.5          4.0         1.3
    -#> 91  versicolor          5.5         2.6          4.4         1.2
    -#> 92  versicolor          6.1         3.0          4.6         1.4
    -#> 93  versicolor          5.8         2.6          4.0         1.2
    -#> 94  versicolor          5.0         2.3          3.3         1.0
    -#> 95  versicolor          5.6         2.7          4.2         1.3
    -#> 96  versicolor          5.7         3.0          4.2         1.2
    -#> 97  versicolor          5.7         2.9          4.2         1.3
    -#> 98  versicolor          6.2         2.9          4.3         1.3
    -#> 99  versicolor          5.1         2.5          3.0         1.1
    -#> 100 versicolor          5.7         2.8          4.1         1.3
    -#> 101  virginica          6.3         3.3          6.0         2.5
    -#> 102  virginica          5.8         2.7          5.1         1.9
    -#> 103  virginica          7.1         3.0          5.9         2.1
    -#> 104  virginica          6.3         2.9          5.6         1.8
    -#> 105  virginica          6.5         3.0          5.8         2.2
    -#> 106  virginica          7.6         3.0          6.6         2.1
    -#> 107  virginica          4.9         2.5          4.5         1.7
    -#> 108  virginica          7.3         2.9          6.3         1.8
    -#> 109  virginica          6.7         2.5          5.8         1.8
    -#> 110  virginica          7.2         3.6          6.1         2.5
    -#> 111  virginica          6.5         3.2          5.1         2.0
    -#> 112  virginica          6.4         2.7          5.3         1.9
    -#> 113  virginica          6.8         3.0          5.5         2.1
    -#> 114  virginica          5.7         2.5          5.0         2.0
    -#> 115  virginica          5.8         2.8          5.1         2.4
    -#> 116  virginica          6.4         3.2          5.3         2.3
    -#> 117  virginica          6.5         3.0          5.5         1.8
    -#> 118  virginica          7.7         3.8          6.7         2.2
    -#> 119  virginica          7.7         2.6          6.9         2.3
    -#> 120  virginica          6.0         2.2          5.0         1.5
    -#> 121  virginica          6.9         3.2          5.7         2.3
    -#> 122  virginica          5.6         2.8          4.9         2.0
    -#> 123  virginica          7.7         2.8          6.7         2.0
    -#> 124  virginica          6.3         2.7          4.9         1.8
    -#> 125  virginica          6.7         3.3          5.7         2.1
    -#> 126  virginica          7.2         3.2          6.0         1.8
    -#> 127  virginica          6.2         2.8          4.8         1.8
    -#> 128  virginica          6.1         3.0          4.9         1.8
    -#> 129  virginica          6.4         2.8          5.6         2.1
    -#> 130  virginica          7.2         3.0          5.8         1.6
    -#> 131  virginica          7.4         2.8          6.1         1.9
    -#> 132  virginica          7.9         3.8          6.4         2.0
    -#> 133  virginica          6.4         2.8          5.6         2.2
    -#> 134  virginica          6.3         2.8          5.1         1.5
    -#> 135  virginica          6.1         2.6          5.6         1.4
    -#> 136  virginica          7.7         3.0          6.1         2.3
    -#> 137  virginica          6.3         3.4          5.6         2.4
    -#> 138  virginica          6.4         3.1          5.5         1.8
    -#> 139  virginica          6.0         3.0          4.8         1.8
    -#> 140  virginica          6.9         3.1          5.4         2.1
    -#> 141  virginica          6.7         3.1          5.6         2.4
    -#> 142  virginica          6.9         3.1          5.1         2.3
    -#> 143  virginica          5.8         2.7          5.1         1.9
    -#> 144  virginica          6.8         3.2          5.9         2.3
    -#> 145  virginica          6.7         3.3          5.7         2.5
    -#> 146  virginica          6.7         3.0          5.2         2.3
    -#> 147  virginica          6.3         2.5          5.0         1.9
    -#> 148  virginica          6.5         3.0          5.2         2.0
    -#> 149  virginica          6.2         3.4          5.4         2.3
    -#> 150  virginica          5.9         3.0          5.1         1.8
    +#> Rows: 19
    +#> Columns: 6
    +#> $ id         <dbl> 377943, 497013, 927676, 995667, 21829, 209432, 272983, 5806…
    +#> $ Gender     <dbl> 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 2
    +#> $ BMI        <dbl> 221, 1850655594, 2457679588, 1571539833, 1855378065, 159317…
    +#> $ age        <dbl> 52, 49, 43, 59, 40, 47, -888, 53, 35, 40, 41, 34, 48, 43, -…
    +#> $ smo_status <dbl> 1, 2, 3, -77, NA, 2, -77, 2, 1, 1, NA, 3, 2, 1, 2, 1, NA, 1…
    +#> $ prg_curr   <dbl> 0, -77, -77, 1, 0, -77, 8, 0, 0, -77, -77, 1, 1, 9, -77, -7…
    +#> Rows: 150
    +#> Columns: 5
    +#> $ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
    +#> $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
    +#> $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
    +#> $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
    +#> $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
     
     
    diff --git a/docs/reference/as_dossier.html b/docs/reference/as_dossier.html index 5c99524..b694897 100644 --- a/docs/reference/as_dossier.html +++ b/docs/reference/as_dossier.html @@ -1,7 +1,7 @@ -Validate and coerce an object to dossier format — as_dossier • madshapRValidate and coerce any object as a dossier (list of dataset(s)) — as_dossier • madshapRValidate and coerce an object to taxonomy format — as_taxonomy • madshapRValidate and coerce any object as a taxonomy — as_taxonomy • madshapRValidate and coerce an object according to a given valueType — as_valueType • madshapRValidate and coerce any object according to a given valueType — as_valueType • madshapRApply a data dictionary to a dataset — data_dict_apply • madshapRApply a data dictionary to a dataset — data_dict_apply • madshapR @@ -20,7 +20,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    @@ -62,10 +62,10 @@

    Apply a data dictionary to a dataset

    -

    Applies a data dictionary to a data structure, creating a labelled dataset. -All previous attributes will be preserved. For factors, the attribute -'levels' will be transformed into attribute 'labels' and values will be -recast into appropriate datatypes.

    +

    Applies a data dictionary to a dataset, creating a labelled dataset with
    +variable attributes. Any previous attributes will be preserved. Variables
    +that are factors will be transformed into haven-labelled variables.

    @@ -75,29 +75,28 @@

    Apply a data dictionary to a dataset

    Arguments

    dataset
    -

    A data frame identifying the input dataset observations
    -associated to its data dictionary.

    +

    A dataset object.

    data_dict
    -

    A list of data frame(s) representing meta data of an
    -associated dataset. Automatically generated if not provided.

    +

    A list of data frame(s) representing metadata of the input
    +dataset. Automatically generated if not provided.

    Value

    -

    A data frame identifying the dataset with the data dictionary applied to each
    -variable as attributes.

    +

    A labelled data frame with metadata as attributes, specified for each
    +variable from the input data dictionary.

    Details

    A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable
    -data dictionary will be generated as needed within relevant functions. An
    -identifier variable(s) for indexing can be specified by the user.
    +data dictionary will be generated as needed within relevant functions.
    +Identifier variable(s) for indexing can be specified by the user.
     The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.

    @@ -112,7 +111,7 @@

    Details

    @@ -123,31 +122,18 @@

    Examples

     dataset <- madshapR_DEMO$dataset_MELBOURNE
     data_dict <- as_data_dict_mlstr(madshapR_DEMO$data_dict_MELBOURNE)
    -data_dict_apply(dataset, data_dict)
    +head(data_dict_apply(dataset, data_dict))
     }
    -#> # A tibble: 19 × 6
    -#>        id Gender            BMI age                           smo_status  prg_curr
    -#>  *  <dbl> <dbl+lbl>       <dbl> <dbl+lbl>                     <dbl+lbl>   <dbl+lbl>
    -#>  1 377943 2 [Female]        221   52                            1 [neve…    0 [not…
    -#>  2 497013 1 [Male]   1850655594   49                            2 [curr…  -77 [not…
    -#>  3 927676 1 [Male]   2457679588   43                            3 [form…  -77 [not…
    -#>  4 995667 2 [Female] 1571539833   59                          -77 [don'…    1 [cur…
    -#>  5  21829 2 [Female] 1855378065   40                           NA           0 [not…
    -#>  6 209432 1 [Male]   1593177939   47                            2 [curr…  -77 [not…
    -#>  7 272983 2 [Female] 2865839945 -888 [don't want to answer]   -77 [don'…    8 [Don…
    -#>  8 580632 2 [Female] 1973602923   53                            2 [curr…    0 [not…
    -#>  9 304624 2 [Female]         NA   35                            1 [neve…    0 [not…
    -#> 10 637551 1 [Male]   2126602756   40                            1 [neve…  -77 [not…
    -#> 11 279817 1 [Male]   1719871484   41                           NA         -77 [not…
    -#> 12 235415 2 [Female] 2169671091   34                            3 [form…    1 [cur…
    -#> 13 373673 2 [Female] 1474286366   48                            2 [curr…    1 [cur…
    -#> 14 485098 2 [Female] 2437040385   43                            1 [neve…    9 [Don…
    -#> 15 299427 1 [Male]           NA -888 [don't want to answer]     2 [curr…  -77 [not…
    -#> 16 854073 1 [Male]   2420491247   41                            1 [neve…  -77 [not…
    -#> 17 197666 2 [Female] 1773788018   33                           NA           1 [cur…
    -#> 18 130327 2 [Female] 1915254927   57                            1 [neve…    9 [Don…
    -#> 19 220050 2 [Female] 1876765116   50                            2 [curr…    1 [cur…
    +#> # A tibble: 6 × 6
    +#>       id Gender            BMI age       smo_status                  prg_curr
    +#>    <dbl> <dbl+lbl>       <dbl> <dbl+lbl> <dbl+lbl>                   <dbl+lbl>
    +#> 1 377943 2 [Female]        221 52          1 [never smoked]            0 [not cu…
    +#> 2 497013 1 [Male]   1850655594 49          2 [current smoker]        -77 [not ap…
    +#> 3 927676 1 [Male]   2457679588 43          3 [former smoker]         -77 [not ap…
    +#> 4 995667 2 [Female] 1571539833 59        -77 [don't want to answer]    1 [curren…
    +#> 5  21829 2 [Female] 1855378065 40         NA                           0 [not cu…
    +#> 6 209432 1 [Male]   1593177939 47          2 [current smoker]        -77 [not ap…
    diff --git a/docs/reference/data_dict_collapse.html b/docs/reference/data_dict_collapse.html
    index a36859a..37e7db5 100644
    --- a/docs/reference/data_dict_collapse.html
    +++ b/docs/reference/data_dict_collapse.html
    @@ -27,7 +27,7 @@
     madshapR
    - 1.0.3.0003
    + 1.0.3
    @@ -94,7 +94,7 @@

    Transform multi-row category column(s) to single rows and join to "Variables

    Arguments

    data_dict
    -

    A list of data frame(s) representing meta data to be +

    A list of data frame(s) representing metadata to be transformed.

    diff --git a/docs/reference/data_dict_evaluate.html b/docs/reference/data_dict_evaluate.html index 0a87eb0..f03374b 100644 --- a/docs/reference/data_dict_evaluate.html +++ b/docs/reference/data_dict_evaluate.html @@ -1,9 +1,8 @@ -Generate a quality assessment report of a data dictionary — data_dict_evaluate • madshapRGenerate an assessment report for a data dictionary — data_dict_evaluate • madshapR @@ -21,7 +20,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    @@ -57,17 +56,16 @@
    -

    Assesses the content and structure of a data dictionary and reports potential -issues to facilitate the assessment of input data. -The report can be used to help assess data structure, presence of fields, -coherence across elements, and taxonomy or data dictionary formats. This -report is compatible with Excel and can be exported as an Excel spreadsheet.

    +

    Assesses the content and structure of a data dictionary and generates reports +of the results. The report can be used to help assess data dictionary +structure, presence of fields, coherence across elements, and taxonomy +or data dictionary formats.

    @@ -77,26 +75,25 @@

    Generate a quality assessment report of a data dictionary

    Arguments

    data_dict
    -

    A list of data frame(s) representing meta data to be evaluated.

    +

    A list of data frame(s) representing metadata to be evaluated.

    taxonomy
    -

    An optional data frame identifying a variable -classification schema.

    +

    An optional data frame identifying a variable classification +schema.

    as_data_dict_mlstr
    -

    Whether the output data dictionary has a simple -data dictionary structure or not (meaning has a Maelstrom data dictionary -structure, compatible with Maelstrom Research ecosystem, including Opal). -TRUE by default.

    +

    Whether the input data dictionary should be coerced +with specific format restrictions for compatibility with other +Maelstrom Research software. TRUE by default.

    Value

    -

    A list of data frames of report for one data dictionary.

    +

    A list of data frames containing assessment reports.

    Details

    @@ -117,7 +114,7 @@

    Details

    available online.

    The object may be specifically formatted to be compatible with additional Maelstrom Research software, -in particular Opal environments.

    +in particular Opal environments.

    @@ -125,9 +122,10 @@

    Examples

    {
     
     # use madshapR_DEMO provided by the package
    +library(dplyr)
     
     data_dict <- madshapR_DEMO$`data_dict_TOKYO - errors`
    -data_dict_evaluate(data_dict)
    +glimpse(data_dict_evaluate(data_dict))
     
     }
     #> - DATA DICTIONARY ASSESSMENT: data_dict --------------
    @@ -145,51 +143,26 @@ 

    Examples

    #> #> - WARNING MESSAGES (if any): -------------------------------------------- #> -#> $`Data dictionary summary` -#> # A tibble: 14 × 11 -#> index name `label:en` valueType `Categories::table` `Categories::label:en` -#> <int> <chr> <chr> <chr> <chr> <chr> -#> 1 1 part_id id of the… text NA NA -#> 2 2 gndr gndr boolean "Male = DEMO ; \nF… "Male = Male ; \nFema… -#> 3 3 height height decimal "SKIP PATTERN = DE… "SKIP PATTERN = SKIP … -#> 4 4 weight… weight_ms NA "-88 = DEMO ; \n-9… "-88 = Don’t want to … -#> 5 5 weight… weight_dc integer NA NA -#> 6 6 weight… weight_dc integr NA NA -#> 7 7 dob NA date NA NA -#> 8 8 prg_evr prg_ever integer NA NA -#> 9 9 empty empty integer NA NA -#> 10 10 opente… opentext character NA NA -#> 11 11 opente… NA NA NA NA -#> 12 12 NA no name integer "6 = DEMO" "6 = Inaccurate" -#> 13 NA prg_ev… NA NA "0 = DEMO ; \n1 = … "0 = never pregnant ;… -#> 14 NA weight… NA NA "-99 = DEMO" "-99 = Don’t want to … -#> # ℹ 5 more variables: `Categories::missing` <chr>, table <chr>, -#> # `description:en` <chr>, unit <chr>, `copy units` <chr> -#> -#> $`Data dictionary assessment` -#> # A tibble: 18 × 6 -#> sheet col_name name_var Quality assessment co…¹ value suggestion -#> <chr> <chr> <chr> <chr> <chr> <chr> -#> 1 Variables copy units NA [INFO] - Possible dupl… unit… NA -#> 2 Variables label:en dob [ERR] - The column `la… NA NA -#> 3 Variables label:en opentext [ERR] - The column `la… NA NA -#> 4 Variables name opentext [ERR] - duplicated var… NA NA -#> 5 Variables name row number: 12 [ERR] - missing variab… NA NA -#> 6 Variables name weight dc [INFO] - Incompatible … NA NA -#> 7 Variables unit NA [INFO] - Possible dupl… unit… NA -#> 8 Variables valueType gndr [ERR] - valueType con… bool… text -#> 9 Variables valueType height [ERR] - valueType con… deci… text -#> 10 Variables valueType opentext [ERR] - Incompatible v… char… NA -#> 11 Variables valueType weight_dc [ERR] - Incompatible v… inte… NA -#> 12 Categories label:en 1 [ERR] - The column 
`la… NA NA -#> 13 Categories missing prg_ever [ERR] - incompatible v… FLASE NA -#> 14 Categories variable prg_ever [ERR] - Categories not… NA NA -#> 15 Categories variable prg_ever [ERR] - In 'name', the… row … NA -#> 16 Categories variable prg_ever [ERR] - Category names… -7 NA -#> 17 Categories variable weight_sm [ERR] - Categories not… NA NA -#> 18 Categories variable NA [ERR] - In 'variable',… row … NA -#> # ℹ abbreviated name: ¹​`Quality assessment comment` -#> +#> List of 2 +#> $ Data dictionary summary : tibble [14 × 11] (S3: tbl_df/tbl/data.frame) +#> ..$ index : int [1:14] 1 2 3 4 5 6 7 8 9 10 ... +#> ..$ name : chr [1:14] "part_id" "gndr" "height" "weight_ms" ... +#> ..$ label:en : chr [1:14] "id of the participant" "gndr" "height" "weight_ms" ... +#> ..$ valueType : chr [1:14] "text" "boolean" "decimal" NA ... +#> ..$ Categories::table : chr [1:14] NA "Male = DEMO ; \nFemale = DEMO ; \n-77 = DEMO" "SKIP PATTERN = DEMO" "-88 = DEMO ; \n-99 = DEMO" ... +#> ..$ Categories::label:en: chr [1:14] NA "Male = Male ; \nFemale = Female ; \n-77 = Don’t want to answer" "SKIP PATTERN = SKIP PATTERN" "-88 = Don’t want to answer ; \n-99 = Do not remember" ... +#> ..$ Categories::missing : chr [1:14] NA "Male = FALSE ; \nFemale = FALSE ; \n-77 = TRUE" "SKIP PATTERN = TRUE" "-88 = TRUE ; \n-99 = TRUE" ... +#> ..$ table : chr [1:14] "DEMO" "DEMO" "DEMO" "DEMO" ... +#> ..$ description:en : chr [1:14] "id of the participant" "gender of the participant" "height of the participant" "weight of the participant - measured" ... +#> ..$ unit : chr [1:14] NA NA "cm" "kg" ... +#> ..$ copy units : chr [1:14] NA NA "cm" "kg" ... +#> $ Data dictionary assessment: tibble [18 × 6] (S3: tbl_df/tbl/data.frame) +#> ..$ sheet : chr [1:18] "Variables" "Variables" "Variables" "Variables" ... +#> ..$ col_name : chr [1:18] "copy units" "label:en" "label:en" "name" ... +#> ..$ name_var : chr [1:18] NA "dob" "opentext" "opentext" ... 
+#> ..$ Quality assessment comment: chr [1:18] "[INFO] - Possible duplicated columns" "[ERR] - The column `label(:xx)` must exist contain no 'NA' values" "[ERR] - The column `label(:xx)` must exist contain no 'NA' values" "[ERR] - duplicated variable name" ... +#> ..$ value : chr [1:18] "unit ; copy units" NA NA NA ... +#> ..$ suggestion : chr [1:18] NA NA NA NA ...
    diff --git a/docs/reference/data_dict_expand.html b/docs/reference/data_dict_expand.html
    index c427ea4..96e48ea 100644
    --- a/docs/reference/data_dict_expand.html
    +++ b/docs/reference/data_dict_expand.html
    @@ -27,7 +27,7 @@
     madshapR
    - 1.0.3.0003
    + 1.0.3
    @@ -94,12 +94,12 @@

    Transform single-row category information to multiple rows as element

    Arguments

    data_dict
    -

    A list of data frame(s) representing meta data to be -transformed. Automatically generated if not provided.

    +

    A list of data frame(s) representing metadata to be +transformed.

    from
    -

    Symbol identifying the name of the element (data frame) to take +

    A symbol identifying the name of the element (data frame) to take column(s) from. Default is 'Variables'.

    @@ -110,7 +110,7 @@

    Arguments

    to
    -

    Symbol identifying the name of the element (data frame) to create +

    A symbol identifying the name of the element (data frame) to create column(s) to. Default is 'Categories'.

    diff --git a/docs/reference/data_dict_extract.html b/docs/reference/data_dict_extract.html index def807c..1944aff 100644 --- a/docs/reference/data_dict_extract.html +++ b/docs/reference/data_dict_extract.html @@ -1,11 +1,7 @@ -Create a data dictionary from a dataset — data_dict_extract • madshapRGenerate a data dictionary from a dataset — data_dict_extract • madshapR @@ -23,7 +19,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    @@ -59,19 +55,15 @@
    -

    Creates a data dictionary in a format compliant with formats used in -Maelstrom Research ecosystem, including Opal (with 'Variables' and -'Categories' in separate data frames and standard columns in each) from any -dataset in data frame format. If the input dataset has no associated -metadata, a data dictionary with minimal required information is created -from the column (variable) names to create the data dictionary structure -required for the package. All columns except variable names will be blank.

    +

    Generates a data dictionary from a dataset. If the dataset variables have no +associated metadata, a minimum data dictionary is created by using variable +attributes.

    @@ -81,30 +73,28 @@

    Create a data dictionary from a dataset

    Arguments

    dataset
    -

    A data frame identifying the input dataset observations which -contains meta data as attributes.

    +

    A dataset object.

    as_data_dict_mlstr
    -

    Whether the output data dictionary has a simple -data dictionary structure or not (meaning has a Maelstrom data dictionary -structure, compatible with Maelstrom Research ecosystem, including Opal). -TRUE by default.

    +

    Whether the input data dictionary should be coerced +with specific format restrictions for compatibility with other +Maelstrom Research software. TRUE by default.

    Value

    -

    A list of data frame(s) identifying a data dictionary.

    +

    A list of data frame(s) representing metadata of the dataset variables.

    Details

     A dataset is a data table containing variables. A dataset object is a
     data frame and can be associated with a data dictionary. If no
     data dictionary is provided with a dataset, a minimum workable
    -data dictionary will be generated as needed within relevant functions. An
    -identifier variable(s) for indexing can be specified by the user.
    +data dictionary will be generated as needed within relevant functions.
    +Identifier variable(s) for indexing can be specified by the user.
     The id values must be non-missing and will be used in functions that
     require it. If no identifier variable is specified, indexing is
     handled automatically by the function.

    @@ -118,7 +108,7 @@

    Details

    variable and name.

    The object may be specifically formatted to be compatible with additional Maelstrom Research software, -in particular Opal environments.

    +in particular Opal environments.

    diff --git a/docs/reference/data_dict_filter.html b/docs/reference/data_dict_filter.html
    index 987f76f..aefa6d7 100644
    --- a/docs/reference/data_dict_filter.html
    +++ b/docs/reference/data_dict_filter.html
    @@ -19,7 +19,7 @@
     madshapR
    - 1.0.3.0003
    + 1.0.3
    @@ -78,8 +78,8 @@

    Subset data dictionary by row values

    Arguments

    data_dict
    -

    A list of data frame(s) representing meta data to be -transformed.

    +

    A list of data frame(s) representing metadata to be +filtered.

    filter_var
    @@ -139,7 +139,7 @@

    Examples

     data_dict_nest <- data_dict_list_nest(data_dict_list, name_group = 'table')
     ###### Example 1 search and filter through a column in 'Variables' element
    -data_dict_filter(data_dict_nest,filter_var = "valueType == 'integer'")
    +data_dict_filter(data_dict_nest,filter_var = "valueType == 'text'")
     ###### Example 2 search and filter through a column in 'Categories' element
     data_dict_filter(data_dict_nest,filter_cat = "missing == TRUE")
    diff --git a/docs/reference/data_dict_group_by.html b/docs/reference/data_dict_group_by.html
    index 8a204d1..14dbd33 100644
    --- a/docs/reference/data_dict_group_by.html
    +++ b/docs/reference/data_dict_group_by.html
    @@ -21,7 +21,7 @@
     madshapR
    - 1.0.3.0003
    + 1.0.3
    @@ -77,7 +77,7 @@

    Group listed data dictionaries by specified column names

    Arguments

    data_dict
    -

    A list of data frame(s) representing meta data to be +

    A list of data frame(s) representing metadata to be transformed.

    diff --git a/docs/reference/data_dict_group_split.html b/docs/reference/data_dict_group_split.html index ddc6184..661d1db 100644 --- a/docs/reference/data_dict_group_split.html +++ b/docs/reference/data_dict_group_split.html @@ -23,7 +23,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    @@ -81,7 +81,7 @@

    Split grouped data dictionaries into a named list

    Arguments

    data_dict
    -

    A list of data frame(s) representing meta data to be +

    A list of data frame(s) representing metadata to be transformed.

    @@ -134,71 +134,16 @@

    Examples

    data_dict_list_nest(data_dict_list, name_group = 'table') %>% data_dict_group_by(col = "table") - data_dict_group_split(data_dict_nest,col = "table") +glimpse(data_dict_group_split(data_dict_nest,col = "table")) } -#> $dataset_MELBOURNE -#> $dataset_MELBOURNE$Variables -#> # A tibble: 6 × 7 -#> table index name `label:en` `description:en` valueType unit -#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> -#> 1 dataset_MELBOURNE 1 id id id of the parti… text NA -#> 2 dataset_MELBOURNE 2 Gender Gender Gender integer NA -#> 3 dataset_MELBOURNE 3 BMI BMI Body Mass Index decimal kg/m… -#> 4 dataset_MELBOURNE 4 age age Age of Particip… integer years -#> 5 dataset_MELBOURNE 5 smo_status smo_curr Whether the par… integer NA -#> 6 dataset_MELBOURNE 6 prg_curr prg_curr Are you current… integer NA -#> -#> $dataset_MELBOURNE$Categories -#> # A tibble: 12 × 5 -#> table variable name `label:en` missing -#> <chr> <chr> <chr> <chr> <chr> -#> 1 dataset_MELBOURNE age -888 don't want to answer TRUE -#> 2 dataset_MELBOURNE Gender 1 Male FALSE -#> 3 dataset_MELBOURNE Gender 2 Female FALSE -#> 4 dataset_MELBOURNE prg_curr 0 not currently pregnant FALSE -#> 5 dataset_MELBOURNE prg_curr 1 currently pregnant FALSE -#> 6 dataset_MELBOURNE prg_curr 8 Don’t want to answer TRUE -#> 7 dataset_MELBOURNE prg_curr 9 Don’t know TRUE -#> 8 dataset_MELBOURNE prg_curr -77 not applicable TRUE -#> 9 dataset_MELBOURNE smo_status 1 never smoked FALSE -#> 10 dataset_MELBOURNE smo_status 2 current smoker FALSE -#> 11 dataset_MELBOURNE smo_status 3 former smoker FALSE -#> 12 dataset_MELBOURNE smo_status -77 don't want to answer TRUE -#> -#> -#> $dataset_TOKYO -#> $dataset_TOKYO$Variables -#> # A tibble: 9 × 7 -#> table index name `label:en` `description:en` valueType unit -#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> -#> 1 dataset_TOKYO 1 part_id id of the part… id of the parti… text NA -#> 2 dataset_TOKYO 2 gndr gndr gender of the p… text NA -#> 3 dataset_TOKYO 3 height height height of the p… integer cm -#> 
4 dataset_TOKYO 4 weight_ms weight_ms weight of the p… integer kg -#> 5 dataset_TOKYO 5 weight_dc weight_dc weight of the p… decimal kg -#> 6 dataset_TOKYO 6 dob dob date of birth o… date years -#> 7 dataset_TOKYO 7 prg_ever prg_ever whether the par… integer NA -#> 8 dataset_TOKYO 8 empty empty empty column integer NA -#> 9 dataset_TOKYO 9 opentext opentext open text text NA -#> -#> $dataset_TOKYO$Categories -#> # A tibble: 11 × 5 -#> table variable name `label:en` missing -#> <chr> <chr> <chr> <chr> <chr> -#> 1 dataset_TOKYO gndr Male Male FALSE -#> 2 dataset_TOKYO gndr Female Female FALSE -#> 3 dataset_TOKYO gndr -77 Don’t want to answer TRUE -#> 4 dataset_TOKYO weight_ms -88 Don’t want to answer TRUE -#> 5 dataset_TOKYO weight_ms -99 Don’t know TRUE -#> 6 dataset_TOKYO prg_ever 0 never pregnant FALSE -#> 7 dataset_TOKYO prg_ever 1 pregnant once or more FALSE -#> 8 dataset_TOKYO prg_ever 2 currently pregnant FALSE -#> 9 dataset_TOKYO prg_ever 8 Don’t want to answer TRUE -#> 10 dataset_TOKYO prg_ever 9 Don’t know TRUE -#> 11 dataset_TOKYO prg_ever -7 not applicable TRUE -#> -#> +#> List of 2 +#> $ dataset_MELBOURNE:List of 2 +#> ..$ Variables : tibble [6 × 7] (S3: tbl_df/tbl/data.frame) +#> ..$ Categories: tibble [12 × 5] (S3: tbl_df/tbl/data.frame) +#> $ dataset_TOKYO :List of 2 +#> ..$ Variables : tibble [9 × 7] (S3: tbl_df/tbl/data.frame) +#> ..$ Categories: tibble [11 × 5] (S3: tbl_df/tbl/data.frame)
    diff --git a/docs/reference/data_dict_list_nest.html b/docs/reference/data_dict_list_nest.html
    index 8937e0c..37d4c18 100644
    --- a/docs/reference/data_dict_list_nest.html
    +++ b/docs/reference/data_dict_list_nest.html
    @@ -18,7 +18,7 @@
     madshapR
    - 1.0.3.0003
    + 1.0.3
    @@ -71,7 +71,7 @@

    Bind listed data dictionaries

    Arguments

    data_dict_list
    -

    A list of data frame(s) representing meta data to be +

    A list of data frame(s) representing metadata to be transformed.

    @@ -107,6 +107,8 @@

    Examples

    {
     
     # use madshapR_DEMO provided by the package
    +library(dplyr)
    +
     # Create a list of data dictionaries where the column 'table' is added to 
     # refer to the associated dataset. The object created is not a 
     # data dictionary per se, but can be used as a structure which can be 
    @@ -117,45 +119,24 @@ 

    Examples

    data_dict_2 <- madshapR_DEMO$data_dict_MELBOURNE) names(data_dict_list) = c("dataset_TOKYO","dataset_MELBOURNE") -data_dict_list_nest(data_dict_list, name_group = 'table') +glimpse(data_dict_list_nest(data_dict_list, name_group = 'table')) } -#> $Variables -#> # A tibble: 15 × 7 -#> table index name `label:en` `description:en` valueType unit -#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> -#> 1 dataset_TOKYO 1 part_id id of the… id of the parti… text NA -#> 2 dataset_TOKYO 2 gndr gndr gender of the p… text NA -#> 3 dataset_TOKYO 3 height height height of the p… integer cm -#> 4 dataset_TOKYO 4 weight_ms weight_ms weight of the p… integer kg -#> 5 dataset_TOKYO 5 weight_dc weight_dc weight of the p… decimal kg -#> 6 dataset_TOKYO 6 dob dob date of birth o… date years -#> 7 dataset_TOKYO 7 prg_ever prg_ever whether the par… integer NA -#> 8 dataset_TOKYO 8 empty empty empty column integer NA -#> 9 dataset_TOKYO 9 opentext opentext open text text NA -#> 10 dataset_MELBOURNE 1 id id id of the parti… text NA -#> 11 dataset_MELBOURNE 2 Gender Gender Gender integer NA -#> 12 dataset_MELBOURNE 3 BMI BMI Body Mass Index decimal kg/m… -#> 13 dataset_MELBOURNE 4 age age Age of Particip… integer years -#> 14 dataset_MELBOURNE 5 smo_stat… smo_curr Whether the par… integer NA -#> 15 dataset_MELBOURNE 6 prg_curr prg_curr Are you current… integer NA -#> -#> $Categories -#> # A tibble: 23 × 5 -#> table variable name `label:en` missing -#> <chr> <chr> <chr> <chr> <chr> -#> 1 dataset_TOKYO gndr Male Male FALSE -#> 2 dataset_TOKYO gndr Female Female FALSE -#> 3 dataset_TOKYO gndr -77 Don’t want to answer TRUE -#> 4 dataset_TOKYO weight_ms -88 Don’t want to answer TRUE -#> 5 dataset_TOKYO weight_ms -99 Don’t know TRUE -#> 6 dataset_TOKYO prg_ever 0 never pregnant FALSE -#> 7 dataset_TOKYO prg_ever 1 pregnant once or more FALSE -#> 8 dataset_TOKYO prg_ever 2 currently pregnant FALSE -#> 9 dataset_TOKYO prg_ever 8 Don’t want to answer TRUE -#> 10 dataset_TOKYO prg_ever 9 Don’t know 
TRUE -#> # ℹ 13 more rows -#> +#> List of 2 +#> $ Variables : tibble [15 × 7] (S3: tbl_df/tbl/data.frame) +#> ..$ table : chr [1:15] "dataset_TOKYO" "dataset_TOKYO" "dataset_TOKYO" "dataset_TOKYO" ... +#> ..$ index : chr [1:15] "1" "2" "3" "4" ... +#> ..$ name : chr [1:15] "part_id" "gndr" "height" "weight_ms" ... +#> ..$ label:en : chr [1:15] "id of the participant" "gndr" "height" "weight_ms" ... +#> ..$ description:en: chr [1:15] "id of the participant" "gender of the participant" "height of the participant" "weight of the participant - measured" ... +#> ..$ valueType : chr [1:15] "text" "text" "integer" "integer" ... +#> ..$ unit : chr [1:15] NA NA "cm" "kg" ... +#> $ Categories: tibble [23 × 5] (S3: tbl_df/tbl/data.frame) +#> ..$ table : chr [1:23] "dataset_TOKYO" "dataset_TOKYO" "dataset_TOKYO" "dataset_TOKYO" ... +#> ..$ variable: chr [1:23] "gndr" "gndr" "gndr" "weight_ms" ... +#> ..$ name : chr [1:23] "Male" "Female" "-77" "-88" ... +#> ..$ label:en: chr [1:23] "Male" "Female" "Don’t want to answer" "Don’t want to answer" ... +#> ..$ missing : chr [1:23] "FALSE" "FALSE" "TRUE" "TRUE" ...
    diff --git a/docs/reference/data_dict_match_dataset.html b/docs/reference/data_dict_match_dataset.html
    index 62920ea..9047e28 100644
    --- a/docs/reference/data_dict_match_dataset.html
    +++ b/docs/reference/data_dict_match_dataset.html
    @@ -19,7 +19,7 @@
     madshapR
    - 1.0.3.0003
    + 1.0.3
    @@ -78,18 +78,18 @@

    Inner join between a dataset and its associated data dictionary

    Arguments

    dataset
    -

    A data frame identifying the input dataset observations.

    +

    A dataset object.

    data_dict
    -

    A list of data frame(s) representing meta data -associated to a dataset.

    +

    A list of data frame(s) representing metadata of the input +dataset.

    data_dict_apply
    -

    whether to apply the data dictionary to its dataset. -The resulting data frame will have for each column its associated meta data -as attributes. FALSE by default.

    +

    Whether data dictionary(ies) should be applied to +associated dataset(s), creating labelled dataset(s) with variable attributes. +Any previous attributes will be preserved. FALSE by default.

    output
    @@ -110,8 +110,8 @@

    Details

     A dataset is a data table containing variables. A dataset object is a
     data frame and can be associated with a data dictionary. If no
     data dictionary is provided with a dataset, a minimum workable
    -data dictionary will be generated as needed within relevant functions. An
    -identifier variable(s) for indexing can be specified by the user.
    +data dictionary will be generated as needed within relevant functions.
    +Identifier variable(s) for indexing can be specified by the user.
     The id values must be non-missing and will be used in functions that
     require it. If no identifier variable is specified, indexing is
     handled automatically by the function.

    @@ -131,64 +131,26 @@

    Examples

    # use madshapR_DEMO provided by the package library(dplyr) + dataset <- madshapR_DEMO$dataset_MELBOURNE %>% select(-1) data_dict <- madshapR_DEMO$data_dict_MELBOURNE -data_dict_match_dataset(dataset, data_dict) +head(data_dict_match_dataset(dataset, data_dict, out = 'dataset')) +glimpse(data_dict_match_dataset(dataset, data_dict, out = 'data_dict')) } -#> $dataset -#> # A tibble: 19 × 5 -#> Gender BMI age smo_status prg_curr -#> <dbl> <dbl> <dbl> <dbl> <dbl> -#> 1 2 221 52 1 0 -#> 2 1 1850655594 49 2 -77 -#> 3 1 2457679588 43 3 -77 -#> 4 2 1571539833 59 -77 1 -#> 5 2 1855378065 40 NA 0 -#> 6 1 1593177939 47 2 -77 -#> 7 2 2865839945 -888 -77 8 -#> 8 2 1973602923 53 2 0 -#> 9 2 NA 35 1 0 -#> 10 1 2126602756 40 1 -77 -#> 11 1 1719871484 41 NA -77 -#> 12 2 2169671091 34 3 1 -#> 13 2 1474286366 48 2 1 -#> 14 2 2437040385 43 1 9 -#> 15 1 NA -888 2 -77 -#> 16 1 2420491247 41 1 -77 -#> 17 2 1773788018 33 NA 1 -#> 18 2 1915254927 57 1 9 -#> 19 2 1876765116 50 2 1 -#> -#> $data_dict -#> $data_dict$Variables -#> # A tibble: 5 × 6 -#> index name `label:en` `description:en` valueType unit -#> <dbl> <chr> <chr> <chr> <chr> <chr> -#> 1 2 Gender Gender Gender integer NA -#> 2 3 BMI BMI Body Mass Index decimal kg/m… -#> 3 4 age age Age of Participant integer years -#> 4 5 smo_status smo_curr Whether the participant is a curr… integer NA -#> 5 6 prg_curr prg_curr Are you currently pregnant ? 
integer NA -#> -#> $data_dict$Categories -#> # A tibble: 12 × 4 -#> variable name `label:en` missing -#> <chr> <dbl> <chr> <lgl> -#> 1 age -888 don't want to answer TRUE -#> 2 Gender 1 Male FALSE -#> 3 Gender 2 Female FALSE -#> 4 prg_curr 0 not currently pregnant FALSE -#> 5 prg_curr 1 currently pregnant FALSE -#> 6 prg_curr 8 Don’t want to answer TRUE -#> 7 prg_curr 9 Don’t know TRUE -#> 8 prg_curr -77 not applicable TRUE -#> 9 smo_status 1 never smoked FALSE -#> 10 smo_status 2 current smoker FALSE -#> 11 smo_status 3 former smoker FALSE -#> 12 smo_status -77 don't want to answer TRUE -#> -#> +#> List of 2 +#> $ Variables : tibble [5 × 6] (S3: tbl_df/tbl/data.frame) +#> ..$ index : num [1:5] 2 3 4 5 6 +#> ..$ name : chr [1:5] "Gender" "BMI" "age" "smo_status" ... +#> ..$ label:en : chr [1:5] "Gender" "BMI" "age" "smo_curr" ... +#> ..$ description:en: chr [1:5] "Gender" "Body Mass Index" "Age of Participant" "Whether the participant is a current smoker or not" ... +#> ..$ valueType : chr [1:5] "integer" "decimal" "integer" "integer" ... +#> ..$ unit : chr [1:5] NA "kg/m^2" "years" NA ... +#> $ Categories: tibble [12 × 4] (S3: tbl_df/tbl/data.frame) +#> ..$ variable: chr [1:12] "age" "Gender" "Gender" "prg_curr" ... +#> ..$ name : num [1:12] -888 1 2 0 1 8 9 -77 1 2 ... +#> ..$ label:en: chr [1:12] "don't want to answer" "Male" "Female" "not currently pregnant" ... +#> ..$ missing : logi [1:12] TRUE FALSE FALSE FALSE FALSE TRUE ...
    diff --git a/docs/reference/data_dict_pivot_longer.html b/docs/reference/data_dict_pivot_longer.html
    index 1a75403..3813a32 100644
    --- a/docs/reference/data_dict_pivot_longer.html
    +++ b/docs/reference/data_dict_pivot_longer.html
    @@ -22,7 +22,7 @@
     madshapR
    - 1.0.3.0003
    + 1.0.3

    @@ -79,13 +79,13 @@

    Transform column(s) of a data dictionary from wide format to long format

    Arguments

    data_dict
    -

    A list of data frame(s) representing meta data to be +

    A list of data frame(s) representing metadata to be transformed.

    taxonomy
    -

    An optional data frame identifying a variable -classification schema.

    +

    An optional data frame identifying a variable classification +schema.

    diff --git a/docs/reference/data_dict_pivot_wider.html b/docs/reference/data_dict_pivot_wider.html
    index 9290977..aca23f9 100644
    --- a/docs/reference/data_dict_pivot_wider.html
    +++ b/docs/reference/data_dict_pivot_wider.html
    @@ -22,7 +22,7 @@
     madshapR
    - 1.0.3.0003
    + 1.0.3
    @@ -79,13 +79,13 @@

    Transform column(s) of a data dictionary from long format to wide format

    Arguments

    data_dict
    -

    A list of data frame(s) representing meta data to be +

    A list of data frame(s) representing metadata to be transformed.

    taxonomy
    -

    An optional data frame identifying a variable -classification schema.

    +

    An optional data frame identifying a variable classification +schema.

    diff --git a/docs/reference/data_dict_ungroup.html b/docs/reference/data_dict_ungroup.html
    index bfe5bbd..f93e348 100644
    --- a/docs/reference/data_dict_ungroup.html
    +++ b/docs/reference/data_dict_ungroup.html
    @@ -21,7 +21,7 @@
     madshapR
    - 1.0.3.0003
    + 1.0.3
    @@ -77,7 +77,7 @@

    Ungroup data dictionary

    Arguments

    data_dict
    -

    A list of data frame(s) representing meta data to be +

    A list of data frame(s) representing metadata to be transformed.

    diff --git a/docs/reference/data_extract.html b/docs/reference/data_extract.html
    index 48a7604..dc405b4 100644
    --- a/docs/reference/data_extract.html
    +++ b/docs/reference/data_extract.html
    @@ -20,7 +20,7 @@
     madshapR
    - 1.0.3.0003
    + 1.0.3
    @@ -75,14 +75,13 @@

    Create an empty dataset from a data dictionary

    Arguments

    data_dict
    -

    A list of data frame(s) representing meta data of an -associated dataset (to be generated).

    +

    A list of data frame(s) representing metadata.

    data_dict_apply
    -

    Whether to apply the data dictionary to its dataset. -The resulting data frame will have for each column its associated meta data -as attributes. FALSE by default.

    +

    Whether data dictionary(ies) should be applied to +associated dataset(s), creating labelled dataset(s) with variable attributes. +Any previous attributes will be preserved. FALSE by default.

    @@ -94,7 +93,15 @@

    Value

    Details

    -

    A data dictionary contains the list of variables in a dataset and metadata +

    A dataset is a data table containing variables. A dataset object is a +data frame and can be associated with a data dictionary. If no +data dictionary is provided with a dataset, a minimum workable +data dictionary will be generated as needed within relevant functions. +Identifier variable(s) for indexing can be specified by the user. +The id values must be non-missing and will be used in functions that +require it. If no identifier variable is specified, indexing is +handled automatically by the function.

    +

    A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame diff --git a/docs/reference/dataset_cat_as_labels.html b/docs/reference/dataset_cat_as_labels.html index 5113d4b..75125d5 100644 --- a/docs/reference/dataset_cat_as_labels.html +++ b/docs/reference/dataset_cat_as_labels.html @@ -18,7 +18,7 @@ madshapR - 1.0.3.0003 + 1.0.3

    @@ -71,13 +71,12 @@

    Apply data dictionary category labels to the associated dataset variables

    Arguments

    dataset
    -

    A data frame identifying the input dataset observations -associated to its data dictionary.

    +

    A dataset object.

    data_dict
    -

    A list of data frame(s) representing meta data of an -associated dataset (to be generated).

    +

    A list of data frame(s) representing metadata of the input +dataset. Automatically generated if not provided.

    col_names
    @@ -97,8 +96,8 @@

    Details

     A dataset is a data table containing variables. A dataset object is a
     data frame and can be associated with a data dictionary. If no
     data dictionary is provided with a dataset, a minimum workable
    -data dictionary will be generated as needed within relevant functions. An
    -identifier variable(s) for indexing can be specified by the user.
    +data dictionary will be generated as needed within relevant functions.
    +Identifier variable(s) for indexing can be specified by the user.
     The id values must be non-missing and will be used in functions that
     require it. If no identifier variable is specified, indexing is
     handled automatically by the function.

    @@ -116,160 +115,26 @@

    Details

    Examples

    {
     
    -dataset_cat_as_labels(iris['Sepal.Length'])
    +dataset = madshapR_DEMO$dataset_PARIS
    +data_dict = as_data_dict_mlstr(madshapR_DEMO$data_dict_PARIS)
    +dataset_cat_as_labels(dataset, data_dict, col_names = 'SEX')
     
     }
    -#>     Sepal.Length
    -#> 1            5.1
    -#> 2            4.9
    -#> 3            4.7
    -#> 4            4.6
    -#> 5            5.0
    -#> 6            5.4
    -#> 7            4.6
    -#> 8            5.0
    -#> 9            4.4
    -#> 10           4.9
    -#> 11           5.4
    -#> 12           4.8
    -#> 13           4.8
    -#> 14           4.3
    -#> 15           5.8
    -#> 16           5.7
    -#> 17           5.4
    -#> 18           5.1
    -#> 19           5.7
    -#> 20           5.1
    -#> 21           5.4
    -#> 22           5.1
    -#> 23           4.6
    -#> 24           5.1
    -#> 25           4.8
    -#> 26           5.0
    -#> 27           5.0
    -#> 28           5.2
    -#> 29           5.2
    -#> 30           4.7
    -#> 31           4.8
    -#> 32           5.4
    -#> 33           5.2
    -#> 34           5.5
    -#> 35           4.9
    -#> 36           5.0
    -#> 37           5.5
    -#> 38           4.9
    -#> 39           4.4
    -#> 40           5.1
    -#> 41           5.0
    -#> 42           4.5
    -#> 43           4.4
    -#> 44           5.0
    -#> 45           5.1
    -#> 46           4.8
    -#> 47           5.1
    -#> 48           4.6
    -#> 49           5.3
    -#> 50           5.0
    -#> 51           7.0
    -#> 52           6.4
    -#> 53           6.9
    -#> 54           5.5
    -#> 55           6.5
    -#> 56           5.7
    -#> 57           6.3
    -#> 58           4.9
    -#> 59           6.6
    -#> 60           5.2
    -#> 61           5.0
    -#> 62           5.9
    -#> 63           6.0
    -#> 64           6.1
    -#> 65           5.6
    -#> 66           6.7
    -#> 67           5.6
    -#> 68           5.8
    -#> 69           6.2
    -#> 70           5.6
    -#> 71           5.9
    -#> 72           6.1
    -#> 73           6.3
    -#> 74           6.1
    -#> 75           6.4
    -#> 76           6.6
    -#> 77           6.8
    -#> 78           6.7
    -#> 79           6.0
    -#> 80           5.7
    -#> 81           5.5
    -#> 82           5.5
    -#> 83           5.8
    -#> 84           6.0
    -#> 85           5.4
    -#> 86           6.0
    -#> 87           6.7
    -#> 88           6.3
    -#> 89           5.6
    -#> 90           5.5
    -#> 91           5.5
    -#> 92           6.1
    -#> 93           5.8
    -#> 94           5.0
    -#> 95           5.6
    -#> 96           5.7
    -#> 97           5.7
    -#> 98           6.2
    -#> 99           5.1
    -#> 100          5.7
    -#> 101          6.3
    -#> 102          5.8
    -#> 103          7.1
    -#> 104          6.3
    -#> 105          6.5
    -#> 106          7.6
    -#> 107          4.9
    -#> 108          7.3
    -#> 109          6.7
    -#> 110          7.2
    -#> 111          6.5
    -#> 112          6.4
    -#> 113          6.8
    -#> 114          5.7
    -#> 115          5.8
    -#> 116          6.4
    -#> 117          6.5
    -#> 118          7.7
    -#> 119          7.7
    -#> 120          6.0
    -#> 121          6.9
    -#> 122          5.6
    -#> 123          7.7
    -#> 124          6.3
    -#> 125          6.7
    -#> 126          7.2
    -#> 127          6.2
    -#> 128          6.1
    -#> 129          6.4
    -#> 130          7.2
    -#> 131          7.4
    -#> 132          7.9
    -#> 133          6.4
    -#> 134          6.3
    -#> 135          6.1
    -#> 136          7.7
    -#> 137          6.3
    -#> 138          6.4
    -#> 139          6.0
    -#> 140          6.9
    -#> 141          6.7
    -#> 142          6.9
    -#> 143          5.8
    -#> 144          6.8
    -#> 145          6.7
    -#> 146          6.7
    -#> 147          6.3
    -#> 148          6.5
    -#> 149          6.2
    -#> 150          5.9
    +#> Processing of : SEX
    +#> # A tibble: 24 × 7
    +#>    ID           SEX              BMI   AGE   SMO SMO_QTY PRG_EVER
    +#>  * <chr>        <chr+lbl>      <dbl> <dbl> <dbl>   <dbl>    <dbl>
    +#>  1 Paris_687393 Femme [1] 2224298583    52     1      32        0
    +#>  2 Paris_585666 Homme [0] 1523935376    49     1       8       -8
    +#>  3 Paris_75802  Homme [0] 2266888359    43     1      48       -8
    +#>  4 Paris_412072 Femme [1]         NA    59     1      11        0
    +#>  5 Paris_404333 Femme [1] 2618221463    40     1      18        1
    +#>  6 Paris_554985 Homme [0] 1598702068    47     1       7       -8
    +#>  7 Paris_714168 Femme [1] 1904634522    46     1      18       NA
    +#>  8 Paris_145477 Femme [1]  168600545    53     0      -8        1
    +#>  9 Paris_202076 Femme [1] 2992421287    35     1      36        1
    +#> 10 Paris_847235 Homme [0] 2064553154    NA     0      -8       -8
    +#> # ℹ 14 more rows
     
     
    diff --git a/docs/reference/dataset_evaluate.html b/docs/reference/dataset_evaluate.html index 3594510..1b45d38 100644 --- a/docs/reference/dataset_evaluate.html +++ b/docs/reference/dataset_evaluate.html @@ -1,9 +1,8 @@ -Generate a quality assessment report of a dataset — dataset_evaluate • madshapRGenerate an assessment report for a dataset — dataset_evaluate • madshapR @@ -21,7 +20,7 @@ madshapR - 1.0.3.0003 + 1.0.3 @@ -57,17 +56,16 @@
    -

    Assesses the content and structure of a dataset and reports possible issues -in the dataset and data dictionary to facilitate assessment of input data. -The report can be used to help assess data structure, presence of fields, -coherence across elements, and taxonomy or data dictionary formats. This -report is compatible with Excel and can be exported as an Excel spreadsheet.

    +

    Assesses the content and structure of a dataset object and generates reports +of the results. This function can be used to evaluate data structure, +presence of specific fields, coherence across elements, and data dictionary +formats.

    @@ -84,30 +82,28 @@

    Generate a quality assessment report of a dataset

    Arguments

    dataset
    -

    A data frame identifying the input dataset observations -associated to its data dictionary.

    +

    A dataset object.

    data_dict
    -

    A list of data frame(s) representing meta data of an -associated dataset. Automatically generated if not provided.

    +

    A list of data frame(s) representing metadata of the input +dataset. Automatically generated if not provided.

    taxonomy
    -

    An optional data frame identifying a variable -classification schema.

    +

    An optional data frame identifying a variable classification +schema.

    dataset_name

    A character string specifying the name of the dataset -(internally used in the function dossier_evaluate()).

    +(used internally in the function dossier_evaluate()).

    as_data_dict_mlstr
    -

    Whether the output data dictionary has a simple -data dictionary structure or not (meaning has a Maelstrom data dictionary -structure, compatible with Maelstrom Research ecosystem, including Opal). -TRUE by default.

    +

    Whether the input data dictionary should be coerced +with specific format restrictions for compatibility with other +Maelstrom Research software. TRUE by default.

    .dataset_name
    @@ -118,7 +114,7 @@

    Arguments

    Value

    -

    A list of data frames of report for one data dictionary.

    +

    A list of data frames containing assessment reports.

    Details

    @@ -133,8 +129,8 @@

    Details

    A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable -data dictionary will be generated as needed within relevant functions. An -identifier variable(s) for indexing can be specified by the user. +data dictionary will be generated as needed within relevant functions. +Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.

    @@ -147,7 +143,7 @@

    Details

    available online.

    The object may be specifically formatted to be compatible with additional Maelstrom Research software, -in particular Opal environments.

    +in particular Opal environments.

    See also

    @@ -161,10 +157,12 @@

    Examples

    # use madshapR_DEMO provided by the package library(dplyr) -###### Example : any data frame can be summarized -dataset <- - as_dataset(madshapR_DEMO$`dataset_TOKYO - errors with data`,col_id = 'part_id') - dataset_evaluate(dataset,as_data_dict_mlstr = FALSE) +###### Example : Any data frame can be summarized +dataset <- as_dataset( + madshapR_DEMO$`dataset_TOKYO - errors with data`, + col_id = 'part_id') + +glimpse(dataset_evaluate(dataset,as_data_dict_mlstr = FALSE)) } #> - DATA DICTIONARY ASSESSMENT: data_dict -------------- @@ -188,38 +186,23 @@

    Examples

    #> #> - WARNING MESSAGES (if any): ------------------------------------------------- #> -#> $`Data dictionary summary` -#> # A tibble: 9 × 4 -#> index name label typeof -#> <int> <chr> <chr> <chr> -#> 1 1 part_id part_id character -#> 2 2 gndr gndr character -#> 3 3 height height double -#> 4 4 weight_ms weight_ms double -#> 5 5 weight_dc weight_dc double -#> 6 6 dob dob character -#> 7 7 prg_ever prg_ever double -#> 8 8 empty empty logical -#> 9 9 opentext opentext character -#> -#> $`Data dictionary assessment` -#> # A tibble: 2 × 5 -#> sheet col_name name_var `Quality assessment comment` value -#> <chr> <chr> <chr> <chr> <chr> -#> 1 Variables label NA [INFO] - Possible duplicated columns name ; label -#> 2 Variables name NA [INFO] - Possible duplicated columns name ; label -#> -#> $`Dataset assessment` -#> # A tibble: 6 × 4 -#> `index in data dict.` name `Quality assessment comment` value -#> <int> <chr> <chr> <chr> -#> 1 3 height Unique value in the column 191 -#> 2 4 weight_ms [INFO] - Possible duplicated columns weig… -#> 3 7 prg_ever [INFO] - Possible duplicated columns weig… -#> 4 8 empty Empty column NA -#> 5 NA NA [INFO] - Empty participant(s) (Except p… ID008 -#> 6 NA NA [INFO] - Empty participant(s) (Except p… ID016 -#> +#> List of 3 +#> $ Data dictionary summary : tibble [9 × 4] (S3: tbl_df/tbl/data.frame) +#> ..$ index : int [1:9] 1 2 3 4 5 6 7 8 9 +#> ..$ name : chr [1:9] "part_id" "gndr" "height" "weight_ms" ... +#> ..$ label : chr [1:9] "part_id" "gndr" "height" "weight_ms" ... +#> ..$ typeof: chr [1:9] "character" "character" "double" "double" ... 
+#> $ Data dictionary assessment: tibble [2 × 5] (S3: tbl_df/tbl/data.frame) +#> ..$ sheet : chr [1:2] "Variables" "Variables" +#> ..$ col_name : chr [1:2] "label" "name" +#> ..$ name_var : chr [1:2] NA NA +#> ..$ Quality assessment comment: chr [1:2] "[INFO] - Possible duplicated columns" "[INFO] - Possible duplicated columns" +#> ..$ value : chr [1:2] "name ; label" "name ; label" +#> $ Dataset assessment : tibble [6 × 4] (S3: tbl_df/tbl/data.frame) +#> ..$ index in data dict. : int [1:6] 3 4 7 8 NA NA +#> ..$ name : chr [1:6] "height" "weight_ms" "prg_ever" "empty" ... +#> ..$ Quality assessment comment: chr [1:6] "Unique value in the column" "[INFO] - Possible duplicated columns" "[INFO] - Possible duplicated columns" "Empty column" ... +#> ..$ value : chr [1:6] "191" "weight_ms ; prg_ever" "weight_ms ; prg_ever" NA ...
    diff --git a/docs/reference/dataset_preprocess.html b/docs/reference/dataset_preprocess.html index ff2c8b9..9a0bc4f 100644 --- a/docs/reference/dataset_preprocess.html +++ b/docs/reference/dataset_preprocess.html @@ -28,7 +28,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    @@ -91,13 +91,12 @@

    Generate an evaluation of all variable values in a dataset

    Arguments

    dataset
    -

    A data frame identifying the input dataset observations -associated to its data dictionary.

    +

    A dataset object.

    data_dict
    -

    A list of data frame(s) representing meta data of an -associated dataset. Automatically generated if not provided.

    +

    A list of data frame(s) representing metadata of the input +dataset. Automatically generated if not provided.

    @@ -120,8 +119,8 @@

    Details

    A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable -data dictionary will be generated as needed within relevant functions. An -identifier variable(s) for indexing can be specified by the user. +data dictionary will be generated as needed within relevant functions. +Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.

    @@ -135,24 +134,19 @@

    See also

    Examples

    {
      
    -###### Example : any data frame can be a dataset by definition.
    -dataset_preprocess(iris)
    +###### Example : Any data frame can be a dataset by definition.
    +head(dataset_preprocess(iris))
     
     }
    -#> # A tibble: 750 × 10
    -#>    index name       `Categorical variable` valid_class value_var_occur value_var
    -#>    <int> <chr>      <chr>                  <chr>                 <dbl> <chr>    
    -#>  1     1 Sepal.Len… no                     3_Valid ot…               1 5.1      
    -#>  2     1 Sepal.Len… no                     3_Valid ot…               1 4.9      
    -#>  3     1 Sepal.Len… no                     3_Valid ot…               1 4.7      
    -#>  4     1 Sepal.Len… no                     3_Valid ot…               1 4.6      
    -#>  5     1 Sepal.Len… no                     3_Valid ot…               1 5        
    -#>  6     1 Sepal.Len… no                     3_Valid ot…               1 5.4      
    -#>  7     1 Sepal.Len… no                     3_Valid ot…               1 4.6      
    -#>  8     1 Sepal.Len… no                     3_Valid ot…               1 5        
    -#>  9     1 Sepal.Len… no                     3_Valid ot…               1 4.4      
    -#> 10     1 Sepal.Len… no                     3_Valid ot…               1 4.9      
    -#> # ℹ 740 more rows
    +#> # A tibble: 6 × 10
    +#>   index name        `Categorical variable` valid_class value_var_occur value_var
    +#>   <int> <chr>       <chr>                  <chr>                 <dbl> <chr>    
    +#> 1     1 Sepal.Leng… no                     3_Valid ot…               1 5.1      
    +#> 2     1 Sepal.Leng… no                     3_Valid ot…               1 4.9      
    +#> 3     1 Sepal.Leng… no                     3_Valid ot…               1 4.7      
    +#> 4     1 Sepal.Leng… no                     3_Valid ot…               1 4.6      
    +#> 5     1 Sepal.Leng… no                     3_Valid ot…               1 5        
    +#> 6     1 Sepal.Leng… no                     3_Valid ot…               1 5.4      
     #> # ℹ 4 more variables: index_value <int>, cat_index <int>, cat_label <chr>,
     #> #   index_in_dataset <int>
     
    diff --git a/docs/reference/dataset_summarize.html b/docs/reference/dataset_summarize.html
    index 1f9669e..e8248d7 100644
    --- a/docs/reference/dataset_summarize.html
    +++ b/docs/reference/dataset_summarize.html
    @@ -1,11 +1,9 @@
     
    -Generate a report and summary of a dataset — dataset_summarize • madshapRGenerate an assessment report and summary of a dataset — dataset_summarize • madshapR
    @@ -23,7 +21,7 @@
           
           
             madshapR
    -        1.0.3.0003
    +        1.0.3
           
         
    @@ -59,19 +57,17 @@
    -

    Assesses and summarizes the content and structure of a dataset and data -dictionary and reports potential issues to facilitate the assessment of -input. The report can be used to help assess data structure, presence of -fields, coherence across elements, and taxonomy or data dictionary formats. -The summary provides additional information about variable distributions and -descriptive statistics. This report is compatible with Excel and can be -exported as an Excel spreadsheet.

    +

    Assesses and summarizes the content and structure of a dataset and generates +reports of the results. This function can be used to evaluate data structure, +presence of specific fields, coherence across elements, and data dictionary +formats, and to summarize additional information about variable distributions +and descriptive statistics.

    @@ -89,24 +85,23 @@

    Generate a report and summary of a dataset

    Arguments

    dataset
    -

    A data frame identifying the input dataset observations -associated to its data dictionary.

    +

    A dataset object.

    data_dict
    -

    A list of data frame(s) representing meta data of an -associated dataset. Automatically generated if not provided.

    +

    A list of data frame(s) representing metadata of the input +dataset. Automatically generated if not provided.

    group_by

    A character string identifying the column in the dataset -to use as a grouping variable. Visual elements will be grouped by this +to use as a grouping variable. Elements will be grouped by this column.

    taxonomy
    -

    An optional data frame identifying a variable -classification schema.

    +

    An optional data frame identifying a variable classification +schema.

    dataset_name
    @@ -115,9 +110,8 @@

    Arguments

    valueType_guess
    -

    Whether the output should be generated based on more -precise valueType inferred from the data. FALSE by default -(will use the valueType declared).

    +

    Whether the output should include a more accurate +valueType that could be applied to the dataset. FALSE by default.

    .dataset_name
    @@ -128,7 +122,7 @@

    Arguments

    Value

    -

    A list of data frames of report for one data dictionary.

    +

    A list of data frames containing assessment reports and summaries.

    Details

    @@ -143,8 +137,8 @@

    Details

    A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable -data dictionary will be generated as needed within relevant functions. An -identifier variable(s) for indexing can be specified by the user. +data dictionary will be generated as needed within relevant functions. +Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.

    @@ -158,12 +152,13 @@

    Details

    The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the -OBiBa data type of a variable. +OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data -types are available using valueType_list.

    +types are available using valueType_list. The valueType can be used to +coerce the variable to the corresponding data type.
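The idea of coercing a variable according to its declared valueType can be sketched in base R. The mapping below is an assumption made for illustration (it is not read from `valueType_list`, which remains the authoritative correspondence), and `coerce_by_value_type()` is a hypothetical helper rather than a madshapR function.

```r
# Illustrative mapping only (an assumption, not taken from valueType_list):
# each valueType name points to a base R coercion function.
value_type_to_r <- list(
  text    = as.character,
  integer = as.integer,
  decimal = as.numeric,
  boolean = as.logical,
  date    = as.Date
)

# Hypothetical helper: coerce one vector according to a declared valueType.
coerce_by_value_type <- function(x, value_type) {
  coerce <- value_type_to_r[[value_type]]
  if (is.null(coerce)) stop("Unsupported valueType: ", value_type)
  coerce(x)
}

# Character input coerced to integer and to Date (ISO 8601 strings).
height <- coerce_by_value_type(c("191", "176", "154"), "integer")
dob    <- coerce_by_value_type(c("1990-03-22", "2001-08-15"), "date")
```

Keeping the mapping in a named list makes the unsupported case explicit: a valueType without an entry stops with an error instead of silently returning the input unchanged.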

    See also

    @@ -177,9 +172,9 @@

    Examples

    # use madshapR_DEMO provided by the package library(dplyr) -#' ###### Example : any data frame can be summarized +#' ###### Example : Any data frame can be summarized dataset <- iris['Sepal.Width'] -dataset_summarize(dataset) +glimpse(dataset_summarize(dataset)) } #> - DATA DICTIONARY ASSESSMENT: data_dict -------------- @@ -220,57 +215,53 @@

    Examples

    #> Summarize information for categorical variables #> Summarize global information (Overview) #> Generate report -#> $Overview -#> # A tibble: 15 × 2 -#> `Quality control of dataset` `(all)` -#> <chr> <chr> -#> 1 "Date" "2023-12-05" -#> 2 "Name of the dataset" "dataset" -#> 3 " Identifier Variable" "" -#> 4 " Variables" " " -#> 5 " Total number of variables (incl. identifier)" "1" -#> 6 " Total number of empty columns" "0" -#> 7 " Data type in dictionary (valueType)" " " -#> 8 " Nb. text variables" "0" -#> 9 " Nb. date variables" "0" -#> 10 " Nb. datetime variables" "0" -#> 11 " Nb. numerical variables" "1" -#> 12 " Nb. categorical variables" "0" -#> 13 " Rows" "(all)" -#> 14 " Total number of observations" "150" -#> 15 " Nb. distinct observations" "150" -#> -#> $`Data dictionary assessment` -#> # A tibble: 2 × 5 -#> sheet col_name name_var `Quality assessment comment` value -#> <chr> <chr> <chr> <chr> <chr> -#> 1 Variables label NA [INFO] - Possible duplicated columns name ; label -#> 2 Variables name NA [INFO] - Possible duplicated columns name ; label -#> -#> $`Variables summary (all)` -#> # A tibble: 1 × 13 -#> `index in data dict.` name `Quality assessment comment` label -#> <int> <chr> <chr> <chr> -#> 1 1 Sepal.Width NA Sepal.Width -#> # ℹ 9 more variables: `Data Dictionary valueType` <chr>, -#> # `Estimated dataset valueType` <chr>, `Actual dataset valueType` <chr>, -#> # `Categorical variable` <chr>, `Categories in data dictionary` <chr>, -#> # `Total number of observations` <dbl>, `Nb. 
distinct values` <int>, -#> # `% total Valid values` <dbl>, `% NA` <dbl> -#> -#> $`Numerical variable summary` -#> # A tibble: 1 × 22 -#> `index in data dict.` name `Quality assessment comment` label -#> <int> <chr> <chr> <chr> -#> 1 1 Sepal.Width NA Sepal.Width -#> # ℹ 18 more variables: `Data Dictionary valueType` <chr>, -#> # `Estimated dataset valueType` <chr>, `Actual dataset valueType` <chr>, -#> # `Categorical variable` <chr>, `Categories in data dictionary` <chr>, -#> # `Total number of observations` <dbl>, `Nb. distinct values` <int>, -#> # `% total Valid values` <dbl>, `% NA` <dbl>, -#> # `% Valid categorical values (if applicable)` <dbl>, -#> # `% Missing categorical values (if applicable)` <dbl>, MIN <dbl>, … -#> +#> List of 4 +#> $ Overview : tibble [15 × 2] (S3: tbl_df/tbl/data.frame) +#> ..$ Quality control of dataset: chr [1:15] "Date" "Name of the dataset" " Identifier Variable" " Variables" ... +#> ..$ (all) : chr [1:15] "2023-12-14" "dataset" "" " " ... +#> $ Data dictionary assessment: tibble [2 × 5] (S3: tbl_df/tbl/data.frame) +#> ..$ sheet : chr [1:2] "Variables" "Variables" +#> ..$ col_name : chr [1:2] "label" "name" +#> ..$ name_var : chr [1:2] NA NA +#> ..$ Quality assessment comment: chr [1:2] "[INFO] - Possible duplicated columns" "[INFO] - Possible duplicated columns" +#> ..$ value : chr [1:2] "name ; label" "name ; label" +#> $ Variables summary (all) : tibble [1 × 13] (S3: tbl_df/tbl/data.frame) +#> ..$ index in data dict. : int 1 +#> ..$ name : chr "Sepal.Width" +#> ..$ Quality assessment comment : chr NA +#> ..$ label : chr "Sepal.Width" +#> ..$ Data Dictionary valueType : chr "decimal" +#> ..$ Estimated dataset valueType : chr "decimal" +#> ..$ Actual dataset valueType : chr "decimal" +#> ..$ Categorical variable : chr "no" +#> ..$ Categories in data dictionary: chr NA +#> ..$ Total number of observations : num 150 +#> ..$ Nb. 
distinct values : int 23 +#> ..$ % total Valid values : num 100 +#> ..$ % NA : num 0 +#> $ Numerical variable summary: tibble [1 × 22] (S3: tbl_df/tbl/data.frame) +#> ..$ index in data dict. : int 1 +#> ..$ name : chr "Sepal.Width" +#> ..$ Quality assessment comment : chr NA +#> ..$ label : chr "Sepal.Width" +#> ..$ Data Dictionary valueType : chr "decimal" +#> ..$ Estimated dataset valueType : chr "decimal" +#> ..$ Actual dataset valueType : chr "decimal" +#> ..$ Categorical variable : chr "no" +#> ..$ Categories in data dictionary : chr NA +#> ..$ Total number of observations : num 150 +#> ..$ Nb. distinct values : int 23 +#> ..$ % total Valid values : num 100 +#> ..$ % NA : num 0 +#> ..$ % Valid categorical values (if applicable) : num NA +#> ..$ % Missing categorical values (if applicable): num NA +#> ..$ MIN : num 2 +#> ..$ Q1 : num 2.8 +#> ..$ MEDIAN : num 3 +#> ..$ Q3 : num 3.3 +#> ..$ MAX : num 4.4 +#> ..$ MEAN : num 3.06 +#> ..$ STDEV : num 0.436
    diff --git a/docs/reference/dataset_visualize.html b/docs/reference/dataset_visualize.html index aff87ad..a5c2f40 100644 --- a/docs/reference/dataset_visualize.html +++ b/docs/reference/dataset_visualize.html @@ -1,18 +1,7 @@ -Generate a web-based bookdown visual report of a dataset — dataset_visualize • madshapRGenerate a web-based visual report for a dataset — dataset_visualize • madshapR @@ -30,7 +19,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    @@ -66,26 +55,15 @@
    -

    Generates a visual report for a dataset in an HTML bookdown document. The -report provides figures and descriptive statistics for each variable to -facilitate the assessment of input data. Statistics and figures are -generated according to variable data type. The report can be used to help -assess data structure, coherence across elements, and taxonomy or -data dictionary formats. The summaries and figures provide additional -information about variable distributions and descriptive statistics. -The charts and tables are produced based on their data type. The variable -can be grouped using group_by parameter, which is a (categorical) column -in the dataset. The user may need to use as.factor() in this context. To -fasten the process (and allow recycling object in a workflow) the user can -feed the function with a dataset_summary, which is the output of the function -dataset_summarize() of the column(s) col and group_by. The summary -must have the same parameters to operate.

    +

    Generates a visual report of a dataset in an HTML bookdown +document, with summary figures and statistics for each variable. The report +outputs can be grouped by a categorical variable.

    @@ -106,8 +84,7 @@

    Generate a web-based bookdown visual report of a dataset

    Arguments

    dataset
    -

    A data frame identifying the input dataset observations -associated to its data dictionary.

    +

    A dataset object.

    bookdown_path
    @@ -116,30 +93,29 @@

    Arguments

    data_dict
    -

    A list of data frame(s) representing meta data of an -associated dataset. Automatically generated if not provided.

    +

    A list of data frame(s) representing metadata of the input +dataset. Automatically generated if not provided.

    group_by
    -

    A character string identifying the column in each -dataset to use as a grouping variable. Visual elements will be grouped by -this column.

    +

    A character string identifying the column in the dataset +to use as a grouping variable. Elements will be grouped by this +column.

    taxonomy
    -

    An optional data frame identifying a variable -classification schema.

    +

    An optional data frame identifying a variable classification +schema.

    valueType_guess
    -

    Whether the output should be generated based on more -precise valueType inferred from the data. FALSE by default -(will use the valueType declared).

    +

    Whether the output should include a more accurate +valueType that could be applied to the dataset. FALSE by default.

    dataset_name

    A character string specifying the name of the dataset -(used for internal processes and programming).

    +(used internally in the function dossier_evaluate()).

    dataset_summary
    @@ -160,19 +136,17 @@

    Arguments

    Value

    -

    A bookdown folder containing files in the specified output folder. To -open the file in browser, open 'docs/index.html'. Or use -bookdown_open()

    - - +

    A folder containing files for the bookdown site. To open the bookdown site +in a browser, open 'docs/index.html', or use bookdown_open() with the +folder path.

    Details

    A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable -data dictionary will be generated as needed within relevant functions. An -identifier variable(s) for indexing can be specified by the user. +data dictionary will be generated as needed within relevant functions. +Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.

    @@ -187,12 +161,13 @@

    Details

    The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the -OBiBa data type of a variable. +OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data -types are available using valueType_list.

    +types are available using valueType_list. The valueType can be used to +coerce the variable to the corresponding data type.

    A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an Opal environment, and a @@ -203,7 +178,8 @@

    Details

    @@ -213,8 +189,9 @@

    Examples

    # You can use our demonstration files to run examples library(fs) +library(dplyr) -dataset <- madshapR_DEMO$dataset_TOKYO['height'] +dataset <- madshapR_DEMO$dataset_TOKYO['height'] %>% slice(0) dataset_summary <- madshapR_DEMO$`dataset_summary` if(dir_exists(tempdir())) dir_delete(tempdir()) @@ -244,10 +221,10 @@

    Examples

    #> 13/15 #> 14/15 [figures-plot12-1] #> 15/15 -#> "C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/pandoc" +RTS -K512m -RTS bookdownproj.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output bookdownproj.html --lua-filter "C:\Users\guill\AppData\Local\R\win-library\4.3\bookdown\rmarkdown\lua\custom-environment.lua" --lua-filter "C:\Users\guill\AppData\Local\R\win-library\4.3\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\guill\AppData\Local\R\win-library\4.3\rmarkdown\rmarkdown\lua\latex-div.lua" --lua-filter "C:\Users\guill\AppData\Local\R\win-library\4.3\rmarkdown\rmarkdown\lua\anchor-sections.lua" --metadata-file "C:\Users\guill\AppData\Local\Temp\RtmpIpvzZd\file3a207ace5c7e" --wrap preserve --standalone --section-divs --table-of-contents --toc-depth 3 --template "C:\Users\guill\AppData\Local\R\win-library\4.3\bookdown\templates\gitbook.html" --highlight-style pygments --number-sections --css style.css --mathjax --include-in-header "C:\Users\guill\AppData\Local\Temp\RtmpIpvzZd\rmarkdown-str3a20b226cf1.html" +#> "C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/pandoc" +RTS -K512m -RTS bookdownproj.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output bookdownproj.html --lua-filter "C:\Users\guill\AppData\Local\R\win-library\4.3\bookdown\rmarkdown\lua\custom-environment.lua" --lua-filter "C:\Users\guill\AppData\Local\R\win-library\4.3\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\guill\AppData\Local\R\win-library\4.3\rmarkdown\rmarkdown\lua\latex-div.lua" --lua-filter "C:\Users\guill\AppData\Local\R\win-library\4.3\rmarkdown\rmarkdown\lua\anchor-sections.lua" --metadata-file "C:\Users\guill\AppData\Local\Temp\RtmpgDtwep\file258816b74025" --wrap preserve --standalone --section-divs --table-of-contents --toc-depth 3 --template "C:\Users\guill\AppData\Local\R\win-library\4.3\bookdown\templates\gitbook.html" --highlight-style pygments 
--number-sections --css style.css --mathjax --include-in-header "C:\Users\guill\AppData\Local\Temp\RtmpgDtwep\rmarkdown-str258867c36af0.html" #> #> -#> To edit your file, You can use the function `bookdown_open('C:\Users\guill\AppData\Local\Temp\RtmpIpvzZd')` +#> To edit your file, You can use the function `bookdown_open('C:\Users\guill\AppData\Local\Temp\RtmpgDtwep')` #> (Compatibility tested on Chrome, Edge and Mozilla) #> diff --git a/docs/reference/dataset_zap_data_dict.html b/docs/reference/dataset_zap_data_dict.html index 4ab3f0f..0e1ad45 100644 --- a/docs/reference/dataset_zap_data_dict.html +++ b/docs/reference/dataset_zap_data_dict.html @@ -19,7 +19,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    @@ -73,8 +73,7 @@

    Remove labels (attributes) from a data frame, leaving its unlabelled columns

    Arguments

    dataset
    -

    A data frame identifying the input dataset observations -associated to its data dictionary.

    +

    A dataset object.

    @@ -88,8 +87,8 @@

    Details

    A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable -data dictionary will be generated as needed within relevant functions. An -identifier variable(s) for indexing can be specified by the user. +data dictionary will be generated as needed within relevant functions. +Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.

    @@ -108,23 +107,18 @@

    Examples

    dataset <- madshapR_DEMO$dataset_TOKYO data_dict <- as_data_dict_mlstr(madshapR_DEMO$data_dict_TOKYO) dataset <- data_dict_apply(dataset,data_dict) -dataset_zap_data_dict(dataset) +head(dataset_zap_data_dict(dataset)) } -#> # A tibble: 50 × 9 -#> part_id gndr height weight_ms weight_dc dob prg_ever empty opentext -#> * <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <lgl> <chr> -#> 1 ID001 Male 191 63 NA 3/22/1990 -7 NA All chil… -#> 2 ID002 Female 176 NA 65 8/15/2001 0 NA grow up,… -#> 3 ID003 Female 154 NA 121 12/17/1996 2 NA years ol… -#> 4 ID004 Female 167 -88 NA 6/13/1990 1 NA flower a… -#> 5 ID005 Female 185 NA 45 12/17/1996 8 NA rather d… -#> 6 ID006 -77 171 57 NA 3/31/1981 -7 NA cried, O… -#> 7 ID007 Female 185 NA 58 4/19/1988 9 NA that pas… -#> 8 ID008 Female 171 NA 59 NA 2 NA that she… -#> 9 ID009 -77 169 52 NA 3/14/1976 -7 NA beginnin… -#> 10 ID010 Male 179 NA 62 10/19/1993 -7 NA All chil… -#> # ℹ 40 more rows +#> # A tibble: 6 × 9 +#> part_id gndr height weight_ms weight_dc dob prg_ever empty opentext +#> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <lgl> <chr> +#> 1 ID001 Male 191 63 NA 3/22/1990 -7 NA All child… +#> 2 ID002 Female 176 NA 65 8/15/2001 0 NA grow up, … +#> 3 ID003 Female 154 NA 121 12/17/1996 2 NA years old… +#> 4 ID004 Female 167 -88 NA 6/13/1990 1 NA flower an… +#> 5 ID005 Female 185 NA 45 12/17/1996 8 NA rather de… +#> 6 ID006 -77 171 57 NA 3/31/1981 -7 NA cried, Oh…

    diff --git a/docs/reference/deprecated.html b/docs/reference/deprecated.html index 1fadd1c..dfbc2a8 100644 --- a/docs/reference/deprecated.html +++ b/docs/reference/deprecated.html @@ -18,7 +18,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    diff --git a/docs/reference/dossier_create.html b/docs/reference/dossier_create.html index 33bd394..f11073f 100644 --- a/docs/reference/dossier_create.html +++ b/docs/reference/dossier_create.html @@ -1,7 +1,5 @@ -Create a dossier object from a list of dataset(s) — dossier_create • madshapRGenerate a dossier from a list of one or more datasets — dossier_create • madshapR @@ -19,7 +17,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    @@ -55,15 +53,13 @@
    -

    Assembles a dossier object from the listed datasets. A dossier is a list -containing at least one valid dataset and is the input used by key functions -of the package.

    +

    Generates a dossier object (list of one or more datasets).

    @@ -73,29 +69,28 @@

    Create a dossier object from a list of dataset(s)

    Arguments

    dataset_list
    -

    A list of data frame(s), identifying the input data -observations.

    +

    A list of data frames, each of them being a dataset object.

    data_dict_apply
    -

    whether to apply the data dictionary to its dataset. -The resulting data frame will have for each column its associated meta data -as attributes. FALSE by default.

    +

    Whether data dictionary(ies) should be applied to +associated dataset(s), creating labelled dataset(s) with variable attributes. +Any previous attributes will be preserved. FALSE by default.

    Value

    -

    A list of data frame(s), each of them identifying datasets in a dossier.

    +

    A list of data frame(s), containing input dataset(s).

    Details

    A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable -data dictionary will be generated as needed within relevant functions. An -identifier variable(s) for indexing can be specified by the user. +data dictionary will be generated as needed within relevant functions. +Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
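The requirement that identifier values be non-missing can be checked before a dataset is passed on. The following is a minimal base-R sketch; `has_valid_id()` and the `toy` data frame are hypothetical names introduced here for illustration and are not part of madshapR. The `col_id` argument name mirrors the one used with `as_dataset()` elsewhere in these pages.

```r
# Hypothetical helper (not part of madshapR): checks that a candidate
# identifier column exists and contains no missing values.
has_valid_id <- function(dataset, col_id) {
  stopifnot(is.data.frame(dataset), is.character(col_id), length(col_id) == 1)
  col_id %in% names(dataset) && !anyNA(dataset[[col_id]])
}

# Toy data: the third participant has no identifier.
toy <- data.frame(
  part_id = c("ID001", "ID002", NA),
  height  = c(191, 176, 154)
)

has_valid_id(toy, "part_id")         # FALSE: one id value is missing
has_valid_id(toy[1:2, ], "part_id")  # TRUE: all id values are present
```

A check of this kind only covers the non-missing rule stated above; how the package itself validates identifiers is not shown here.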

    @@ -106,207 +101,61 @@

    Examples

    {
     
     # use madshapR_DEMO provided by the package
    +library(dplyr)
     
     ###### Example 1: datasets can be gathered into a dossier which is a list.
     dossier <- dossier_create(
      dataset_list = list(
        dataset_MELBOURNE = madshapR_DEMO$dataset_MELBOURNE,
        dataset_PARIS = madshapR_DEMO$dataset_PARIS ))
    +
    +glimpse(dossier)
         
    -###### Example 2: any data frame can be gathered into a dossier
    -dossier_create(list(iris, mtcars))
    +###### Example 2: Any data frame can be gathered into a dossier
    +glimpse(dossier_create(list(iris, mtcars)))
        
     }
    -#> $iris
    -#>     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    -#> 1            5.1         3.5          1.4         0.2     setosa
    -#> 2            4.9         3.0          1.4         0.2     setosa
    -#> 3            4.7         3.2          1.3         0.2     setosa
    -#> 4            4.6         3.1          1.5         0.2     setosa
    -#> 5            5.0         3.6          1.4         0.2     setosa
    -#> 6            5.4         3.9          1.7         0.4     setosa
    -#> 7            4.6         3.4          1.4         0.3     setosa
    -#> 8            5.0         3.4          1.5         0.2     setosa
    -#> 9            4.4         2.9          1.4         0.2     setosa
    -#> 10           4.9         3.1          1.5         0.1     setosa
    -#> 11           5.4         3.7          1.5         0.2     setosa
    -#> 12           4.8         3.4          1.6         0.2     setosa
    -#> 13           4.8         3.0          1.4         0.1     setosa
    -#> 14           4.3         3.0          1.1         0.1     setosa
    -#> 15           5.8         4.0          1.2         0.2     setosa
    -#> 16           5.7         4.4          1.5         0.4     setosa
    -#> 17           5.4         3.9          1.3         0.4     setosa
    -#> 18           5.1         3.5          1.4         0.3     setosa
    -#> 19           5.7         3.8          1.7         0.3     setosa
    -#> 20           5.1         3.8          1.5         0.3     setosa
    -#> 21           5.4         3.4          1.7         0.2     setosa
    -#> 22           5.1         3.7          1.5         0.4     setosa
    -#> 23           4.6         3.6          1.0         0.2     setosa
    -#> 24           5.1         3.3          1.7         0.5     setosa
    -#> 25           4.8         3.4          1.9         0.2     setosa
    -#> 26           5.0         3.0          1.6         0.2     setosa
    -#> 27           5.0         3.4          1.6         0.4     setosa
    -#> 28           5.2         3.5          1.5         0.2     setosa
    -#> 29           5.2         3.4          1.4         0.2     setosa
    -#> 30           4.7         3.2          1.6         0.2     setosa
    -#> 31           4.8         3.1          1.6         0.2     setosa
    -#> 32           5.4         3.4          1.5         0.4     setosa
    -#> 33           5.2         4.1          1.5         0.1     setosa
    -#> 34           5.5         4.2          1.4         0.2     setosa
    -#> 35           4.9         3.1          1.5         0.2     setosa
    -#> 36           5.0         3.2          1.2         0.2     setosa
    -#> 37           5.5         3.5          1.3         0.2     setosa
    -#> 38           4.9         3.6          1.4         0.1     setosa
    -#> 39           4.4         3.0          1.3         0.2     setosa
    -#> 40           5.1         3.4          1.5         0.2     setosa
    -#> 41           5.0         3.5          1.3         0.3     setosa
    -#> 42           4.5         2.3          1.3         0.3     setosa
    -#> 43           4.4         3.2          1.3         0.2     setosa
    -#> 44           5.0         3.5          1.6         0.6     setosa
    -#> 45           5.1         3.8          1.9         0.4     setosa
    -#> 46           4.8         3.0          1.4         0.3     setosa
    -#> 47           5.1         3.8          1.6         0.2     setosa
    -#> 48           4.6         3.2          1.4         0.2     setosa
    -#> 49           5.3         3.7          1.5         0.2     setosa
    -#> 50           5.0         3.3          1.4         0.2     setosa
    -#> 51           7.0         3.2          4.7         1.4 versicolor
    -#> 52           6.4         3.2          4.5         1.5 versicolor
    -#> 53           6.9         3.1          4.9         1.5 versicolor
    -#> 54           5.5         2.3          4.0         1.3 versicolor
    -#> 55           6.5         2.8          4.6         1.5 versicolor
    -#> 56           5.7         2.8          4.5         1.3 versicolor
    -#> 57           6.3         3.3          4.7         1.6 versicolor
    -#> 58           4.9         2.4          3.3         1.0 versicolor
    -#> 59           6.6         2.9          4.6         1.3 versicolor
    -#> 60           5.2         2.7          3.9         1.4 versicolor
    -#> 61           5.0         2.0          3.5         1.0 versicolor
    -#> 62           5.9         3.0          4.2         1.5 versicolor
    -#> 63           6.0         2.2          4.0         1.0 versicolor
    -#> 64           6.1         2.9          4.7         1.4 versicolor
    -#> 65           5.6         2.9          3.6         1.3 versicolor
    -#> 66           6.7         3.1          4.4         1.4 versicolor
    -#> 67           5.6         3.0          4.5         1.5 versicolor
    -#> 68           5.8         2.7          4.1         1.0 versicolor
    -#> 69           6.2         2.2          4.5         1.5 versicolor
    -#> 70           5.6         2.5          3.9         1.1 versicolor
    -#> 71           5.9         3.2          4.8         1.8 versicolor
    -#> 72           6.1         2.8          4.0         1.3 versicolor
    -#> 73           6.3         2.5          4.9         1.5 versicolor
    -#> 74           6.1         2.8          4.7         1.2 versicolor
    -#> 75           6.4         2.9          4.3         1.3 versicolor
    -#> 76           6.6         3.0          4.4         1.4 versicolor
    -#> 77           6.8         2.8          4.8         1.4 versicolor
    -#> 78           6.7         3.0          5.0         1.7 versicolor
    -#> 79           6.0         2.9          4.5         1.5 versicolor
    -#> 80           5.7         2.6          3.5         1.0 versicolor
    -#> 81           5.5         2.4          3.8         1.1 versicolor
    -#> 82           5.5         2.4          3.7         1.0 versicolor
    -#> 83           5.8         2.7          3.9         1.2 versicolor
    -#> 84           6.0         2.7          5.1         1.6 versicolor
    -#> 85           5.4         3.0          4.5         1.5 versicolor
    -#> 86           6.0         3.4          4.5         1.6 versicolor
    -#> 87           6.7         3.1          4.7         1.5 versicolor
    -#> 88           6.3         2.3          4.4         1.3 versicolor
    -#> 89           5.6         3.0          4.1         1.3 versicolor
    -#> 90           5.5         2.5          4.0         1.3 versicolor
    -#> 91           5.5         2.6          4.4         1.2 versicolor
    -#> 92           6.1         3.0          4.6         1.4 versicolor
    -#> 93           5.8         2.6          4.0         1.2 versicolor
    -#> 94           5.0         2.3          3.3         1.0 versicolor
    -#> 95           5.6         2.7          4.2         1.3 versicolor
    -#> 96           5.7         3.0          4.2         1.2 versicolor
    -#> 97           5.7         2.9          4.2         1.3 versicolor
    -#> 98           6.2         2.9          4.3         1.3 versicolor
    -#> 99           5.1         2.5          3.0         1.1 versicolor
    -#> 100          5.7         2.8          4.1         1.3 versicolor
    -#> 101          6.3         3.3          6.0         2.5  virginica
    -#> 102          5.8         2.7          5.1         1.9  virginica
    -#> 103          7.1         3.0          5.9         2.1  virginica
    -#> 104          6.3         2.9          5.6         1.8  virginica
    -#> 105          6.5         3.0          5.8         2.2  virginica
    -#> 106          7.6         3.0          6.6         2.1  virginica
    -#> 107          4.9         2.5          4.5         1.7  virginica
    -#> 108          7.3         2.9          6.3         1.8  virginica
    -#> 109          6.7         2.5          5.8         1.8  virginica
    -#> 110          7.2         3.6          6.1         2.5  virginica
    -#> 111          6.5         3.2          5.1         2.0  virginica
    -#> 112          6.4         2.7          5.3         1.9  virginica
    -#> 113          6.8         3.0          5.5         2.1  virginica
    -#> 114          5.7         2.5          5.0         2.0  virginica
    -#> 115          5.8         2.8          5.1         2.4  virginica
    -#> 116          6.4         3.2          5.3         2.3  virginica
    -#> 117          6.5         3.0          5.5         1.8  virginica
    -#> 118          7.7         3.8          6.7         2.2  virginica
    -#> 119          7.7         2.6          6.9         2.3  virginica
    -#> 120          6.0         2.2          5.0         1.5  virginica
    -#> 121          6.9         3.2          5.7         2.3  virginica
    -#> 122          5.6         2.8          4.9         2.0  virginica
    -#> 123          7.7         2.8          6.7         2.0  virginica
    -#> 124          6.3         2.7          4.9         1.8  virginica
    -#> 125          6.7         3.3          5.7         2.1  virginica
    -#> 126          7.2         3.2          6.0         1.8  virginica
    -#> 127          6.2         2.8          4.8         1.8  virginica
    -#> 128          6.1         3.0          4.9         1.8  virginica
    -#> 129          6.4         2.8          5.6         2.1  virginica
    -#> 130          7.2         3.0          5.8         1.6  virginica
    -#> 131          7.4         2.8          6.1         1.9  virginica
    -#> 132          7.9         3.8          6.4         2.0  virginica
    -#> 133          6.4         2.8          5.6         2.2  virginica
    -#> 134          6.3         2.8          5.1         1.5  virginica
    -#> 135          6.1         2.6          5.6         1.4  virginica
    -#> 136          7.7         3.0          6.1         2.3  virginica
    -#> 137          6.3         3.4          5.6         2.4  virginica
    -#> 138          6.4         3.1          5.5         1.8  virginica
    -#> 139          6.0         3.0          4.8         1.8  virginica
    -#> 140          6.9         3.1          5.4         2.1  virginica
    -#> 141          6.7         3.1          5.6         2.4  virginica
    -#> 142          6.9         3.1          5.1         2.3  virginica
    -#> 143          5.8         2.7          5.1         1.9  virginica
    -#> 144          6.8         3.2          5.9         2.3  virginica
    -#> 145          6.7         3.3          5.7         2.5  virginica
    -#> 146          6.7         3.0          5.2         2.3  virginica
    -#> 147          6.3         2.5          5.0         1.9  virginica
    -#> 148          6.5         3.0          5.2         2.0  virginica
    -#> 149          6.2         3.4          5.4         2.3  virginica
    -#> 150          5.9         3.0          5.1         1.8  virginica
    -#> 
    -#> $mtcars
    -#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    -#> Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    -#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    -#> Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    -#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    -#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    -#> Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    -#> Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    -#> Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    -#> Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    -#> Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    -#> Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    -#> Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    -#> Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    -#> Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    -#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    -#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    -#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    -#> Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    -#> Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    -#> Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    -#> Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    -#> Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    -#> AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    -#> Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    -#> Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    -#> Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    -#> Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    -#> Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    -#> Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    -#> Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    -#> Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    -#> Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    -#> 
    -#> attr(,"madshapR::class")
    -#> [1] "dossier"
    +#> List of 2
    +#>  $ dataset_MELBOURNE: tibble [19 × 6] (S3: tbl_df/tbl/data.frame)
    +#>   ..$ id        : num [1:19] 377943 497013 927676 995667 21829 ...
    +#>   ..$ Gender    : num [1:19] 2 1 1 2 2 1 2 2 2 1 ...
    +#>   ..$ BMI       : num [1:19] 2.21e+02 1.85e+09 2.46e+09 1.57e+09 1.86e+09 ...
    +#>   ..$ age       : num [1:19] 52 49 43 59 40 47 -888 53 35 40 ...
    +#>   ..$ smo_status: num [1:19] 1 2 3 -77 NA 2 -77 2 1 1 ...
    +#>   ..$ prg_curr  : num [1:19] 0 -77 -77 1 0 -77 8 0 0 -77 ...
    +#>   ..- attr(*, "madshapR::class")= chr "dataset"
    +#>  $ dataset_PARIS    : tibble [24 × 7] (S3: tbl_df/tbl/data.frame)
    +#>   ..$ ID      : chr [1:24] "Paris_687393" "Paris_585666" "Paris_75802" "Paris_412072" ...
    +#>   ..$ SEX     : num [1:24] 1 0 0 1 1 0 1 1 1 0 ...
    +#>   ..$ BMI     : num [1:24] 2.22e+09 1.52e+09 2.27e+09 NA 2.62e+09 ...
    +#>   ..$ AGE     : num [1:24] 52 49 43 59 40 47 46 53 35 NA ...
    +#>   ..$ SMO     : num [1:24] 1 1 1 1 1 1 1 0 1 0 ...
    +#>   ..$ SMO_QTY : num [1:24] 32 8 48 11 18 7 18 -8 36 -8 ...
    +#>   ..$ PRG_EVER: num [1:24] 0 -8 -8 0 1 -8 NA 1 1 -8 ...
    +#>   ..- attr(*, "madshapR::class")= chr "dataset"
    +#>  - attr(*, "madshapR::class")= chr "dossier"
    +#> List of 2
    +#>  $ iris  :'data.frame':	150 obs. of  5 variables:
    +#>   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    +#>   ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
    +#>   ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
    +#>   ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
    +#>   ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
    +#>   ..- attr(*, "madshapR::class")= chr "dataset"
    +#>  $ mtcars:'data.frame':	32 obs. of  11 variables:
    +#>   ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
    +#>   ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
    +#>   ..$ disp: num [1:32] 160 160 108 258 360 ...
    +#>   ..$ hp  : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
    +#>   ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
    +#>   ..$ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
    +#>   ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
    +#>   ..$ vs  : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
    +#>   ..$ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
    +#>   ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
    +#>   ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
    +#>   ..- attr(*, "madshapR::class")= chr "dataset"
    +#>  - attr(*, "madshapR::class")= chr "dossier"
     
     
    diff --git a/docs/reference/dossier_evaluate.html b/docs/reference/dossier_evaluate.html index 4797f31..ba8a4e6 100644 --- a/docs/reference/dossier_evaluate.html +++ b/docs/reference/dossier_evaluate.html @@ -1,10 +1,8 @@ -Generate a quality assessment report of a dossier (list of datasets) — dossier_evaluate • madshapRGenerate an assessment report of a dossier — dossier_evaluate • madshapR @@ -22,7 +20,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    @@ -58,18 +56,16 @@
    -

    Assesses the content and structure of a dossier object (list of -datasets) and reports possible issues in the datasets and data dictionaries -to facilitate assessment of input data. -The report can be used to help assess data structure, presence of fields, -coherence across elements, and taxonomy or data dictionary formats.This -report is compatible with Excel and can be exported as an Excel spreadsheet.

    +

    Assesses the content and structure of a dossier object (list of datasets) +and generates reports of the results. This function can be used to evaluate +data structure, presence of specific fields, coherence across elements, and +data dictionary formats.

    @@ -83,27 +79,26 @@

    Arguments

    taxonomy
    -

    An optional data frame identifying a variable -classification schema.

    +

    An optional data frame identifying a variable classification +schema.

    as_data_dict_mlstr
    -

    Whether the output data dictionary has a simple -data dictionary structure or not (meaning has a Maelstrom data dictionary -structure, compatible with Maelstrom Research ecosystem, including Opal). -TRUE by default.

    +

    Whether the input data dictionary should be coerced +with specific format restrictions for compatibility with other +Maelstrom Research software. TRUE by default.

    Value

    -

    A list of data frames of report for each dataset.

    +

    A list of data frames containing assessment reports.

    Details

    A dossier is a named list containing at least one data frame or more, -each of them being datasets. The name of each tibble will be used as the +each of them being datasets. The name of each data frame will be used as the reference name of the dataset.
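The "named list of data frames" structure can be pictured with plain base R. This sketch deliberately uses no madshapR functions; `dossier_like` and `overview` are hypothetical names, and the object is an ordinary list rather than a validated dossier.

```r
# Plain base-R list standing in for a dossier; no madshapR calls are used.
dossier_like <- list(iris = iris, mtcars = mtcars)

# The element names play the role of the datasets' reference names.
names(dossier_like)

# Every element is expected to be a data frame.
all(vapply(dossier_like, is.data.frame, logical(1)))

# Compact overview: one row per dataset with its dimensions.
overview <- data.frame(
  dataset   = names(dossier_like),
  rows      = vapply(dossier_like, nrow, integer(1)),
  columns   = vapply(dossier_like, ncol, integer(1)),
  row.names = NULL
)
overview
```

Naming the list elements explicitly keeps the reference names predictable when the list is later handed to functions that report results per dataset.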

    A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an @@ -114,7 +109,7 @@

    Details

    available online.

    The object may be specifically formatted to be compatible with additional Maelstrom Research software, -in particular Opal environments.

    +in particular Opal environments.

    @@ -126,10 +121,13 @@

    Examples

    ###### Example : a dataset list is a dossier by definition. -dataset <- - as_dataset(madshapR_DEMO$`dataset_TOKYO - errors with data`,col_id = 'part_id') + dataset <- as_dataset( + madshapR_DEMO$`dataset_TOKYO - errors with data`, + col_id = 'part_id') - dossier_evaluate(as_dossier(list(ds = dataset)),as_data_dict_mlstr = FALSE) + dossier <- as_dossier(list(dataset = dataset)) + + glimpse(dossier_evaluate(dossier,as_data_dict_mlstr = FALSE)) } #> - DOSSIER ASSESSMENT: ---------------------------------------------------- @@ -142,7 +140,7 @@

    Examples

    #> #> - WARNING MESSAGES (if any): -------------------------------------------- #> -#> - DATASET ASSESSMENT: ds -------------------------- +#> - DATASET ASSESSMENT: dataset -------------------------- #> Assess the presence of variable names both in dataset and data dictionary #> Assess the presence of possible duplicated variable in the dataset #> Assess the presence of duplicated participants in the dataset @@ -154,40 +152,11 @@

    Examples

    #> #> - WARNING MESSAGES (if any): ------------------------------------------------- #> -#> $ds -#> $ds$`Data dictionary summary` -#> # A tibble: 9 × 4 -#> index name label typeof -#> <int> <chr> <chr> <chr> -#> 1 1 part_id part_id character -#> 2 2 gndr gndr character -#> 3 3 height height double -#> 4 4 weight_ms weight_ms double -#> 5 5 weight_dc weight_dc double -#> 6 6 dob dob character -#> 7 7 prg_ever prg_ever double -#> 8 8 empty empty logical -#> 9 9 opentext opentext character -#> -#> $ds$`Data dictionary assessment` -#> # A tibble: 2 × 5 -#> sheet col_name name_var `Quality assessment comment` value -#> <chr> <chr> <chr> <chr> <chr> -#> 1 Variables label NA [INFO] - Possible duplicated columns name ; label -#> 2 Variables name NA [INFO] - Possible duplicated columns name ; label -#> -#> $ds$`Dataset assessment` -#> # A tibble: 6 × 4 -#> `index in data dict.` name `Quality assessment comment` value -#> <int> <chr> <chr> <chr> -#> 1 3 height Unique value in the column 191 -#> 2 4 weight_ms [INFO] - Possible duplicated columns weig… -#> 3 7 prg_ever [INFO] - Possible duplicated columns weig… -#> 4 8 empty Empty column NA -#> 5 NA NA [INFO] - Empty participant(s) (Except p… ID008 -#> 6 NA NA [INFO] - Empty participant(s) (Except p… ID016 -#> -#> +#> List of 1 +#> $ dataset:List of 3 +#> ..$ Data dictionary summary : tibble [9 × 4] (S3: tbl_df/tbl/data.frame) +#> ..$ Data dictionary assessment: tibble [2 × 5] (S3: tbl_df/tbl/data.frame) +#> ..$ Dataset assessment : tibble [6 × 4] (S3: tbl_df/tbl/data.frame)
    diff --git a/docs/reference/dossier_summarize.html b/docs/reference/dossier_summarize.html index 4aec8a2..f8ea7dc 100644 --- a/docs/reference/dossier_summarize.html +++ b/docs/reference/dossier_summarize.html @@ -1,11 +1,9 @@ -Generate a report and summary of a dossier (list of datasets) — dossier_summarize • madshapRGenerate an assessment report and summary of a dossier — dossier_summarize • madshapR @@ -23,7 +21,7 @@ madshapR - 1.0.3.0003 + 1.0.3 @@ -59,19 +57,17 @@

    Assesses and summarizes the content and structure of a dossier -(list of datasets) and reports potential issues to facilitate the -assessment of input data. The report can be used to help assess data -structure, presence of fields, coherence across elements, and taxonomy or -data dictionary formats. The summary provides additional information about -variable distributions and descriptive statistics. This report is compatible -with Excel and can be exported as an Excel spreadsheet.

    +(list of datasets) and generates reports of the results. This function can +be used to evaluate data structure, presence of specific fields, coherence +across elements, and data dictionary formats, and to summarize additional +information about variable distributions and descriptive statistics.

    @@ -90,32 +86,31 @@

    Arguments

    group_by
    -

    A character string identifying the column in the datasets -to use as a grouping variable. Visual elements will be grouped by this +

    A character string identifying the column in the dataset +to use as a grouping variable. Elements will be grouped by this column.

    taxonomy
    -

    An optional data frame identifying a variable -classification schema.

    +

    An optional data frame identifying a variable classification +schema.

    valueType_guess
    -

    Whether the output should be generated based on more -precise valueType inferred from the data. FALSE by default -(will use the valueType declared).

    +

    Whether the output should include a more accurate +valueType that could be applied to the dataset. FALSE by default.

    Value

    -

    A list of data frames of report for each listed dataset.

    +

    A list of data frames containing overall assessment reports and summaries grouped by dataset.

    Details

    A dossier is a named list containing at least one data frame or more, -each of them being datasets. The name of each tibble will be used as the +each of them being datasets. The name of each data frame will be used as the reference name of the dataset.

    A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an @@ -127,12 +122,13 @@

    Details

    The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the -OBiBa data type of a variable. +OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data -types are available using valueType_list.

    +types are available using valueType_list. The valueType can be used to +coerce the variable to the corresponding data type.
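The idea of coercing a variable according to its declared valueType can be sketched with base R alone. The lookup below is an assumption made for illustration: `coerce_by_value_type()` is a hypothetical function, and the pairing of labels with coercion functions is a guess at a plausible correspondence, not the package's own table (which is the one available through valueType_list).

```r
# Hypothetical lookup from a valueType label to a base-R coercion function.
# Illustration only; the authoritative correspondence is the one provided
# by the package through valueType_list.
coerce_by_value_type <- function(x, value_type) {
  switch(
    value_type,
    text    = as.character(x),
    integer = as.integer(x),
    decimal = as.numeric(x),
    boolean = as.logical(x),
    date    = as.Date(x),
    stop("unsupported valueType: ", value_type)
  )
}

# Character input coerced to integer and to Date (ISO-formatted strings).
coerce_by_value_type(c("1", "2", "3"), "integer")
coerce_by_value_type(c("2001-08-15", "1990-03-22"), "date")
```

Unknown labels fail loudly instead of returning the input unchanged, which makes a misspelled valueType easier to spot.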

    diff --git a/docs/reference/drop_category.html b/docs/reference/drop_category.html new file mode 100644 index 0000000..6b7c6d5 --- /dev/null +++ b/docs/reference/drop_category.html @@ -0,0 +1,123 @@ + +Validate and coerce any object as a non-categorical variable. — drop_category • madshapR + + +
    +
    + + + +
    +
    + + +
    +

    [Experimental] +Converts a vector object to a non-categorical object, typically a column in a +data frame. The categories come from non-missing values present in the +object and are suppressed from an associated data dictionary (when present).

    +
    + +
    +
    drop_category(x)
    +
    + +
    +

    Arguments

    +
    x
    +

    object to be coerced.

    + +
    +
    +

    Value

    + + +

    A R object.

    +
    + +
    +

    Examples

    +
    {
    +
    +head(iris[['Species']])
    +head(drop_category(iris[['Species']]))
    +
    +}
    +#> [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"
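For comparison, a factor can be turned into a plain character vector in base R. This is only an analogue of the output shown above: whether `drop_category()` works this way internally is an assumption, and according to its description it also handles the categories recorded in an associated data dictionary, which this sketch does not.

```r
# Base-R analogue: strip the factor (categorical) structure from a column.
species <- iris[["Species"]]
class(species)   # "factor"

plain <- as.character(species)
class(plain)     # "character"
head(plain)
```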
    +
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/index.html b/docs/reference/index.html index 05e1c92..63fda9a 100644 --- a/docs/reference/index.html +++ b/docs/reference/index.html @@ -17,7 +17,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    @@ -61,33 +61,37 @@

    All functions

    +

    as_category()

    + +

    Validate and coerce any object as a categorical variable.

    +

    as_dataset()

    -

    Validate and coerce an object to dataset format

    +

    Validate and coerce any object as a dataset

    as_data_dict()

    -

    Validate and coerce any object as data dictionary

    +

    Validate and coerce any object as a data dictionary

    as_data_dict_mlstr()

    -

    Validate and coerce an object to an Opal data dictionary format

    +

    Validate and coerce any object as an Opal data dictionary format

    as_data_dict_shape()

    -

    Validate and coerce an object to a workable data dictionary structure

    +

    Validate and coerce any object as a workable data dictionary structure

    as_dossier()

    -

    Validate and coerce an object to dossier format

    +

    Validate and coerce any object as a dossier (list of dataset(s))

    as_taxonomy()

    -

    Validate and coerce an object to taxonomy format

    +

    Validate and coerce any object as a taxonomy

    as_valueType()

    -

    Validate and coerce an object according to a given valueType

    +

    Validate and coerce any object according to a given valueType

    bookdown_open

    @@ -143,7 +147,7 @@

    All functions

    dataset_evaluate()

    -

    Generate a quality assessment report of a dataset

    +

    Generate an assessment report for a dataset

    dataset_preprocess()

    @@ -151,11 +155,11 @@

    All functions

    dataset_summarize()

    -

    Generate a report and summary of a dataset

    +

    Generate an assessment report and summary of a dataset

    dataset_visualize()

    -

    Generate a web-based bookdown visual report of a dataset

    +

    Generate a web-based visual report for a dataset

    dataset_zap_data_dict()

    @@ -171,7 +175,7 @@

    All functions

    data_dict_evaluate()

    -

    Generate a quality assessment report of a data dictionary

    +

    Generate an assessment report for a data dictionary

    data_dict_expand()

    @@ -179,7 +183,7 @@

    All functions

    data_dict_extract()

    -

    Create a data dictionary from a dataset

    +

    Generate a data dictionary from a dataset

    data_dict_filter()

    @@ -219,15 +223,23 @@

    All functions

    dossier_create()

    -

    Create a dossier object from a list of dataset(s)

    +

    Generate a dossier from a list of one or more datasets

    dossier_evaluate()

    -

    Generate a quality assessment report of a dossier (list of datasets)

    +

    Generate an assessment report of a dossier

    dossier_summarize()

    -

    Generate a report and summary of a dossier (list of datasets)

    +

    Generate an assessment report and summary of a dossier

    + +

    drop_category()

    + +

    Validate and coerce any object as a non-categorical variable.

    + +

    is_category()

    + +

    Test if an object is a valid dataset

    is_dataset()

    @@ -247,7 +259,7 @@

    All functions

    is_dossier()

    -

    Test if an object is a valid dossier

    +

    Test if an object is a valid dossier (list of dataset(s))

    is_taxonomy()

    diff --git a/docs/reference/is_category.html b/docs/reference/is_category.html new file mode 100644 index 0000000..df5b5ff --- /dev/null +++ b/docs/reference/is_category.html @@ -0,0 +1,135 @@ + +Test if an object is a valid dataset — is_category • madshapR + + +
    +
    + + + +
    +
    + + +
    +

    Tests if the input object is a valid dataset. This function mainly helps +validate input within other functions of the package but could be used +to check if a dataset is valid.

    +

    [Experimental] +Test if vector object is a categorical variable, typically a column in a +data frame. This function mainly helps validate input within other functions +of the package.

    +
    + +
    +
    is_category(x, threshold = NULL)
    +
    + +
    +

    Arguments

    +
    x
    +

    object to be coerced.

    + + +
    threshold
    +

    Optional. The function returns TRUE if the number of unique +values in the input vector is lower than this threshold.

    + +
    +
    +

    Value

    + + +

    A logical.

    +
    + +
    +

    Examples

    +
    {
    +
    +library(dplyr)
    +iris %>% summarise(across(everything(), is_category))
    +is_category(iris[['Species']])
    +
    +}
    +#> [1] TRUE
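
The `threshold` argument can be read as a cut-off on the number of distinct values. The base-R sketch below illustrates that reading only: `looks_categorical()` is a hypothetical helper, and treating factors as categorical and ignoring missing values are assumptions, not confirmed behaviour of `is_category()`.

```r
# Hypothetical helper (not part of madshapR): an assumed reading of how a
# unique-value threshold could flag a vector as categorical.
looks_categorical <- function(x, threshold = NULL) {
  # Assumption: factors always count as categorical.
  if (is.factor(x)) return(TRUE)
  # Assumption: without a threshold, non-factor vectors are not categorical.
  if (is.null(threshold)) return(FALSE)
  # Count distinct non-missing values and compare with the threshold.
  length(unique(x[!is.na(x)])) < threshold
}

looks_categorical(iris[["Species"]])                 # factor column
looks_categorical(c(1, 2, 2, 1, NA), threshold = 5)  # two distinct values
looks_categorical(1:10, threshold = 5)               # ten distinct values
```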
    +
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/is_data_dict.html b/docs/reference/is_data_dict.html index 682fc9b..ee7a722 100644 --- a/docs/reference/is_data_dict.html +++ b/docs/reference/is_data_dict.html @@ -19,7 +19,7 @@ madshapR - 1.0.3.0003 + 1.0.3

    diff --git a/docs/reference/is_data_dict_mlstr.html b/docs/reference/is_data_dict_mlstr.html index 15743cf..af9ed20 100644 --- a/docs/reference/is_data_dict_mlstr.html +++ b/docs/reference/is_data_dict_mlstr.html @@ -20,7 +20,7 @@ madshapR - 1.0.3.0003 + 1.0.3
    diff --git a/docs/reference/is_data_dict_shape.html b/docs/reference/is_data_dict_shape.html index 2925ee3..25f3487 100644 --- a/docs/reference/is_data_dict_shape.html +++ b/docs/reference/is_data_dict_shape.html @@ -20,7 +20,7 @@ madshapR - 1.0.3.0003 + 1.0.3 diff --git a/docs/reference/is_dataset.html b/docs/reference/is_dataset.html index bd8ec40..7e38d0a 100644 --- a/docs/reference/is_dataset.html +++ b/docs/reference/is_dataset.html @@ -19,7 +19,7 @@ madshapR - 1.0.3.0003 + 1.0.3 @@ -87,8 +87,8 @@

    Details

    A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable -data dictionary will be generated as needed within relevant functions. An -identifier variable(s) for indexing can be specified by the user. +data dictionary will be generated as needed within relevant functions. +Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.

    diff --git a/docs/reference/is_dossier.html b/docs/reference/is_dossier.html index 5df5e52..7be9e1f 100644 --- a/docs/reference/is_dossier.html +++ b/docs/reference/is_dossier.html @@ -1,5 +1,5 @@ -Test if an object is a valid dossier — is_dossier • madshapRTest if an object is a valid dossier (list of dataset(s)) — is_dossier • madshapR