---
title: "README-ContDataQC"
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE
, comment = "#>"
, fig.path = "README-"
)
```
```{r, echo = FALSE}
cat(paste0("Last Update: ",Sys.time()))
```
# ContDataQC
Quality control checks on continuous data. Example data is from a HOBO data
logger with 30-minute intervals.
# Badges
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/leppott/ContDataQC/graphs/commit-activity)
[![lifecycle](https://img.shields.io/badge/lifecycle-stable-green.svg)](https://www.tidyverse.org/lifecycle/#stable)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://cran.r-project.org/web/licenses/MIT)
[![CodeFactor](https://www.codefactor.io/repository/github/leppott/ContDataQC/badge)](https://www.codefactor.io/repository/github/leppott/ContDataQC)
[![codecov](https://codecov.io/gh/leppott/ContDataQC/branch/master/graph/badge.svg)](https://codecov.io/gh/leppott/ContDataQC)
[![R-CMD-check](https://github.com/leppott/ContDataQC/workflows/R-CMD-check/badge.svg)](https://github.com/leppott/ContDataQC/actions)
[![GitHub issues](https://img.shields.io/github/issues/leppott/ContDataQC.svg)](https://GitHub.com/leppott/ContDataQC/issues/)
[![GitHub release](https://img.shields.io/github/release/leppott/ContDataQC.svg)](https://GitHub.com/leppott/ContDataQC/releases/)
[![Github all releases](https://img.shields.io/github/downloads/leppott/ContDataQC/total.svg)](https://GitHub.com/leppott/ContDataQC/releases/)
# Installation
To install the current version of the code from GitHub use the example below.
```{r Install_Basic, eval=FALSE}
if(!require(remotes)){install.packages("remotes")} #install if needed
remotes::install_github("leppott/ContDataQC")
```
The vignette (big help file) isn't created when installing from GitHub with the
above command. If you want the vignette, either download the compressed file
from GitHub and install from that file, or install with the command below. The
`force = TRUE` argument ensures the package will install over an existing
install of the same version (e.g., the same version without the vignettes).
```{r Install_Vignette, eval=FALSE}
if(!require(remotes)){install.packages("remotes")} #install if needed
remotes::install_github("leppott/ContDataQC", force = TRUE, build_vignettes = TRUE)
```
If dependent libraries do not install you can install them separately. This
happens occasionally. Without all of the packages the main package and/or the
vignettes will not install properly. Install the separate packages below and
then retry installing ContDataQC.
```{r Pkg_CRAN, eval=FALSE}
# packages to be installed
pkg <- c("remotes"         # install helper for non-CRAN packages
         , "installr"      # install helper, also used to install Pandoc
         , "digest"        # caused error in R v3.2.3 without it
         , "dataRetrieval" # loads USGS data into R
         , "knitr"         # create documents in other formats (e.g., PDF or Word)
         , "doBy"          # summary stats
         , "zoo"           # z's ordered observations, use for rolling sd calc
         , "htmltools"     # needed for knitr and doesn't always install properly with Pandoc
         , "rmarkdown"     # needed for knitr and doesn't always install properly with Pandoc
         , "evaluate"      # a dependency that is sometimes missed.
         , "highr"         # a dependency that is sometimes missed.
         , "ggplot2"       # plots
         , "rsconnect"
         , "rLakeAnalyzer"
         , "shiny"         # shiny example
         , "survival"
         , "shinyFiles"    # more Shiny
         , "shinythemes"   # more Shiny
         , "XLConnect"     # read Excel files
         , "zip"           # read/save zip files
         )
# install.packages() accepts a character vector of package names
install.packages(pkg)
```
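
If only some of the dependencies are missing, it may be quicker to install just those. The sketch below is one way to do that with base R. The helper name `missing_pkgs()` is hypothetical and is not part of `ContDataQC`.

```r
# Return the packages from `pkgs` that are not yet installed.
# `installed` defaults to the names reported by installed.packages().
# NOTE: missing_pkgs() is an illustrative helper, not a ContDataQC function.
missing_pkgs <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(unique(pkgs), installed)
}

pkg <- c("remotes", "knitr", "zoo", "knitr")  # shortened list for illustration
to_install <- missing_pkgs(pkg)

# Only install from an interactive session, and only if something is missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Passing the full `pkg` vector from the block above works the same way.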
Non-CRAN packages have to be installed separately from GitHub using remotes.
```{r Pkg_GitHub, eval=FALSE}
if(!require(remotes)){install.packages("remotes")} #install if needed
# non-CRAN packages
remotes::install_github("jasonelaw/iha", force = TRUE, build_vignettes = TRUE)
remotes::install_github("tsangyp/StreamThermal", force = TRUE, build_vignettes = TRUE)
```
Additionally, Pandoc is required for creating the reports and needs to be
installed separately. If you have RStudio installed, it comes with Pandoc and
you don't need to install Pandoc separately. The `installr` package used below
is available for Windows only.
```{r Install_Pandoc, eval=FALSE}
## pandoc
if(!require(installr)){install.packages("installr")} #install if needed
installr::install.pandoc()
```
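
Before installing Pandoc it can be worth checking whether one is already available. The sketch below uses base R only. It looks on the system `PATH` and in the `RSTUDIO_PANDOC` environment variable, which RStudio sessions set. The helper name `find_pandoc()` is hypothetical.

```r
# Look for a Pandoc executable on the PATH or in the folder RStudio advertises.
# NOTE: find_pandoc() is an illustrative helper, not a ContDataQC function.
find_pandoc <- function() {
  on_path <- unname(Sys.which("pandoc"))      # "" if not on the PATH
  rstudio_dir <- Sys.getenv("RSTUDIO_PANDOC") # "" outside of RStudio
  if (nzchar(on_path)) {
    on_path
  } else if (nzchar(rstudio_dir)) {
    rstudio_dir
  } else {
    NA_character_
  }
}

pandoc_path <- find_pandoc()
if (is.na(pandoc_path)) {
  message("Pandoc not found; install it before generating reports.")
} else {
  message("Pandoc found at: ", pandoc_path)
}
```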
# Purpose
The package was built for a USEPA project for Regional Monitoring Networks (RMN).
It takes continuous data from data loggers as input and QCs it by checking for
gross differences, spikes, rate of change differences, flat lines (consecutive
identical values), and data gaps. The `ContDataQC` package provides an organized
workflow to QC, aggregate, partition, and generate summary stats.

The code was presented at the following workshops and further developed under
contract to USEPA.
* Oct 2015, SWPBA (Region 4 regional biologist meeting, Myrtle Beach, SC).
* Mar 2016, AMAAB (Region 3 regional biologist meeting, Cacapon, WV).
* Apr 2016, NWQMC (National Water Quality Monitoring Council Conference, Tampa, FL).

Functions were developed to help data generators handle data from continuous
data sensors (e.g., HOBO data loggers).

A single function, `ContDataQC()`, can QC, aggregate, or calculate summary
stats on data. `ContDataQC` uses the USGS `dataRetrieval` package to get USGS
gage data. Reports are generated in Word (through the use of knitr and Pandoc).
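
To give a feel for the kinds of checks listed above, here is a minimal base R sketch. It is not the package's implementation. The function name `qc_flags_sketch()` and all thresholds are hypothetical, and rate of change is folded into the simple step (spike) check for brevity.

```r
# Illustrative QC flags for a regularly sampled series.
# NOTE: not the ContDataQC code; thresholds are made up and need tuning.
qc_flags_sketch <- function(datetime, value,
                            gross_lo = -1, gross_hi = 30, # plausible range
                            spike_max = 5,      # max change between readings
                            flat_n = 5,         # run length of identical values
                            interval_min = 30) { # expected logging interval
  # Gross range: value falls outside the plausible range
  gross <- value < gross_lo | value > gross_hi
  # Spike: absolute change from the previous reading exceeds the limit
  step <- c(NA, abs(diff(value)))
  spike <- !is.na(step) & step > spike_max
  # Flat line: value belongs to a run of at least flat_n identical readings
  runs <- rle(value)
  flat <- rep(runs$lengths >= flat_n, times = runs$lengths)
  # Gap: time since the previous reading exceeds the expected interval
  dt_min <- c(NA, as.numeric(diff(datetime), units = "mins"))
  gap <- !is.na(dt_min) & dt_min > interval_min
  data.frame(datetime, value, gross, spike, flat, gap)
}

# Made-up 30-minute series with one outlier, a flat run, and a missing hour
times <- as.POSIXct("2017-01-01 00:00", tz = "UTC") + 60 * 30 * c(0:5, 8:9)
temps <- c(10.0, 10.2, 40.0, 10.1, 10.1, 10.1, 10.1, 10.3)
flags <- qc_flags_sketch(times, temps, flat_n = 4)
```

Each flag column is a logical vector aligned with the input rows, so flagged records can be inspected with ordinary subsetting (e.g., `flags[flags$gap, ]`).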
# Usage
Every time R is launched the `ContDataQC` package needs to be loaded.
```{r LoadLibrary, eval=FALSE}
# load the package and its dependent packages
library(ContDataQC)
```
The default working directory is based on how R was installed but is typically
the user's 'MyDocuments' folder. You can change it through the menu bar in R
(File - Change dir) or RStudio (Session - Set Working Directory). You can also
change it from the command line.
```{r Directory, eval=FALSE}
# If you specify a directory, use "/" not "\" (as used in Windows) and leave off the final "/" (example below).
#myDir.BASE <- "C:/Users/Erik.Leppo/Documents/NCEA_DataInfrastructure/Erik"
myDir.BASE <- getwd()
setwd(myDir.BASE)
```
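
Once a base directory is set, sub-folders can be built from it with `file.path()`, which joins path parts with "/" on every operating system. The sketch below is an assumption-laden example. The helper `make_subdirs()` and the folder names are hypothetical, and `tempdir()` stands in for `getwd()` so the example leaves no files behind.

```r
# Create sub-folders under a base directory and return their full paths.
# NOTE: make_subdirs() and the folder names are illustrative only.
make_subdirs <- function(base, names) {
  paths <- file.path(base, names)  # joins with "/" regardless of OS
  for (p in paths) {
    dir.create(p, showWarnings = FALSE, recursive = TRUE)
  }
  paths
}

myDir.BASE <- tempdir()  # stand-in for getwd()
myDirs <- make_subdirs(myDir.BASE, c("Data1_RAW", "Data2_QC"))
```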
# Planned Updates
* ~~Spell out "AW" and other abbreviations (e.g., AirWater). 20170308. On hold.~~
* ~~Gaps in data not always evident in the plots. 20170308.~~
* ~~Use futile.logger to better log output for user. Issue #29. 20170606.~~
* Debug Aggregate operation. 20170919.
* Create CDFs. Similar to code already used in previous analyses by Lei. 20170919.
* PeriodStats(), add number and/or percent of observations above given threshold. 20170919.
* Fix mixed case issue with filenames in "file" versions of QCRaw, Aggregate, and Stats. 20170929.
+ More check data stuff.
+ Update vignette
+ Threshold number or pct on plot
+ Excel file update (see 9/28/2017 email)
* PeriodStats, standardize range of y-axis for each time period.
# Help
Every function has a help file with a working example. There is also a vignette
with descriptions and examples of all functions in the `ContDataQC` library.
```{r Help_Function, eval=FALSE}
# To get help on a function
# library(ContDataQC) # the library must be loaded before accessing help
?ContDataQC
```
To see all available functions in the package use the command below.
```{r Help_Index, eval=FALSE}
# To get index of help on all functions
# library(ContDataQC) # the library must be loaded before accessing help
help(package="ContDataQC")
```
The vignette file is located in the “doc” directory of the library in the R
install folder. Below is the path to the file on my PC, but it is much easier
to use the code below to call the vignette by name. There is also a link to
the vignette at the top of the help index for the package.
"C:\\Programs\\R\\R-3.4.2\\library\\ContDataQC\\doc\\ContDataQC_Vignette.html"
```{r Help_Vignette, eval=FALSE}
vignette("ContDataQC_Vignette", package="ContDataQC")
```
If the vignette fails to show on your computer, run the code below to reinstall
the package and specify the creation of the vignette. As noted above, if
dependent packages are not installed the vignette will fail to build. See above
for installing packages.
```{r Install_Vignette2, eval=FALSE}
library(remotes)
install_github("leppott/ContDataQC", force=TRUE, build_vignettes=TRUE)
```
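
Before reinstalling, it may help to confirm whether the installed copy of the package has any vignettes at all. The sketch below uses `utils::vignette()` and `system.file()` from base R. The helper name `has_vignette()` is hypothetical.

```r
# TRUE if the installed package ships at least one vignette, FALSE otherwise
# (including when the package is not installed).
# NOTE: has_vignette() is an illustrative helper, not a ContDataQC function.
has_vignette <- function(pkg) {
  # system.file() returns "" when the package is not installed
  if (!nzchar(system.file(package = pkg))) {
    return(FALSE)
  }
  nrow(utils::vignette(package = pkg)$results) > 0
}

vignette_ok <- has_vignette("ContDataQC")
```

If `vignette_ok` is `FALSE`, reinstalling with `build_vignettes = TRUE` as shown above is the next step.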
## Guide Videos
Guide videos were created for the ContDataQC package and posted on YouTube.
The PowerPoint slides (pptx), R code notebooks (html), and videos are hosted on
a companion GitHub site.
https://github.com/leppott/ContDataQC_Guide
YouTube video links below.
* Introduction
    + https://youtu.be/FJAv7g9GPHI
* Config
    + https://youtu.be/qbUgPczdfdo
* Basic Functions
    + https://youtu.be/zlq1YDPTBsw
* Gage Data
    + https://youtu.be/vXyvp9r2tv4
* Config File Modifications
    + https://youtu.be/LCHnFs-AdXc