An R package for D-vine copula based quantile regression using bivariate conditional copulas as described in the references.
It depends on the R-packages:
-
VineCopula,
gamCopula:
Including bivariate conditional copulas, where Kendall’s
$\tau$ or the copula parameter is linked to a partially linear model using GAM and/or spline smoothing. The bivariate conditional copula family set for modeling the dependencies consists of the Gaussian, Student-t, double Clayton type I-IV and double Gumbel type I-IV copula as implemented in the R-package gamCopula. The double Clayton and Gumbel copula type consist of additional rotated versions of the Clayton and Gumbel copula, respectively to cover negative dependence as well. - gamlss: Margins can be selected from a large set of parametric distribution families using generalized additive models for location, scale and shape (GAMLSS).
- gamlss.tr: GAMLSS extension to allow for truncated distribution families.
- gamlss.cens: GAMLSS extension to allow for censored distribution families.
- gamlss.dist: GAMLSS extension to allow for distribution families with transformed response.
You can install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("jobstdavid/gamvinereg")
Below is an overview of the most important functions:
-
gamvinereg
: selects most relevant predictor variables from the given data to make predictions based on a D-vine GAM copula. It returns an object of classgamvinereg
. The class has the following methods:print
: a brief overview of the model statistics.summary
: a more detailed overview of the selected response margin distribution and bivariate conditional copulas.
-
predict
: make predictions at a given quantile level. -
cpit
: calculate the conditional probabililty integral transformed (PIT) values for the response given the predictor variables. -
cpdf
: calculate the conditional density values for the response given the predictor variables. -
gam_control
: configuration of the bivariate conditional copula.
# load package
library(gamvinereg)
#> Loading required package: gamlss
#> Loading required package: splines
#> Loading required package: gamlss.data
#>
#> Attaching package: 'gamlss.data'
#> The following object is masked from 'package:datasets':
#>
#> sleep
#> Loading required package: gamlss.dist
#> Loading required package: nlme
#> Loading required package: parallel
#> ********** GAMLSS Version 5.4-22 **********
#> For more on GAMLSS look at https://www.gamlss.com/
#> Type gamlssNews() to see new features/changes/bug fixes.
#> Loading required package: gamlss.tr
#> Loading required package: gamlss.cens
#> Loading required package: survival
# load data for station Hannover
data(station)
## generate margin distributions
# generate left-truncated at 0 normal distribution
gen.trun(par = 0, family = "NO", type = "left")
#> A truncated family of distributions from NO has been generated
#> and saved under the names:
#> dNOtr pNOtr qNOtr rNOtr NOtr
#> The type of truncation is left
#> and the truncation parameter is 0
# generate left-censored normal distribution
gen.cens(family = "NO", type = "left")
#> A censored family of distributions from NO has been generated
#> and saved under the names:
#> dNOlc pNOlc qNOlc NOlc
#> The type of censoring is left
# generate log-normal distribution
gen.Family(family = "NO", type = "log")
#> A log family of distributions from NO has been generated
#> and saved under the names:
#> dlogNO plogNO qlogNO rlogNO logNO
## margin specifications to select from
margins <- list(
# margin for variable obs (normal distribution)
list(list(family = NO(),
mu.formula = obs ~ sin1 + cos1,
sigma.formula = ~ sin1 + cos1)),
# margin for variable t2m_mean (normal distribution)
list(list(family = NO(),
mu.formula = t2m_mean ~ sin1 + cos1,
sigma.formula = ~ sin1 + cos1)),
# margins for variable t2m_sd (log-normal, left-truncated/censored at 0 normal distribution)
list(list(family = logNO(),
mu.formula = t2m_sd ~ sin1 + cos1,
sigma.formula = ~ sin1 + cos1),
list(family = NOtr(),
mu.formula = t2m_sd ~ sin1 + cos1,
sigma.formula = ~ sin1 + cos1),
list(family = NOlc(),
mu.formula = Surv(t2m_sd, t2m_sd > 0, type = "left") ~ sin1 + cos1,
sigma.formula = ~ sin1 + cos1))
)
# fit gamvinereg with time-dependent linear correlation model
(object <- gamvinereg(formula = obs ~ t2m_mean + t2m_sd,
data = station,
control = gam_control(formula = ~ sin1 + cos1),
margins = margins))
#> D-vine regression model: obs | t2m_mean, t2m_sd
#> GAM copula regression model: tau | sin1 + cos1
#> nobs = 2191, edf = 13, cll = -3454.46, caic = 6934.92, cbic = 7008.92
# predict median
med <- predict(object, newdata = station, alpha = 0.5)
# predict deciles
dec <- predict(object, newdata = station, alpha = 1:9/10)
# calculate conditional PIT values
u <- cpit(object, newdata = station)
# should be approximately uniform
hist(u, probability = TRUE)
# calculate conditional density values
d <- cpdf(object)
# calculate log-likelihood
sum(log(d))
#> [1] -3454.46
Feel free to contact jobstd@uni-hildesheim.de if you have any questions or suggestions.
Jobst, D., Möller, A., and Groß, J. (2023). D-Vine GAM Copula based Quantile Regression with Application to Ensemble Postprocessing. doi: 10.48550/arXiv.2309.05603.