Skip to content

Compute the Pearson correlation to be used in Gaussian copulas

License

Notifications You must be signed in to change notification settings

adknudson/PearsonCorrelationMatch.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

73 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PearsonCorrelationMatch

Aqua QA

Determine the correlation coefficient for a bivariate Gaussian copula so that the resulting samples following a Normal-to-anything (NORTA) step have the desired correlation.

This package is based on the paper by Xiao and Zhou, Matching a correlation coefficient by a Gaussian copula.

API

Only two methods are exported:

  • pearson_bounds
    • Determines the range of admissible Pearson correlations between two distributions
  • pearson_match
    • Computes the Pearson correlation coefficient to be used in a bivariate Gaussian copula

Usage

using PearsonCorrelationMatch, Distributions
using StatsFuns

p = 0.4 # target correlation

# Distributions can be continuous, discrete, or a mix
F = Gamma()
G = NegativeBinomial(20)

# estimate the pearson correlation bounds
pearson_bounds(F, G)
# (lower = -0.8385297744531974, upper = 0.9712817585733178)

# calculate the required input correlation
p_star = pearson_match(p, F, G)
# 0.4361868405991995

# apply the NORTA step
D = MultivariateNormal([0, 0], [1 p_star; p_star 1])
Z = rand(D, 1_000_000)
U = normcdf.(Z)
X1 = quantile.(F, U[1,:])
X2 = quantile.(G, U[2,:])

cor(X1, X2)
# 0.40007047985609534

Details

It is highly recommended that any user of this package reads the reference paper first. The algorithm uses Gauss-Hermite polynomial interpolation to approximate the solution of the double integral. Target correlations near the Frechet bounds may be highly sensitive to the degree of the Hermite polynomial.

Related Packages

While this package focuses on the bivariate case, it can be used to compute the input correlations between all pairs of marginal distributions. However, the resulting adjusted correlation matrix may not be positive definite. In that case, you can use the NearestCorrelationMatrix.jl library to fix the adjusted correlation matrix before applying the NORTA step.

References

  • Xiao, Q., & Zhou, S. (2019). Matching a correlation coefficient by a Gaussian copula. Communications in Statistics-Theory and Methods, 48(7), 1728-1747.