Bootstrap and Monte Carlo Hypothesis Testing with R

ntguardian/MCHT

MCHT

Version 0.1.0

MCHT is a package implementing an interface for creating and using Monte Carlo tests. The primary function of the package is MCHTest(), which creates functions with S3 class MCHTest that perform a Monte Carlo test.

Installation

MCHT is not presently available on CRAN. You can download and install MCHT from GitHub using devtools via the R command devtools::install_github("ntguardian/MCHT").

Monte Carlo Hypothesis Testing

Monte Carlo testing is a form of hypothesis testing where the $p$-values are computed using the empirical distribution of the test statistic computed from data simulated under the null hypothesis. These tests are used when the distribution of the test statistic under the null hypothesis is intractable or difficult to compute, or as an exact test (that is, a test where the distribution used to compute $p$-values is appropriate for any sample size, not just large sample sizes).

Suppose that $s_n$ is the observed value of the test statistic and large values of $s_n$ are evidence against the null hypothesis; normally, $p$-values would be computed as $1 - F(s_n)$, where $F(s) = P(S_n \leq s)$ is the cumulative distribution function and $S_n$ is the random variable version of $s_n$. We cannot use $F$ for some reason; it's intractable, or the $F$ provided is only appropriate for large sample sizes.

Instead of using $F$ we will use $\hat{F}_N$, which is the empirical CDF of the same test statistic computed from simulated data following the distribution prescribed by the null hypothesis of the test. For the sake of simplicity in this presentation, assume that $S_n$ is a continuous random variable. Now our $p$-value is $1 - \hat{F}_N(s_n)$, where $\hat{F}_N(s) = \frac{1}{N} \sum_{j=1}^{N} I_{\{\tilde{S}_{n,j} \leq s\}}$, $I$ is the indicator function, and $\tilde{S}_{n,j}$ is an independent random copy of $S_n$ computed from simulated data with a sample size of $n$.
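The procedure just described can be sketched in a few lines of base R, without MCHT. This is a minimal illustration, assuming i.i.d. normal data under the null hypothesis and a one-sided test where large statistics count as evidence against it. The names t_stat and mc_pvalue are made up for this sketch and are not part of the package.

```r
# Monte Carlo p-value sketch in base R (illustrative only; not the MCHT API).
# Assumption: under H0 the data are i.i.d. normal with mean mu0.

# Studentized test statistic; its null distribution does not depend on the
# unknown standard deviation.
t_stat <- function(x, mu0 = 0) {
  sqrt(length(x)) * (mean(x) - mu0) / sd(x)
}

mc_pvalue <- function(x, mu0 = 0, N = 10000, seed = 1) {
  set.seed(seed)
  n <- length(x)
  s_obs <- t_stat(x, mu0)
  # Simulate N data sets of size n under H0 (standard normal suffices because
  # the statistic is pivotal) and compute the statistic on each one.
  s_sim <- replicate(N, t_stat(rnorm(n)))
  F_hat <- ecdf(s_sim)  # empirical CDF of the simulated statistics
  list(statistic = s_obs, p.value = 1 - F_hat(s_obs), simulated = s_sim)
}

x <- c(2.3, -0.13, 1.42, 1.51, 3.43, -0.96, 0.59, 0.62, 1.28, 4.07)
res <- mc_pvalue(x, mu0 = 1)
res$p.value
```

Because the statistic here has a known Student t distribution, the result can be compared against pt(res$statistic, df = length(x) - 1, lower.tail = FALSE); the two should agree up to simulation error.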

The power of these tests increases with $N$ (see [1]), but modern computers are able to simulate large $N$ quickly, so this is rarely an issue. The procedure above also assumes that there are no nuisance parameters and the distribution of $S_n$ can effectively be known precisely when the null hypothesis is true (and all other conditions of the test are met, such as distributional assumptions). A different procedure needs to be applied when nuisance parameters are not explicitly stated under the null hypothesis. [2] suggests a procedure using optimization techniques (recommending simulated annealing specifically) to adversarially select values for nuisance parameters valid under the null hypothesis that maximize the $p$-value computed from the simulated data. This procedure is often called maximized Monte Carlo (MMC) testing. That is the procedure employed here. (In fact, the tests created by MCHTest() are the tests described in [2].) Unfortunately, MMC, while conservative and exact, has much less power than if the unknown parameters were known, perhaps due to the behavior of samples under distributions with parameter values distant from the true parameter values (see [3]).
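To make the MMC idea concrete, here is a rough base R sketch that does not use MCHT. It assumes normal data, a null hypothesis of zero mean, an unstudentized statistic whose null distribution depends on the unknown standard deviation sigma (the nuisance parameter), and a user-supplied range of plausible sigma values. A plain grid search stands in for the simulated annealing recommended in [2]. The function names and the data are hypothetical.

```r
# Maximized Monte Carlo (MMC) sketch in base R (illustrative only; not the
# MCHT API). Grid search replaces simulated annealing for simplicity.

# Unstudentized statistic: under H0 (mean zero) its distribution depends on
# the nuisance parameter sigma.
stat <- function(x) sqrt(length(x)) * mean(x)

mmc_pvalue <- function(x, sigma_grid, N = 2000, seed = 1) {
  set.seed(seed)
  n <- length(x)
  s_obs <- stat(x)
  # Common random numbers: draw the standard normal samples once and rescale
  # them for each candidate sigma, so p-values are comparable across the grid.
  z <- matrix(rnorm(n * N), nrow = n)
  pvals <- vapply(sigma_grid, function(sigma) {
    s_sim <- apply(sigma * z, 2, stat)  # statistics simulated under H0
    mean(s_sim >= s_obs)                # Monte Carlo p-value for this sigma
  }, numeric(1))
  best <- which.max(pvals)
  list(p.value = pvals[best], sigma = sigma_grid[best], all = pvals)
}

x <- c(0.9, 1.4, -0.2, 2.1, 0.7, 1.8, 0.3, 1.1)  # hypothetical data
out <- mmc_pvalue(x, sigma_grid = seq(0.5, 3, by = 0.25))
out$p.value  # largest p-value over the grid
out$sigma    # nuisance value that attains it
```

The reported p-value is the worst case over the grid, which is why MMC is conservative: the null hypothesis is rejected only if every admissible nuisance value leads to rejection.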

Bootstrap Hypothesis Testing

Bootstrap statistical testing is very similar to Monte Carlo testing; the key difference is that bootstrap testing uses information from the sample. For example, a parametric bootstrap test would estimate the parameters of the distribution the data is assumed to follow and generate datasets from that distribution using those estimates as the actual parameter values. A permutation test (like Fisher's permutation test; see [4]) would use the original dataset values but randomly shuffle the labels (stating which sample an observation belongs to) to generate new datasets and thus new simulated test statistics. $p$-values are computed in essentially the same way.
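The label-shuffling idea behind a permutation test can be sketched in base R as follows. This is an illustrative two-sample, one-sided test on the difference in means, not code from MCHT; perm_test and the data are made up for the example.

```r
# Two-sample permutation test sketch in base R (illustrative only; not the
# MCHT API). H0: both samples come from the same distribution.
perm_test <- function(x, y, N = 5000, seed = 1) {
  set.seed(seed)
  pooled <- c(x, y)
  n_x <- length(x)
  s_obs <- mean(x) - mean(y)
  s_sim <- replicate(N, {
    # Randomly reassign the "x" label to n_x of the pooled observations.
    idx <- sample(length(pooled), n_x)
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # One-sided p-value: share of relabelled statistics at least as large as
  # the observed one.
  list(statistic = s_obs, p.value = mean(s_sim >= s_obs))
}

x <- c(5.1, 6.3, 5.8, 7.0, 6.1)  # hypothetical sample 1
y <- c(4.2, 5.0, 4.8, 5.5, 4.4)  # hypothetical sample 2
perm <- perm_test(x, y)
perm$p.value
```

The simulated statistics reuse the observed values, which is what separates this from the pure Monte Carlo test: nothing is drawn from a fully specified null distribution.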

Unlike Monte Carlo tests and MMC, these tests are not exact tests. That said, they often have good finite sample properties. (See [3].)

See the documentation for more details and references.

Examples

MCHTest() is the main function of the package and can create functions with S3 class MCHTest that perform Monte Carlo hypothesis tests.

For example, the code below creates the Monte Carlo equivalent of a $t$-test.

library(MCHT)
#> .------..------..------..------.
#> |M.--. ||C.--. ||H.--. ||T.--. |
#> | (\/) || :/\: || :/\: || :/\: |
#> | :\/: || :\/: || (__) || (__) |
#> | '--'M|| '--'C|| '--'H|| '--'T|
#> `------'`------'`------'`------' v. 0.1.0
#> Type citation("MCHT") for citing this R package in publications
library(doParallel)
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel

registerDoParallel(detectCores())  # Necessary for parallelization, and if not
                                   # done the resulting function will complain
                                   # on the first use

ts <- function(x, mu = 0) {sqrt(length(x)) * (mean(x) - mu)/sd(x)}
sg <- function(x, mu = 0) {
  x <- x + mu
  ts(x)
}
rg <- rnorm

mc.t.test <- MCHTest(ts, sg, rg, seed = 20181001, test_params = "mu", 
                     lock_alternative = FALSE,
                     method = "Monte Carlo One Sample t-Test")

The object mc.t.test is an S3 object of class MCHTest and is also a callable function.

class(mc.t.test)
#> [1] "MCHTest"

print() will print relevant information about the construction of the test.

mc.t.test
#> 
#> 	Details for Monte Carlo One Sample t-Test
#> 
#> Seed:  20181001 
#> Replications:  10000 
#> Tested Parameters:  mu 
#> Default mu:  0 
#> 
#> Memoisation enabled

Once this object is created, we can use it for performing hypothesis tests.

dat <- c(2.3, -0.13, 1.42, 1.51, 3.43, -0.96, 0.59, 0.62, 1.28, 4.07)

t.test(dat, mu = 1, alternative = "two.sided")  # For reference
#> 
#> 	One Sample t-test
#> 
#> data:  dat
#> t = 0.84975, df = 9, p-value = 0.4175
#> alternative hypothesis: true mean is not equal to 1
#> 95 percent confidence interval:
#>  0.3135303 2.5124697
#> sample estimates:
#> mean of x 
#>     1.413
mc.t.test(dat, mu = 1)
#> Loading required package: rngtools
#> Loading required package: pkgmaker
#> Loading required package: registry
#> 
#> Attaching package: 'pkgmaker'
#> The following object is masked from 'package:base':
#> 
#>     isFALSE
#> 
#> 	Monte Carlo One Sample t-Test
#> 
#> data:  dat
#> S = 0.84975, p-value = 0.9885
mc.t.test(dat, mu = 1, alternative = "two.sided")
#> 
#> 	Monte Carlo One Sample t-Test
#> 
#> data:  dat
#> S = 0.84975, p-value = 0.023
#> alternative hypothesis: true mu is not equal to 1

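Note that the Monte Carlo p-values above (0.9885 and 0.023) are far from the reference t.test() p-value of 0.4175. The reason is that sg() shifts the simulated data by mu but then calls ts(x) with the default mu = 0, so the simulated statistics are centered well away from zero instead of following the null distribution. Calling ts(x, mu = mu) inside sg() (or dropping the shift) makes the simulated statistics follow the null distribution, and the two-sided p-value should then land close to the t.test() value.

This is the simplest example; MCHTest() can create more involved Monte Carlo tests. See other documentation for details.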

Planned Future Features

  • A function for making diagnostic-type plots for tests, such as a function creating a plot for the rejection probability function (RPF) as described in [5]
  • A function that accepts an MCHTest-class object and returns a function that, rather than returning an htest-class object, returns the test statistic, the simulated test statistics, and a $p$-value in a list; this could be useful for diagnostic work.

References

  1. A. C. A. Hope, A simplified Monte Carlo test procedure, JRSSB, vol. 30 (1968) pp. 582-598
  2. J-M Dufour, Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics, Journal of Econometrics, vol. 133 no. 2 (2006) pp. 443-477
  3. J. G. MacKinnon, Bootstrap hypothesis testing in Handbook of computational econometrics (2009) pp. 183-213
  4. R. A. Fisher, The design of experiments (1935)
  5. R. Davidson and J. G. MacKinnon, The size distortion of bootstrap tests, Econometric Theory, vol. 15 (1999) pp. 361-376
