From 190a4eb51c5d4b46656df35ee4e860696f19c818 Mon Sep 17 00:00:00 2001
From: Robin van Emden
Date: Sat, 25 Jul 2020 16:40:34 +0200
Subject: [PATCH] Regenerate docs and site

---
 docs/articles/cmabs.html                             | 2 +-
 docs/articles/cmabsoffline.html                      | 2 +-
 docs/articles/eckles_kaptein.html                    | 2 +-
 docs/articles/epsilongreedy.html                     | 2 +-
 docs/articles/introduction.html                      | 2 +-
 docs/articles/mabs.html                              | 2 +-
 docs/articles/ml10m.html                             | 2 +-
 docs/articles/offline_depaul_movies.html             | 2 +-
 docs/articles/only_pkgdown/faq.html                  | 2 +-
 docs/articles/replication.html                       | 2 +-
 docs/articles/simpsons.html                          | 2 +-
 docs/articles/sutton_barto.html                      | 2 +-
 docs/articles/website_optimization.html              | 2 +-
 docs/news/index.html                                 | 5 +++--
 docs/pkgdown.yml                                     | 2 +-
 docs/reference/EpsilonFirstPolicy.html               | 2 +-
 docs/reference/EpsilonGreedyPolicy.html              | 2 +-
 docs/reference/Exp3Policy.html                       | 2 +-
 docs/reference/GradientPolicy.html                   | 2 +-
 docs/reference/OfflineDoublyRobustBandit.html        | 3 +--
 docs/reference/OfflinePropensityWeightingBandit.html | 6 ++----
 docs/reference/RandomPolicy.html                     | 2 +-
 docs/reference/SoftmaxPolicy.html                    | 2 +-
 docs/reference/ThompsonSamplingPolicy.html           | 2 +-
 docs/reference/UCB2Policy.html                       | 2 +-
 man/OfflineDoublyRobustBandit.Rd                     | 5 +----
 man/OfflinePropensityWeightingBandit.Rd              | 8 ++------
 27 files changed, 31 insertions(+), 40 deletions(-)

diff --git a/docs/articles/cmabs.html b/docs/articles/cmabs.html
index 0c6469e..91960fc 100644
--- a/docs/articles/cmabs.html
+++ b/docs/articles/cmabs.html
@@ -129,7 +129,7 @@

Demo: Basic Synthetic cMAB Policies

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/cmabs.Rmd

diff --git a/docs/articles/cmabsoffline.html b/docs/articles/cmabsoffline.html
index 1f4fd55..61e6c51 100644
--- a/docs/articles/cmabsoffline.html
+++ b/docs/articles/cmabsoffline.html
@@ -129,7 +129,7 @@

Demo: Offline cMAB LinUCB evaluation

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/cmabsoffline.Rmd

diff --git a/docs/articles/eckles_kaptein.html b/docs/articles/eckles_kaptein.html
index 1a3f3c2..7832807 100644
--- a/docs/articles/eckles_kaptein.html
+++ b/docs/articles/eckles_kaptein.html
@@ -129,7 +129,7 @@

Demo: MAB Replication Eckles & Kaptein (Bootstrap Thompson Sampling)

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/eckles_kaptein.Rmd

diff --git a/docs/articles/epsilongreedy.html b/docs/articles/epsilongreedy.html
index fa97a7f..0045d38 100644
--- a/docs/articles/epsilongreedy.html
+++ b/docs/articles/epsilongreedy.html
@@ -129,7 +129,7 @@

Demo: Basic Epsilon Greedy

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/epsilongreedy.Rmd

diff --git a/docs/articles/introduction.html b/docs/articles/introduction.html
index 1ad7f1a..232b650 100644
--- a/docs/articles/introduction.html
+++ b/docs/articles/introduction.html
@@ -129,7 +129,7 @@

Getting started: running simulations

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/introduction.Rmd

diff --git a/docs/articles/mabs.html b/docs/articles/mabs.html
index 7fed1ed..2098847 100644
--- a/docs/articles/mabs.html
+++ b/docs/articles/mabs.html
@@ -129,7 +129,7 @@

Demo: MAB Policies Comparison

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/mabs.Rmd

diff --git a/docs/articles/ml10m.html b/docs/articles/ml10m.html
index b426dba..0bf94e8 100644
--- a/docs/articles/ml10m.html
+++ b/docs/articles/ml10m.html
@@ -129,7 +129,7 @@

Demo: MovieLens 10M Dataset

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/ml10m.Rmd

diff --git a/docs/articles/offline_depaul_movies.html b/docs/articles/offline_depaul_movies.html
index 53beb62..758a729 100644
--- a/docs/articles/offline_depaul_movies.html
+++ b/docs/articles/offline_depaul_movies.html
@@ -129,7 +129,7 @@

Demo: Offline cMAB: CarsKit DePaul Movie Dataset

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/offline_depaul_movies.Rmd

diff --git a/docs/articles/only_pkgdown/faq.html b/docs/articles/only_pkgdown/faq.html
index e0598ee..4589128 100644
--- a/docs/articles/only_pkgdown/faq.html
+++ b/docs/articles/only_pkgdown/faq.html
@@ -129,7 +129,7 @@

Development FAQ

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/only_pkgdown/faq.Rmd

diff --git a/docs/articles/replication.html b/docs/articles/replication.html
index 9efde24..b75ec7d 100644
--- a/docs/articles/replication.html
+++ b/docs/articles/replication.html
@@ -129,7 +129,7 @@

Offline evaluation: Replication of Li et al 2010

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/replication.Rmd

diff --git a/docs/articles/simpsons.html b/docs/articles/simpsons.html
index f1f41e5..d5440b0 100644
--- a/docs/articles/simpsons.html
+++ b/docs/articles/simpsons.html
@@ -129,7 +129,7 @@

Demo: Bandits, Propensity Weighting & Simpson’s Paradox in R

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/simpsons.Rmd

diff --git a/docs/articles/sutton_barto.html b/docs/articles/sutton_barto.html
index 9b76839..a8c9f35 100644
--- a/docs/articles/sutton_barto.html
+++ b/docs/articles/sutton_barto.html
@@ -129,7 +129,7 @@

Demo: Replication Sutton & Barto, Reinforcement Learning: An Introduction, Chapter 2

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/sutton_barto.Rmd

diff --git a/docs/articles/website_optimization.html b/docs/articles/website_optimization.html
index 14516b3..9a91d05 100644
--- a/docs/articles/website_optimization.html
+++ b/docs/articles/website_optimization.html
@@ -129,7 +129,7 @@

Demo: Replication of John Myles White, Bandit Algorithms for Website Optimization

Robin van Emden

-2020-07-24
+2020-07-25

Source: vignettes/website_optimization.Rmd

diff --git a/docs/news/index.html b/docs/news/index.html
index 0ef8a71..9b46690 100644
--- a/docs/news/index.html
+++ b/docs/news/index.html
@@ -180,7 +180,8 @@

@@ -191,7 +192,7 @@

diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
index 7ff746c..d2142b3 100644
--- a/docs/pkgdown.yml
+++ b/docs/pkgdown.yml
@@ -15,5 +15,5 @@ articles:
   simpsons: simpsons.html
   sutton_barto: sutton_barto.html
   website_optimization: website_optimization.html
-last_built: 2020-07-24T16:53Z
+last_built: 2020-07-25T14:34Z

diff --git a/docs/reference/EpsilonFirstPolicy.html b/docs/reference/EpsilonFirstPolicy.html
index 4a4f64e..3446aa4 100644
--- a/docs/reference/EpsilonFirstPolicy.html
+++ b/docs/reference/EpsilonFirstPolicy.html
@@ -269,7 +269,7 @@

Examples

bandit <- BasicBernoulliBandit$new(weights = weights)
agent  <- Agent$new(policy, bandit)
-history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()

#> Simulation horizon: 100
#> Number of simulations: 100
#> Number of batches: 1
#> Starting main loop.
#> Finished main loop.
#> Completed simulation in 0:00:01.111
#> Computing statistics.
+history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
#> Simulation horizon: 100
#> Number of simulations: 100
#> Number of batches: 1
#> Starting main loop.
#> Finished main loop.
#> Completed simulation in 0:00:01.161
#> Computing statistics.
plot(history, type = "cumulative")
plot(history, type = "arms")
#> Simulation horizon: 100
#> Number of simulations: 100
#> Number of batches: 1
#> Starting main loop.
#> Finished main loop.
#> Completed simulation in 0:00:00.933
#> Computing statistics.
+history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
#> Simulation horizon: 100
#> Number of simulations: 100
#> Number of batches: 1
#> Starting main loop.
#> Finished main loop.
#> Completed simulation in 0:00:00.943
#> Computing statistics.
plot(history, type = "cumulative")
plot(history, type = "arms")
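Note: the regenerated example hunks above show only the tail of each pkgdown example; weights, policy, horizon and simulations are defined earlier on those pages. For orientation, a minimal self-contained sketch of that kind of setup — the policy choice, epsilon, weights, horizon and simulation count below are illustrative assumptions, not the values used on the regenerated pages:

library(contextual)

horizon     <- 100L
simulations <- 100L
weights     <- c(0.9, 0.1, 0.1)                    # assumed per-arm success probabilities

policy  <- EpsilonGreedyPolicy$new(epsilon = 0.1)  # assumed epsilon
bandit  <- BasicBernoulliBandit$new(weights = weights)
agent   <- Agent$new(policy, bandit)

history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()

plot(history, type = "cumulative")
plot(history, type = "arms")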
diff --git a/docs/reference/Exp3Policy.html b/docs/reference/Exp3Policy.html index 4748d1d..63131f1 100644 --- a/docs/reference/Exp3Policy.html +++ b/docs/reference/Exp3Policy.html @@ -262,7 +262,7 @@

Examples

bandit <- BasicBernoulliBandit$new(weights = weights)
agent  <- Agent$new(policy, bandit)
-history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
#> Simulation horizon: 100
#> Number of simulations: 100
#> Number of batches: 1
#> Starting main loop.
#> Finished main loop.
#> Completed simulation in 0:00:01.121
#> Computing statistics.
+history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
#> Simulation horizon: 100
#> Number of simulations: 100
#> Number of batches: 1
#> Starting main loop.
#> Finished main loop.
#> Completed simulation in 0:00:01.166
#> Computing statistics.
plot(history, type = "cumulative")
plot(history, type = "arms")
diff --git a/docs/reference/GradientPolicy.html b/docs/reference/GradientPolicy.html index 9a060bf..d468ae1 100644 --- a/docs/reference/GradientPolicy.html +++ b/docs/reference/GradientPolicy.html @@ -256,7 +256,7 @@

Examples

bandit <- BasicBernoulliBandit$new(weights = weights)
agent  <- Agent$new(policy, bandit)
-history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
#> Simulation horizon: 100
#> Number of simulations: 100
#> Number of batches: 1
#> Starting main loop.
#> Finished main loop.
#> Completed simulation in 0:00:00.963
#> Computing statistics.
+history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
#> Simulation horizon: 100
#> Number of simulations: 100
#> Number of batches: 1
#> Starting main loop.
#> Finished main loop.
#> Completed simulation in 0:00:01.001
#> Computing statistics.
plot(history, type = "cumulative")
plot(history, type = "arms")
diff --git a/docs/reference/OfflineDoublyRobustBandit.html b/docs/reference/OfflineDoublyRobustBandit.html index 816cc5a..3b30891 100644 --- a/docs/reference/OfflineDoublyRobustBandit.html +++ b/docs/reference/OfflineDoublyRobustBandit.html @@ -190,7 +190,7 @@

Usage

bandit <- OfflineDoublyRobustBandit(formula,
                                    data, k = NULL, d = NULL,
                                    unique = NULL, shared = NULL,
-                                   inverted = FALSE, randomize = TRUE)
+                                   randomize = TRUE)

Arguments

@@ -217,7 +217,6 @@

Arg
jitter

logical; add jitter to contextual features (optional, default: FALSE)

unique

integer vector; index of disjoint features (optional)

shared

integer vector; index of shared features (optional)

-inverted
-logical; have the propensities been inverted (1/p) or not (p)?

threshold

float (0,1); lower threshold (Tau) on propensity score values. A smaller Tau makes for less biased estimates with more variance, and vice versa. For more information, see the paper by Strehl et al. (2010). Values between 0.01 and 0.05 are known to work well.
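For intuition on what this class evaluates: a doubly robust estimate combines a reward model (direct method) with a propensity-weighted correction on the logged action. A toy base-R sketch of that estimator on simulated logging data — this illustrates the idea only, not the package's internal code:

set.seed(1)
n     <- 10000
p     <- runif(n, 0.2, 0.8)   # propensity with which the logged action was chosen
match <- rbinom(n, 1, p)      # 1 when the evaluated policy picks the logged action
r     <- rbinom(n, 1, 0.5)    # observed reward of the logged action
r_hat <- rep(0.5, n)          # reward-model prediction for the evaluated policy's action

dr_estimate <- mean(r_hat + (match / p) * (r - r_hat))
dr_estimate                   # doubly robust estimate of the evaluated policy's value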

diff --git a/docs/reference/OfflinePropensityWeightingBandit.html b/docs/reference/OfflinePropensityWeightingBandit.html index ab38595..c7aebfd 100644 --- a/docs/reference/OfflinePropensityWeightingBandit.html +++ b/docs/reference/OfflinePropensityWeightingBandit.html @@ -191,8 +191,7 @@

Usage

     data, k = NULL, d = NULL,
     unique = NULL, shared = NULL,
     randomize = TRUE, replacement = TRUE,
-    jitter = TRUE, arm_multiply = TRUE,
-    inverted = FALSE)
+    jitter = TRUE, arm_multiply = TRUE)

Arguments

@@ -216,7 +215,6 @@

Arg
replacement

logical; sample with replacement (optional, default: TRUE)

jitter

logical; add jitter to contextual features (optional, default: TRUE)

arm_multiply

logical; multiply the horizon by the number of arms (optional, default: TRUE)

-inverted
-logical; have the propensity scores been weighted (optional, default: FALSE)

threshold

float (0,1); lower threshold (Tau) on propensity score values. A smaller Tau makes for less biased estimates with more variance, and vice versa. For more information, see the paper by Strehl et al. (2010). Values between 0.01 and 0.05 are known to work well.
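The threshold acts as a floor on the logged propensity scores before they are turned into importance weights, which caps those weights at 1/Tau. A toy sketch of that flooring step — idea only, not the package's internal code:

p   <- c(0.001, 0.04, 0.25, 0.60)   # logged propensity scores
tau <- 0.05                         # lower threshold (Tau) as described above

p_floored <- pmax(p, tau)           # propensities below tau are floored at tau
weights   <- 1 / p_floored          # importance weights are now capped at 1/tau = 20
weights
#> [1] 20.000000 20.000000  4.000000  1.666667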

@@ -242,7 +240,7 @@

Methods
new(formula, data, k = NULL, d = NULL,
    unique = NULL, shared = NULL, randomize = TRUE,
-   replacement = TRUE, jitter = TRUE, arm_multiply = TRUE, inverted = FALSE)

generates and instantiates a new OfflinePropensityWeightingBandit instance.

+ replacement = TRUE, jitter = TRUE, arm_multiply = TRUE)

generates and instantiates a new OfflinePropensityWeightingBandit instance.

get_context(t)

argument:
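Downstream, those floored propensities drive an inverse-propensity-weighted estimate of the evaluated policy's value during replay. A toy base-R sketch of that estimate on simulated logging data — idea only, not the package's internal code:

set.seed(1)
n     <- 10000
p     <- runif(n, 0.05, 0.9)   # logging propensities of the logged actions
match <- rbinom(n, 1, p)       # 1 when the evaluated policy picks the logged action
r     <- rbinom(n, 1, 0.6)     # observed rewards of the logged actions
tau   <- 0.05                  # threshold as documented above

ips_estimate <- mean((match / pmax(p, tau)) * r)
ips_estimate                   # propensity-weighted estimate of the evaluated policy's value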

    diff --git a/docs/reference/RandomPolicy.html b/docs/reference/RandomPolicy.html index 0e5d9d2..55889f4 100644 --- a/docs/reference/RandomPolicy.html +++ b/docs/reference/RandomPolicy.html @@ -257,7 +257,7 @@

Examples

bandit <- BasicBernoulliBandit$new(weights = weights)
agent  <- Agent$new(policy, bandit)
-history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
    #> Simulation horizon: 100
    #> Number of simulations: 100
    #> Number of batches: 1
    #> Starting main loop.
    #> Finished main loop.
    #> Completed simulation in 0:00:00.833
    #> Computing statistics.
    +history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
    #> Simulation horizon: 100
    #> Number of simulations: 100
    #> Number of batches: 1
    #> Starting main loop.
    #> Finished main loop.
    #> Completed simulation in 0:00:00.821
    #> Computing statistics.
    plot(history, type = "arms")
    #> Simulation horizon: 100
    #> Number of simulations: 100
    #> Number of batches: 1
    #> Starting main loop.
    #> Finished main loop.
    #> Completed simulation in 0:00:00.968
    #> Computing statistics.
    +history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
    #> Simulation horizon: 100
    #> Number of simulations: 100
    #> Number of batches: 1
    #> Starting main loop.
    #> Finished main loop.
    #> Completed simulation in 0:00:00.950
    #> Computing statistics.
    plot(history, type = "cumulative")
    plot(history, type = "arms")
    diff --git a/docs/reference/ThompsonSamplingPolicy.html b/docs/reference/ThompsonSamplingPolicy.html index dfbe250..1f89d72 100644 --- a/docs/reference/ThompsonSamplingPolicy.html +++ b/docs/reference/ThompsonSamplingPolicy.html @@ -265,7 +265,7 @@

Examples

bandit <- BasicBernoulliBandit$new(weights = weights)
agent  <- Agent$new(policy, bandit)
-history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
    #> Simulation horizon: 100
    #> Number of simulations: 100
    #> Number of batches: 1
    #> Starting main loop.
    #> Finished main loop.
    #> Completed simulation in 0:00:01.218
    #> Computing statistics.
    +history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
    #> Simulation horizon: 100
    #> Number of simulations: 100
    #> Number of batches: 1
    #> Starting main loop.
    #> Finished main loop.
    #> Completed simulation in 0:00:01.197
    #> Computing statistics.
    plot(history, type = "cumulative")
    #> Simulation horizon: 100
    #> Number of simulations: 100
    #> Number of batches: 1
    #> Starting main loop.
    #> Finished main loop.
    #> Completed simulation in 0:00:01.237
    #> Computing statistics.
    +history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
    #> Simulation horizon: 100
    #> Number of simulations: 100
    #> Number of batches: 1
    #> Starting main loop.
    #> Finished main loop.
    #> Completed simulation in 0:00:01.194
    #> Computing statistics.
    plot(history, type = "cumulative")
    plot(history, type = "arms")
diff --git a/man/OfflineDoublyRobustBandit.Rd b/man/OfflineDoublyRobustBandit.Rd
index 9e3901d..7ab6d59 100644
--- a/man/OfflineDoublyRobustBandit.Rd
+++ b/man/OfflineDoublyRobustBandit.Rd
@@ -12,7 +12,7 @@ Bandit for the doubly robust evaluation of policies with offline data.
bandit <- OfflineDoublyRobustBandit(formula,
                                    data, k = NULL, d = NULL,
                                    unique = NULL, shared = NULL,
-                                   inverted = FALSE, randomize = TRUE)
+                                   randomize = TRUE)
}
}
@@ -57,9 +57,6 @@ integer vector; index of disjoint features (optional)
\item{\code{shared}}{
integer vector; index of shared features (optional)
}
-\item{\code{inverted}}{
-logical; have the propensities been inverted (1/p) or not (p)?
-}
\item{\code{threshold}}{
float (0,1); Lower threshold or Tau on propensity score values. Smaller Tau makes for
less biased estimates with more variance, and vice versa. For more information,
see the paper by Strehl et al. (2010).

diff --git a/man/OfflinePropensityWeightingBandit.Rd b/man/OfflinePropensityWeightingBandit.Rd
index 55f9d79..61e21f8 100644
--- a/man/OfflinePropensityWeightingBandit.Rd
+++ b/man/OfflinePropensityWeightingBandit.Rd
@@ -13,8 +13,7 @@ Policy for the evaluation of policies with offline data through replay with prop
data, k = NULL, d = NULL,
unique = NULL, shared = NULL,
randomize = TRUE, replacement = TRUE,
-jitter = TRUE, arm_multiply = TRUE,
-inverted = FALSE)
+jitter = TRUE, arm_multiply = TRUE)
}
}
@@ -54,9 +53,6 @@ logical; add jitter to contextual features (optional, default: TRUE)
\item{\code{arm_multiply}}{
logical; multiply the horizon by the number of arms (optional, default: TRUE)
}
-\item{\code{inverted}}{
-logical; have the propensity scores been weighted (optional, default: FALSE)
-}
\item{\code{threshold}}{
float (0,1); Lower threshold or Tau on propensity score values. Smaller Tau makes for
less biased estimates with more variance, and vice versa. For more information,
see the paper by Strehl et al. (2010).
@@ -90,7 +86,7 @@ integer vector; index of shared features (optional)
\describe{
\item{\code{new(formula, data, k = NULL, d = NULL, unique = NULL, shared = NULL, randomize = TRUE,
-  replacement = TRUE, jitter = TRUE, arm_multiply = TRUE, inverted = FALSE)}}{
+  replacement = TRUE, jitter = TRUE, arm_multiply = TRUE)}}{
generates and instantiates a new \code{OfflinePropensityWeightingBandit} instance.
}
\item{\code{get_context(t)}}{