
ReinforcementLearning

Blackjack project.

GLIE Monte Carlo Control

Algorithm

GLIE Monte Carlo Control is a model-free algorithm for solving an episodic MDP. It converges in theory, but in practice it becomes inefficient when the number of state-action pairs is large.

For a given MDP we only assume that we can sample episodes from it and observe the rewards. The algorithm finds the best policy to follow.
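For reference, the per-episode updates of standard GLIE MC control are the incremental Monte Carlo estimates of the action-value function (the implementation in ControlFunctions.R may differ in minor details, e.g. first-visit vs. every-visit counting):

$$N(S_t, A_t) \leftarrow N(S_t, A_t) + 1$$

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \frac{1}{N(S_t, A_t)} \left( G_t - Q(S_t, A_t) \right)$$

where $G_t$ is the return following time step $t$. The policy is then improved to be $\epsilon_k$-greedy with respect to $Q$, with $\epsilon_k$ given by epsilon_fun.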

Implementation

The algorithm is implemented in the function GLIE_MC_control in the script ControlFunctions.R with the following arguments:

  • states - vector of strings representing the states, e.g. states = c("A")
  • actions - vector of strings representing the actions (this implementation assumes the same actions are available in every state), e.g. actions = c("l", "r", "d")
  • simulator - function that, for a given policy, samples one episode from the environment and returns a named list with 3 vectors: states, actions, and rewards. A sketch of the expected shape follows this list; see also the example below.
  • n_sim - number of episodes to simulate
  • epsilon_fun - function controlling the epsilon decay as a function of the episode number k, e.g. function(k) {1/k}. It must converge to 0 as k goes to infinity and always take values in $(0, 1]$. Inside the algorithm, k runs from 1 to n_sim.
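
For illustration only, the return value of simulator for a hypothetical three-step episode would have this shape (the states, actions, and rewards below are made up):

list(
  states  = c("A", "A", "A"),  # state visited at each time step
  actions = c("l", "r", "d"),  # action taken at each time step
  rewards = c(-5, 5, 0)        # reward received after each action
)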

Example

states = c("A")
actions = c("l", "r", "d")

simple_mdp = function(policy) {
  # One-state, one-step MDP: every episode starts in state "A",
  # one action is taken, one reward is observed, then the episode ends.
  episode = list()
  episode$states = "A"
  # Sample one action according to the policy row for state "A"
  episode$actions = sample(colnames(policy), 1, prob = policy[1, ])
  
  if(episode$actions == "r") {
    episode$rewards = 5          # "r" always pays 5
  } else if(episode$actions == "l") {
    episode$rewards = -5         # "l" always pays -5
  } else {
    # "d" is a lottery: 0 with probability 0.9, 100 with probability 0.1
    episode$rewards = sample(c(0, 100), 1, prob = c(0.9, 0.1))
  }
  return(episode)
}

GLIE_MC_control(states, actions, simple_mdp, n_sim = 1000, epsilon_fun = function(k) {1/sqrt(k)})
GLIE_MC_control(states, actions, simple_mdp, n_sim = 1000, epsilon_fun = function(k) {1/k})
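
In this toy MDP the expected reward of action "d" is $0.9 \cdot 0 + 0.1 \cdot 100 = 10$, compared to $5$ for "r" and $-5$ for "l", so both runs should converge to a policy that almost always plays "d"; the slower decay 1/sqrt(k) simply keeps exploring for longer.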

policy argument of simulator function

policy is a matrix with rows representing states and columns representing actions; each row sums to 1. When in a given state, the simulator must sample an action from the distribution in the corresponding row. An easy way to do that in R is

sample(colnames(policy),1, prob = policy[current_state,])

For example, in the first step the algorithm creates a uniform random policy:

  # Uniform random policy
  policy = matrix(rep(1/n_A, n_S*n_A), nrow = n_S, ncol = n_A)
  rownames(policy) = states
  colnames(policy) = actions
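
For the one-state example above this produces the matrix shown in the comment below (values as base R would print them), and sampling an action from state "A" then follows the pattern described above:

  #   l         r         d
  # A 0.3333333 0.3333333 0.3333333
  sample(colnames(policy), 1, prob = policy["A", ])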
