Pole Balancer is a Python program that uses reinforcement learning (RL) to automatically learn a policy for the classic control problem of balancing a pole on a cart. Using the Markov decision process (MDP) framework, the program learns without any explicit knowledge of the physics of the underlying system, in our case the pole on the cart.
- Ubuntu 18.04+, macOS 10.15+ and Windows 10+ (64-bit)
- At least 5 GB of memory
- Anaconda/Miniconda
- Python 3.6 or above
- A Python IDE (Jupyter/PyCharm)
Install the following Python packages:
- matplotlib
- numpy
- scipy
- pillow
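These can be installed with pip (or the equivalent conda commands), for example:

pip install matplotlib numpy scipy pillow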
Clone the repository
git clone https://github.com/avrumnoor/PoleBalancer.git
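Then change into the cloned directory:

cd PoleBalancer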
Run the program
python polebalancer.py
A thin pole is hinged to a cart that moves laterally on a smooth table surface. The simulation fails if either the pole's angle deviates from the vertical by more than a set threshold (i.e., the pole falls over), or the cart's position goes out of bounds (i.e., it falls off the end of the table).
The goal is to balance the pole subject to these constraints by having the cart accelerate appropriately to the left or right.
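For concreteness, here is a minimal sketch of the failure test described above. The specific angle and position limits are illustrative assumptions; the actual thresholds are defined inside the simulator.

```python
import math

# Illustrative limits only; the real thresholds live in the simulator.
POLE_ANGLE_LIMIT = math.radians(12)   # assumed maximum deviation from vertical (rad)
CART_POSITION_LIMIT = 2.4             # assumed half-length of the track (m)

def has_failed(cart_position, pole_angle):
    """Return True if the pole has fallen over or the cart has left the table."""
    return abs(pole_angle) > POLE_ANGLE_LIMIT or abs(cart_position) > CART_POSITION_LIMIT
```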
- Estimate a model (i.e., transition probabilities and rewards) for the underlying MDP.
- Solve Bellman's equations for this estimated MDP to obtain a value function.
- Act greedily with respect to this value function.
- Initially, each state has estimated reward zero, and the estimated transition probabilities are uniform.
- As the program takes actions, it gathers observations of transitions and rewards, which it uses to improve its estimate of the MDP model.
- Store the state transitions and reward observations each time, and update the model and value function/policy only periodically.
- Each time a failure occurs, re-estimate the transition probabilities and rewards as the average of the observed values (if any).
- Repeat the previous steps until convergence: once several consecutive attempts to solve Bellman's equations (the number is set by the parameter NO_LEARNING_THRESHOLD) all converge in the first iteration, the estimated model has stopped changing significantly. A sketch of this loop is given below.
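The following is a minimal sketch of the learning loop outlined above. The state/action discretization, constants, and helper names (e.g. NUM_STATES, estimate_model) are assumptions for illustration and do not reflect the repository's actual code.

```python
import numpy as np

# Illustrative sizes and constants; the real program defines its own.
NUM_STATES, NUM_ACTIONS = 163, 2   # discretized states, {accelerate left, accelerate right}
GAMMA = 0.995                      # discount factor (assumed)
TOLERANCE = 0.01                   # value-iteration stopping tolerance (assumed)

# Running counts of observed transitions and rewards, stored after each step.
transition_counts = np.zeros((NUM_STATES, NUM_ACTIONS, NUM_STATES))
reward_sums = np.zeros(NUM_STATES)
reward_counts = np.zeros(NUM_STATES)

def estimate_model():
    """Re-estimate transition probabilities and rewards as the average of the
    observations; unobserved (state, action) pairs keep a uniform prior and zero reward."""
    probs = np.full((NUM_STATES, NUM_ACTIONS, NUM_STATES), 1.0 / NUM_STATES)
    totals = transition_counts.sum(axis=2)
    seen = totals > 0
    probs[seen] = transition_counts[seen] / totals[seen][:, None]
    rewards = np.zeros(NUM_STATES)
    visited = reward_counts > 0
    rewards[visited] = reward_sums[visited] / reward_counts[visited]
    return probs, rewards

def value_iteration(probs, rewards):
    """Solve Bellman's equations by value iteration; also report how many sweeps
    were needed, which drives the NO_LEARNING_THRESHOLD convergence test."""
    V = np.zeros(NUM_STATES)
    for sweep in range(1, 10000):
        Q = rewards[:, None] + GAMMA * np.einsum('sat,t->sa', probs, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < TOLERANCE:
            return V_new, sweep
        V = V_new
    return V, sweep

def greedy_action(state, probs, rewards, V):
    """Act greedily with respect to the current value function."""
    q = rewards[state] + GAMMA * probs[state] @ V
    return int(np.argmax(q))
```

In the full program, these pieces sit inside an outer loop that stores transition and reward observations after each action, re-estimates the model and re-solves for the value function after each failure, and stops once NO_LEARNING_THRESHOLD consecutive value-iteration solves each converge in a single sweep.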
Avrum Noor
Stanford Machine Learning Coursework