Selected algorithms and exercises from the book Sutton, R. S. & Barto, A. G.: Reinforcement Learning: An Introduction. 2nd Edition, MIT Press, Cambridge, 2018.
- Results of experiments are dumped to HDF5 files and placed in the `.dump` directory.
- By default, the gathered data are used to plot charts for the experiment.
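
The dumps can also be inspected outside the plotting step. The sketch below is only an illustration: it assumes the dump files carry an `.hdf5` or `.h5` extension and that the third-party `h5py` package is installed. The helper names `find_dumps` and `describe_dump` are hypothetical and not part of `rlbox`. The internal layout of the files is not documented here, so the code walks whatever datasets it finds rather than assuming specific names.

```python
from pathlib import Path


def find_dumps(root=".dump", patterns=("*.hdf5", "*.h5")):
    """Return dump files under `root`, sorted by path.

    The extensions are an assumption; adjust `patterns` to match your files.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    found = set()
    for pattern in patterns:
        found.update(p for p in base.rglob(pattern) if p.is_file())
    return sorted(found)


def describe_dump(path):
    """Return (name, shape, dtype) for every dataset in an HDF5 file."""
    import h5py  # third-party; imported lazily so find_dumps works without it

    rows = []

    def visit(name, obj):
        # Groups are skipped; only datasets carry array data.
        if isinstance(obj, h5py.Dataset):
            rows.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return rows
```

Calling `describe_dump(p)` for each `p` in `find_dumps()` gives a quick overview of what an experiment stored, without relying on any particular dataset name.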
git clone https://github.com/ocraft/rl-sandbox.git
cd rl-sandbox
pip install -e .
python setup.py test
python -m rlbox.run --testbed=narmedbandit.SampleAverage
| Param | Description | Default |
|---|---|---|
| --testbed | [required] Name of a testbed that you want to use. | None |
| --start | Run experiment using a chosen testbed. | true |
| --plot | Plot data that was generated with a chosen testbed. | true |
| --help | Show a list of all flags. | false |
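
To drive runs from a script, the documented invocation can be wrapped in a small helper. This is a sketch, not part of `rlbox`: `build_command` and `run_testbed` are hypothetical names, and only the `--testbed=NAME` form is used, since the exact syntax for switching the boolean flags off is not stated here.

```python
import subprocess
import sys


def build_command(testbed):
    """Build the argument list for `python -m rlbox.run --testbed=<testbed>`."""
    if not testbed:
        raise ValueError("--testbed is required")
    # sys.executable keeps the run inside the current interpreter/virtualenv.
    return [sys.executable, "-m", "rlbox.run", f"--testbed={testbed}"]


def run_testbed(testbed):
    """Run one testbed and return the process exit code."""
    return subprocess.run(build_command(testbed), check=False).returncode
```

For example, `run_testbed("gambler.0.4")` corresponds to typing `python -m rlbox.run --testbed=gambler.0.4` in a shell that uses the current interpreter.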
| Section | Run | Output |
|---|---|---|
| | python -m rlbox.run --testbed=narmedbandit.SampleAverage | |
| | python -m rlbox.run --testbed=narmedbandit.WeightedAverage | |
| | python -m rlbox.run --testbed=narmedbandit.OptInitVal | |
| | python -m rlbox.run --testbed=narmedbandit.Ucb | |
| | python -m rlbox.run --testbed=narmedbandit.Gradient | |
| | python -m rlbox.run --testbed=narmedbandit.ParamStudy | |
| | python -m rlbox.run --testbed=car_rental_v1 | |
| | python -m rlbox.run --testbed=car_rental_v2 | |
| | python -m rlbox.run --testbed=gambler.0.4 | |
| | python -m rlbox.run --testbed=gambler.0.25 | |
| | python -m rlbox.run --testbed=gambler.0.55 | |
| | python -m rlbox.run --testbed=racetrack | |
| | python -m rlbox.run --testbed=gridworld.windy | |
| | python -m rlbox.run --testbed=gridworld.windy_stochastic | |
| | python -m rlbox.run --testbed=gridworld.NStepSarsa | |
| | python -m rlbox.run --testbed=maze.DynaQ | |
| | python -m rlbox.run --testbed=maze.DynaQ+ | |
| 10.1 Episodic Semi-gradient Control, Example 10.1: Mountain Car Task | python -m rlbox.run --testbed=mountain_car.SemiGradientSarsa | |
| | python -m rlbox.run --testbed=mountain_car.TrueSarsaLambda | |
| | python -m rlbox.run --testbed=mountain_car.ActorCritic | |
| Testbed | Environment | Exe | Time [s] |
|---|---|---|---|
| narmedbandit.SampleAverage | | | 11 |
| narmedbandit.WeightedAverage | | | 78 |
| narmedbandit.OptInitVal | | | 7.51 |
| narmedbandit.Ucb | | | 11.78 |
| narmedbandit.Gradient | | | 105 |
| narmedbandit.ParamStudy | | | 303 |
| carrental.JackCarRentalV1 | | | |
| carrental.JackCarRentalV2 | | | |
| gambler.0.4 | | | 22 |
| gambler.0.25 | | | 16 |
| gambler.0.55 | | | 11 |
| racetrack | | | |
| gridworld.windy | | | 0.05 |
| gridworld.windy_stochastic | | | 0.33 |
| gridworld.NStepSarsa | | | 0.32 |
| gridworld.NStepSarsa | | | 0.32 |
| maze.DynaQ | | | 18 |
| maze.DynaQ+ | | | 29 |
| mountain_car.SemiGradientSarsa | | | 52 |
| mountain_car.TrueSarsaLambda | | | 51 |
| mountain_car.ActorCriticLambda | | | 115 |
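
How the times in this table were measured is not stated. One way to obtain comparable wall-clock figures on your own machine is sketched below. `time_command` and `time_testbeds` are hypothetical helpers, not part of `rlbox`, and the measurement includes interpreter start-up and any plotting the run performs, so the numbers will not necessarily match the table.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run `cmd` and return (exit_code, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    return completed.returncode, time.perf_counter() - start


def time_testbeds(testbeds):
    """Time `python -m rlbox.run --testbed=<name>` for each name in `testbeds`."""
    results = {}
    for name in testbeds:
        cmd = [sys.executable, "-m", "rlbox.run", f"--testbed={name}"]
        results[name] = time_command(cmd)
    return results
```

`time_testbeds(["gridworld.windy", "gambler.0.4"])` returns a dictionary mapping each testbed name to its exit code and elapsed seconds.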