PyTorch implementation of Proximal Policy Optimization (PPO) with a clipped objective function and Generalized Advantage Estimation (GAE). The model is trained on the Humanoid, Hopper, Ant, and HalfCheetah PyBullet environments. To run multiple environments in parallel processes, the SubprocVecEnv class from Stable Baselines is used (file included).
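The two pieces named above, GAE and the clipped surrogate objective, can be summarized in a short sketch. This is an illustrative outline only, not the exact code in agent.py; the function names, tensor shapes, and the gamma / lam / clip_eps values are assumptions.

```python
# Illustrative sketch of GAE and the clipped PPO objective (not the exact code in agent.py).
import torch

def compute_gae(rewards, values, masks, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.
    `values` holds one extra bootstrap value; `masks` is 0 at terminal steps."""
    advantages = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] * masks[t] - values[t]
        gae = delta + gamma * lam * masks[t] * gae
        advantages[t] = gae
    returns = advantages + values[:-1]
    return advantages, returns

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective; returns a loss to minimize."""
    ratio = (new_log_probs - old_log_probs).exp()
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(surr1, surr2).mean()
```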
| HumanoidBulletEnv-v0 | HalfCheetahBulletEnv-v0 |
|---|---|
| HopperBulletEnv-v0 | AntBulletEnv-v0 |
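The multi-process rollout setup mentioned in the introduction follows the usual SubprocVecEnv pattern: build one environment-creating callable per worker and hand the list to the vectorized wrapper. The module name `multiprocessing_env`, the environment id, and the worker count below are assumptions for illustration; the included file may expose the class under a different path.

```python
# Sketch of the parallel-environment setup; import path and values are assumptions.
import gym
import pybullet_envs  # registers the Bullet environments with gym
from multiprocessing_env import SubprocVecEnv  # the included file (name assumed)

def make_env(env_id):
    def _thunk():
        return gym.make(env_id)
    return _thunk

n_workers = 8
envs = SubprocVecEnv([make_env("HopperBulletEnv-v0") for _ in range(n_workers)])
obs = envs.reset()  # batched observations, one row per worker
```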
Important command line arguments (an example parser sketch follows the list):

- `--env` : environment name (note: works only for continuous PyBullet environments)
- `--learn` : start training the agent
- `--play` : play using a pretrained model
- `-n_workers` : number of parallel environments
- `-load` : continue training from the given checkpoint
- `-model` : path to the model or checkpoint to load
- `-ppo_steps` : number of steps collected before each update
- `-epochs` : number of updates
- `-mini_batch` : mini-batch size during the PPO update
- `-lr` : policy and critic learning rate
- `-c1` : critic loss coefficient
- `-c2` : entropy coefficient (beta)
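For orientation, the arguments above could be wired up with argparse roughly as follows; the defaults shown are illustrative assumptions, the real defaults are defined in agent.py.

```python
# Hedged sketch of the argument parsing; defaults are illustrative only.
import argparse

parser = argparse.ArgumentParser(description="PPO agent for PyBullet environments")
parser.add_argument("--env", type=str, default="HumanoidBulletEnv-v0", help="environment id")
parser.add_argument("--learn", action="store_true", help="train the agent")
parser.add_argument("--play", action="store_true", help="play using a pretrained model")
parser.add_argument("-n_workers", type=int, default=8, help="number of parallel environments")
parser.add_argument("-load", action="store_true", help="continue training from a checkpoint")
parser.add_argument("-model", type=str, default=None, help="path to a model or checkpoint")
parser.add_argument("-ppo_steps", type=int, default=2048, help="steps collected before each update")
parser.add_argument("-epochs", type=int, default=10, help="number of updates")
parser.add_argument("-mini_batch", type=int, default=64, help="mini-batch size during the PPO update")
parser.add_argument("-lr", type=float, default=3e-4, help="policy and critic learning rate")
parser.add_argument("-c1", type=float, default=0.5, help="critic loss coefficient")
parser.add_argument("-c2", type=float, default=0.01, help="entropy coefficient (beta)")
args = parser.parse_args()
```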
To train the agent:

```bash
# train a new agent
python agent.py --learn --env <ENV_ID>

# continue training from a checkpoint
python agent.py --learn --env <ENV_ID> -load -model <CHECKPOINT PATH>
```

To play:

```bash
python agent.py --play --env <ENV_ID> -model <MODEL PATH>
```