This Python package integrates the V-REP robot simulation software and base libraries for NAO robot control with reinforcement learning algorithms for solving custom or any OpenAI-gym-based learning environments.
- Parallelized Proximal Policy Optimization (PPO) and Asynchronous Advantage Actor-Critic (A3C) for training agents.
- Custom OpenAI-gym-based API for controlling V-REP that makes it easy to create new learning tasks and environments in roughly 50-100 lines of code (a short sketch follows this list)
- Learned policies can be transferred back to the real robot, or learning can be done online (not recommended).
- Custom learning environments for the NAO robot (listed further below)
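As a rough illustration of how small a new environment can be, the sketch below shows the general shape of a Gym-style environment class. It is hypothetical: the class name, the space shapes, and the placeholder method bodies are assumptions for illustration only, not the package's actual API (in a real environment the package's V-REP API would replace the placeholder calls).

```python
# Hypothetical sketch of a new Gym-style environment (names and shapes are
# illustrative only; real nao_rl environments use the package's V-REP API).
import gym
import numpy as np
from gym import spaces

class MyNaoTask(gym.Env):
    """Minimal skeleton of a custom learning task."""

    def __init__(self):
        # Example spaces: 4 continuous joint commands, 8-dimensional observation
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(8,), dtype=np.float32)

    def reset(self):
        # Restart the simulation scene and return the initial observation
        return np.zeros(self.observation_space.shape, dtype=np.float32)

    def step(self, action):
        # Apply the action to the simulated robot, advance the simulation,
        # then compute the next observation, the reward and the termination flag
        observation = np.zeros(self.observation_space.shape, dtype=np.float32)
        reward, done = 0.0, False
        return observation, reward, done, {}
```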
Requirements:
- V-REP v3.4.0 - robot simulation software by Coppelia Robotics
- Python 2.7 and python-virtualenv.
- tensorflow, gym, numpy, opencv-python
- Choregraphe Suite v2.1.2 - for creating a virtual NAO (requires registering; installed by default in /opt/Aldebaran/Choregraphe Suite 2.1)
- Python NAOqi SDK v2.1.2 - libraries provided by SoftBank Robotics for NAO control (requires registering)
(Tested on Ubuntu 18.04)
1. Clone the repository
git clone https://github.com/andriusbern/nao_rl
cd nao_rl
2. Create and activate the virtual environment
virtualenv env
source env/bin/activate
3. Install the package and the required libraries
python setup.py install
You will be prompted to enter the path to your V-REP installation directory.
To try out the environments (V-REP will be launched with the appropriate scene and agent loaded, and actions will be sampled randomly):
import nao_rl
env = nao_rl.make('env_name')
env.run()
Where 'env_name' corresponds to one of the following available environments:
- NaoTracking - tracking an object using the camera information
- NaoBalancing - keeping upright balance
- NaoWalking - learning a bipedal gait
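Since the environments follow the OpenAI Gym interface, they can also be stepped through manually instead of calling env.run(). The loop below is a sketch that assumes the standard reset()/step() methods and action_space sampling:

```python
import nao_rl

# Launch V-REP with the scene and agent for one of the environments above
env = nao_rl.make('NaoBalancing')

state = env.reset()
done = False
while not done:
    action = env.action_space.sample()            # random action, as env.run() would do
    state, reward, done, info = env.step(action)  # advance the simulation by one step
```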
To train agents in these environments, you can use the built-in RL algorithms:
python train.py NaoTracking a3c 0
Live plotting of training results, sped up 40x (enabled with the '-p' flag).
The command line arguments are:
- Environment name: the name of one of the nao_rl or gym environments
- Training algorithm:
  - a3c - Asynchronous Advantage Actor-Critic
  - ppo - Proximal Policy Optimization
- Rendering mode:
  - [0] - Do not render
  - [1] - Render the first worker
  - [2] - Render all workers
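For example, to train the NaoBalancing environment with PPO while rendering only the first worker (using the same argument order as the example above):

python train.py NaoBalancing ppo 1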
To see all additional command line arguments:
python train.py -h
The training session can be interrupted at any time; the model will be saved and can be loaded later.
To test trained models:
python test.py trained_models/filename.cpkt
Add the -r flag to run the trained policy on the real NAO (this can be dangerous). It is recommended to set a low fps for the environment so that the robot performs actions slowly, e.g.:
python test.py trained_models/filename.cpkt -r -fps 2