This is a Reinforcement Learning Control simulation platform that integrates many different algorithms and environments. It does not require complicated Python packages such as gym, mujoco, casadi, or acado; it only needs pytorch, opencv-python, numpy, matplotlib, etc., all of which are easy to install and configure. The objective is to let developers create their own environments and algorithms entirely by themselves, without being blocked by other Python packages. Recently, we have updated this platform.
In a nutshell: we have rewritten the whole platform!
- All environments have been rewritten. We removed the model description files from the environments, since they were useless, and standardized some basic functions and parameters to enhance portability.
- All algorithms have been rewritten. We removed many redundant modules from the previous version, for example, network saving and loading functions and some variables that were defined but never used.
- All basic classes have been rewritten. We removed some useless classes in /utils/classes.
- All demonstrations have been re-trained. The demos we had previously trained were not standardized, so we retrained all of them.
Installation is very easy; in fact, "install" is hardly the right word for it. You can start by installing nothing: just download the platform, run it, and see which packages you still need. Mainly: pytorch (the GPU version, although the CPU version is also fine for now), opencv-python, numpy, pandas, and matplotlib.
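If you want to check the dependencies up front, a minimal sketch like the one below (not part of the platform) tries to import each required package and prints the pip name of anything missing:

```python
# Minimal dependency check (a sketch, not shipped with the platform).
# Maps importable module names to the pip package names listed above.
import importlib

REQUIRED = {
    "torch": "torch",            # pytorch
    "cv2": "opencv-python",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
}

for module, pip_name in REQUIRED.items():
    try:
        importlib.import_module(module)
        print(f"{module}: OK")
    except ImportError:
        print(f"{module}: missing, install it with 'pip install {pip_name}'")
```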
This platform currently consists of four parts, namely, algorithm, demonstration, environment, and utils.
utils: nothing special, just some commonly used classes and functions.
algorithm: all the RL algorithms we have implemented so far, listed below.
Algorithm | Classification | Description |
---|---|---|
DQN | value-based | None |
Dueling DQN | value-based | None |
Double DQN | value-based | None |
DDPG | actor-critic | None |
TD3 | actor-critic | an improved version of DDPG |
SAC | actor-critic | None |
PPO | policy-based | None |
DPPO | policy-based | multi-process PPO |
PPO2 | policy-based | PPO with GAE and some tricks |
DPPO2 | policy-based | multi-process PPO2 |
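The table mentions GAE for PPO2 and DPPO2. For reference, the sketch below is the standard generalized advantage estimation recursion; it is a generic formulation for illustration, not necessarily the exact code used in this repository.

```python
import numpy as np

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Standard Generalized Advantage Estimation (GAE-lambda).

    rewards, dones: arrays of length T from one rollout (dones are 0/1 flags);
    values: array of length T + 1, i.e. the critic values with a bootstrap value appended.
    Returns the advantages and the corresponding discounted returns.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        adv[t] = running
    returns = adv + values[:T]
    return adv, returns
```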
Note that the code actually runs pretty fast. We may not have chosen a proper mp4-to-gif tool, which is why the gifs shown below all have a low frame rate. You can view the gif files directly in the 'gifs' folder, or run 'test.py' in each environment to generate an mp4 file (or just watch the animation of the environment). Again, it would not be responsible to claim that our platform is very, very fast, but we can say it is not slow (or fast enough for us to use). ^_^
A ball is placed on a beam, and the beam is controlled by a two-link manipulator. The objective is to keep the ball at the center of the beam by adjusting the joint angles of the manipulator.
A cartpole, nothing special. Two environments are integrated. The first is angle-only: the objective is to balance the pole by adjusting the force applied to the cart. The objective of the second is to control both the angle of the pole and the position of the cart.
CartPole with both angle and position
CartPole with angle only
A rod fixed at its midpoint that can rotate freely. The objective is to keep the rod horizontal by adjusting the force applied at the end of the rod.
A two-link manipulator. The objective is to control the position of the manipulator's end-effector.
A mass point whose control input is the two-dimensional acceleration; the objective is to control its position.
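For intuition, this mass point is a plain double integrator. A minimal sketch of one simulation step is given below; the sampling period is an assumption here, and the actual environment defines its own bounds and reward.

```python
import numpy as np

def mass_point_step(pos, vel, acc, dt=0.02):
    """One Euler step of a 2-D double integrator: acceleration in, new position and velocity out.

    pos, vel, acc: np.ndarray of shape (2,); dt is an assumed sampling period.
    """
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel

# Example: accelerate from rest along the x axis for one step.
pos, vel = mass_point_step(np.zeros(2), np.zeros(2), np.array([1.0, 0.0]))
```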
A quadrotor that is already controlled by a fast nonsingular terminal sliding-mode controller (FNTSMC). We use RL to automatically tune the hyper-parameters of the FNTSMC to achieve better tracking performance. Both a position tracking mode and an attitude tracking mode are provided.
Attitude tracking controller
Position tracking controller
Just quadrotor fixed-point control. The difference between UavFntsmcParam and UavRobust is that UavRobust uses RL to control the quadrotor directly, while UavFntsmcParam uses RL to optimize the hyper-parameters of the FNTSMC.
The graphical demonstration is identical to that of UavFntsmcParam.
A ground vehicle whose control outputs are the expected linear and angular accelerations. The objective is to control the position of the UGV.
UGV drives only forward
UGV drives both forward and backward
A ground vehicle whose control outputs are the expected linear and angular accelerations. The states are those of environment 9 (10) plus the laser data.
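Both UGV environments are commanded through linear and angular accelerations. Assuming standard unicycle kinematics, one generic integration step looks like the sketch below; the sampling period and any velocity limits are assumptions, not the environment's actual values.

```python
import numpy as np

def ugv_step(x, y, theta, v, omega, a_lin, a_ang, dt=0.02):
    """One Euler step of acceleration-driven unicycle kinematics.

    (x, y, theta) is the pose, (v, omega) the linear/angular velocity,
    and (a_lin, a_ang) the commanded accelerations; dt is an assumed sampling period.
    """
    v = v + a_lin * dt
    omega = omega + a_ang * dt
    x = x + v * np.cos(theta) * dt
    y = y + v * np.sin(theta) * dt
    theta = theta + omega * dt
    return x, y, theta, v, omega
```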
All demos are classified by RL algorithm. For example, in the SAC folder, all environments are controlled by a SAC-trained NN controller. Currently, we have 3 for DDPG, 2 for DoubleDQN, 9 for DPPO, 9 for DPPO2, 2 for DQN, 2 for DuelingDQN, 11 for PPO, 10 for PPO2, 8 for SAC, and 3 for TD3, which makes 59 demonstrations in total (hey hey hey, let's go for it, brothers!).
Note!!!
You need to copy the corresponding environment file into a new folder and rewrite it when you are training. The 'environment' folder only contains the fundamentals of each environment. Some details, for example, the maximum time per episode, the sampling period, the graphical demonstration, and the reward function, may differ depending on which RL algorithm you are using. Furthermore, if you are using DPPO2, you also need to copy 'Distributed_PPO2.py' into the new training folder in addition to the environment.
Note!!!
We provide a gif for each demo here:

DDPG:

CartPoleAngleOnly | FlightAttitudeSimulator | SecondOrderIntegration |
---|---|---|

DoubleDQN:

FlightAttitudeSimulator | SecondOrderIntegration |
---|---|

DPPO:

BallBalancer1D | CartPole | TwoLinkManipulator |
---|---|---|
UGVForward | UGVBidirectional | SecondOrderIntegration |

UavHover |
---|

UavHoverOuterLoop |
---|

UavHoverInnerLoop |
---|

DPPO2:

SecondOrderIntegration | UGVForward | CartPole |
---|---|---|
TwoLinkManipulator | UGVBidirectional | BallBalancer1D |
CartPoleAngleOnly | FlightAttitudeSimulator | UGVForwardObstacleAvoidance |

DQN:

FlightAttitudeSimulator | SecondOrderIntegration |
---|---|

DuelingDQN:

FlightAttitudeSimulator | SecondOrderIntegration |
---|---|

PPO:

FlightAttitudeSimulator | SecondOrderIntegration | BallBalancer1D |
---|---|---|
CartPole | CartPoleAngleOnly | TwoLinkManipulator |
UGVForward | UGVBidirectional | |

UavHover |
---|

UavHoverOuterLoop |
---|

UavHoverInnerLoop |
---|

PPO2:

CartPole | CartPoleAngleOnly | FlightAttitudeSimulator |
---|---|---|
SecondOrderIntegration | UGVForward | UGVBidirectional |
BallBalancer1D | TwoLinkManipulator | |

UavFntsmcParamAtt |
---|

UavFntsmcParamPos |
---|

SAC:

BallBalancer1D | CartPole | CartPoleAngleOnly |
---|---|---|
FlightAttitudeSimulator | SecondOrderIntegration | TwoLinkManipulator |
UGVForward | UGVBidirectional | |

TD3:

CartPoleAngleOnly | FlightAttitudeSimulator | SecondOrderIntegration |
---|---|---|

TODO:

- Add D4PG algorithm (maybe)
- Add more demos for DPPO2
- Add a new environment: UGVObstacleAvoidance (this environment has already been integrated into our ReinforcementLearning repository, but we just haven't rewritten it yet.)