GSP-RL is a library built on PyTorch that provides several deep learning predictive methods for mitigating non-stationarity in MARL. These methods use A-CTDE, which assumes and accounts for each agent's actions impacting every other agent's state after execution, due to either rigid or soft lattice formations.
This library supports four common RL algorithms (a brief sketch contrasting DQN and DDQN follows the list):
- DQN http://www.nature.com/articles/nature14236
- DDQN https://arxiv.org/abs/1509.06461
- DDPG http://arxiv.org/abs/1509.02971
- TD3 https://arxiv.org/abs/1802.09477
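As a rough illustration of how DQN and DDQN differ (this is a generic PyTorch sketch, not GSP-RL's internal API; the network and tensor names are hypothetical), the TD target computation differs only in how the next-state action is selected:

```python
import torch

# Hypothetical inputs for illustration only:
# q_net, target_net: torch.nn.Module mapping states -> Q-values per action
# rewards, dones: shape (batch,); next_states: shape (batch, obs_dim)
def td_targets(q_net, target_net, rewards, next_states, dones, gamma=0.99, double=False):
    with torch.no_grad():
        if double:
            # DDQN: the online network selects the action, the target network evaluates it
            next_actions = q_net(next_states).argmax(dim=1, keepdim=True)
            next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        else:
            # DQN: the target network both selects and evaluates the action
            next_q = target_net(next_states).max(dim=1).values
        return rewards + gamma * (1.0 - dones) * next_q
```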
A study of these four algorithms as they pertain to multi-agent reinforcement learning in swarm robotics, specifically collective transport with imperfect robots, can be found here: https://arxiv.org/pdf/2203.15129
To address the issues identified in the study above, we introduce Global State Prediction (GSP). GSP is a decentralized predictive network that takes partial observations of the other agents in the swarm and predicts the global state at the next time step. This prediction is a direct result of the actions to be executed, giving each agent a forecast of what the rest of the collective will do. The prediction is then fed into the action network as part of the observation at the current time step. We present GSP here: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=r_6eZtMAAAAJ&citation_for_view=r_6eZtMAAAAJ:UeHWp8X0CEIC
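A minimal sketch of the idea (module and dimension names are placeholders, not the library's actual classes): a small predictor maps the agent's observation and intended action to a predicted global state, which is then concatenated onto the observation fed to the action network.

```python
import torch
import torch.nn as nn

class GlobalStatePredictor(nn.Module):
    """Sketch of a GSP-style predictor: (local obs, action) -> predicted global state."""
    def __init__(self, obs_dim, act_dim, global_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, global_dim),
        )

    def forward(self, obs, action):
        return self.net(torch.cat([obs, action], dim=-1))

# The prediction is appended to the agent's observation before it reaches the
# action (policy/value) network:
#   augmented_obs = torch.cat([obs, gsp(obs, action)], dim=-1)
```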
We next reduce GSP's communication footprint by limiting it to each agent's immediate neighborhood. In our collective transport example, this means the immediate neighbors clockwise and counterclockwise of the current agent. This greatly reduces the scale of communication as the swarm grows and allows a method trained on a specific number of robots to generalize to any number of robots. These results are to be submitted and a link to the paper will be uploaded shortly.
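For a ring-like arrangement of agents around the payload, selecting the clockwise and counterclockwise neighbors is simple modular indexing (illustrative only; the ordering convention is an assumption):

```python
def neighbor_observations(all_obs, i):
    """Return the observations of agent i's immediate CCW and CW neighbors.

    all_obs: list of per-agent observations ordered around the payload.
    """
    n = len(all_obs)
    ccw = all_obs[(i - 1) % n]  # counterclockwise neighbor
    cw = all_obs[(i + 1) % n]   # clockwise neighbor
    return ccw, cw
```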
We next study the role memory plays in a distributed system and introduce two new variations of GSP. First, we prepend a recurrent neural network to GSP in the form of an LSTM layer. This provides short-term retention of relevant recent history while also preserving memory of longer-term events, which is especially important when coming into contact with obstacles in the environment. We term this version R-GSP.
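A rough sketch of the recurrent variant (names and sizes are hypothetical, not the released architecture): an LSTM front end processes the observation-action history, and the prediction head reads from its output.

```python
import torch
import torch.nn as nn

class RecurrentGSP(nn.Module):
    """Sketch of an R-GSP-style network: LSTM front end followed by a prediction head."""
    def __init__(self, obs_dim, act_dim, global_dim, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim + act_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, global_dim)

    def forward(self, obs_seq, act_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim), act_seq: (batch, time, act_dim)
        x = torch.cat([obs_seq, act_seq], dim=-1)
        out, hidden_state = self.lstm(x, hidden_state)
        # Predict the global state at each step; carry hidden_state between calls
        return self.head(out), hidden_state
```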
Next, we study a novel use of attention encoding by replacing the GSP architecture with an attention encoder whose front and back ends are modified so that continuous floating-point values can be passed as input and a normalized prediction is produced as output. We term this version A-GSP.
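A sketch of what such an attention-based variant could look like (layer sizes and pooling are assumptions, not the released design): a linear front end projects continuous inputs into the encoder's embedding space, a Transformer encoder attends over per-agent tokens, and a bounded output head yields a normalized prediction.

```python
import torch
import torch.nn as nn

class AttentionGSP(nn.Module):
    """Sketch of an A-GSP-style network: linear front end -> Transformer encoder -> normalized output."""
    def __init__(self, token_dim, global_dim, d_model=64, nhead=4, layers=2):
        super().__init__()
        self.embed = nn.Linear(token_dim, d_model)  # continuous inputs -> embeddings
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers)
        self.out = nn.Sequential(nn.Linear(d_model, global_dim), nn.Sigmoid())  # normalized prediction

    def forward(self, tokens):
        # tokens: (batch, num_agents, token_dim) -- one token per observed agent
        h = self.encoder(self.embed(tokens))
        return self.out(h.mean(dim=1))  # pool over tokens, predict the global state
```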
Both R-GSP and A-GSP are to be submitted shortly and a link will be uploaded.
Before getting started, you will need pyenv and Poetry installed on your machine. After you install pyenv, add the following to your `.bashrc` and restart your terminal:
```bash
# pyenv
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init --path)"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
```
- Use pyenv to install and use Python 3.10.2:
```bash
pyenv install 3.10.2
```
- Create a new virtual dev environment:
```bash
pyenv virtualenv 3.10.2 gsprl
```
- Activate the new virtual environment:
```bash
pyenv activate gsprl
```
- Update pip:
```bash
python3.10 -m pip install --upgrade pip
```
- Install Poetry:
```bash
pip install poetry
```
Installation Notes:
- You may run into an issue with the `_tkinter` library; installing the following package solved the problem:
```bash
sudo apt-get install tk-dev
```
- Specify the Python version that Poetry should use and create a virtualenv:
```bash
poetry env use 3.10.2
```
- Install the package and its dependencies:
```bash
poetry install
```
- Unit Tests: you can run the unit tests via:
```bash
poetry run pytest
```
- RL Testing: you can test the RL algorithms on several different gym environments via the examples directory:
```bash
$ cd examples/baselines
$ python cart_pole.py
$ python lunar_lander.py
$ python pendulum.py
```
CartPole and LunarLander are discrete action space environments and can therefore be learned with DQN or DDQN. Pendulum has a continuous action space and can therefore be learned with DDPG or TD3.
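If you are unsure which family of algorithms applies to a given environment, the action space type tells you. A quick check, independent of this library (the environment version strings here are illustrative and may differ from those used in the example scripts):

```python
import gym

for name in ["CartPole-v1", "LunarLander-v2", "Pendulum-v1"]:
    env = gym.make(name)
    if isinstance(env.action_space, gym.spaces.Discrete):
        print(name, "-> discrete actions: use DQN or DDQN")
    else:
        print(name, "-> continuous actions: use DDPG or TD3")
    env.close()
```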