Note: A new version will be released soon!
A modified benchmark for designing and controlling 2D Voxel-based Soft Robots (VSRs)
ModularEvoGym is compatible with Evolution Gym but provides some new features.
Design:
- A modular robot design space.
- We incorporate the design process into the environment and provide a universal design policy based on a simple Neural Cellular Automata (NCA), which takes multiple actions to grow a robot from an initial morphology. NCA encodes complex patterns in a neural network and generates different developmental outcomes while using a small set of trainable parameters.
Control:
- A modular robot state-action space.
- A generalizable control policy based on Transformer, which is capable of handling incompatible state-action spaces.
These new features make end-to-end brain-body co-design of modular soft robots possible.
In ModularEvoGym, the design problem is modeled as a Markov Decision Process (MDP). Our objective is to find a universal design policy that can take in arbitrary VSR morphologies and output a series of actions to modify them. Typically, VSRs can be characterized as multi-cellular systems that develop from a small set of initial cells. We represent our design policy as a Neural Cellular Automata, which begins with some initial seeds and updates their states according to local rules parameterized by a multilayer perceptron. Because the rules are local, the NCA is agnostic to the robot's morphological topology.
The above figure demonstrates a single design step when developing a VSR. The design space is surrounded by empty voxels, and each voxel is represented as a discrete value that corresponds to its material property (e.g., empty voxel=0, rigid voxel=1, soft voxel=2, horizontal actuator=3 and vertical actuator=4). The state vector
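The local-rule update described above can be illustrated with a minimal NumPy sketch. Everything beyond the material codes is an assumption made for illustration: the 3x3 neighbourhood, the one-hot encoding, the layer sizes, the random (untrained) weights, and the greedy argmax choice are not taken from the ModularEvoGym implementation.

```python
import numpy as np

# Material codes as listed in the text:
# empty=0, rigid=1, soft=2, horizontal actuator=3, vertical actuator=4
NUM_MATERIALS = 5


def nca_design_step(body, w1, b1, w2, b2):
    """Apply one synchronous local-rule update to every cell of a 2D grid.

    Assumption: each cell sees its 3x3 neighbourhood (one-hot encoded) and a
    small MLP maps that patch to logits over the five material types.
    """
    h, w = body.shape
    # Surround the design space with empty voxels, as described in the text.
    padded = np.pad(body, 1, mode="constant", constant_values=0)
    one_hot = np.eye(NUM_MATERIALS)[padded]          # shape (h+2, w+2, 5)
    new_body = np.empty_like(body)
    for i in range(h):
        for j in range(w):
            patch = one_hot[i:i + 3, j:j + 3].reshape(-1)   # 3*3*5 = 45 values
            hidden = np.tanh(patch @ w1 + b1)
            logits = hidden @ w2 + b2
            # Greedy choice for illustration; a stochastic policy would sample.
            new_body[i, j] = int(np.argmax(logits))
    return new_body


# Random placeholder weights; a real design policy would learn these.
rng = np.random.default_rng(0)
w1 = rng.normal(size=(9 * NUM_MATERIALS, 16))
b1 = np.zeros(16)
w2 = rng.normal(size=(16, NUM_MATERIALS))
b2 = np.zeros(NUM_MATERIALS)

# Start from a single rigid seed voxel in the centre of a 5x5 design space.
seed_body = np.zeros((5, 5), dtype=int)
seed_body[2, 2] = 1
grown = nca_design_step(seed_body, w1, b1, w2, b2)
print(grown)
```

Because the same weights are applied to every cell, the function accepts a grid of any height and width, which is the sense in which such a rule does not depend on the morphology's topology.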
We model the local observations of all voxels as a sequence and provide a universal Transformer-based control policy.
Observation Space of ModularEvoGym
The input state of the robot at time step
During the simulation, voxels (except empty voxels) only sense locally, and based on the input sensory information, a controller outputs control signals to vary the volume of actuator voxels. By default, we use
git clone --recurse-submodules https://github.com/Yuxing-Wang-THU/ModularEvoGym.git
Make sure that submodules (glfw, glew and pybind11) are successfully downloaded to "/evogym/simulator/externals"
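As a quick way to perform this check, the following sketch reports which submodule folders are missing or empty. It assumes the folders are named `glfw`, `glew` and `pybind11` inside `evogym/simulator/externals`; the helper `missing_submodules` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Assumed folder names, taken from the submodule names mentioned above.
EXPECTED_SUBMODULES = ("glfw", "glew", "pybind11")


def missing_submodules(repo_root):
    """Return the expected submodule folders that are absent or empty."""
    externals = Path(repo_root) / "evogym" / "simulator" / "externals"
    missing = []
    for name in EXPECTED_SUBMODULES:
        folder = externals / name
        # An empty folder usually means the submodule was not fetched.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

Calling `missing_submodules(".")` from the repository root should return an empty list after a successful recursive clone. If it does not, `git submodule update --init --recursive` fetches the missing submodules.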
Requirements:
sudo apt-get install xorg-dev libglu1-mesa-dev
Install Python dependencies:
conda create -n modularevogym python==3.7.11
conda activate modularevogym
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install Gpy==1.10.0 -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install git+https://github.com/yunshengtian/GPyOpt.git
pip install git+https://github.com/yunshengtian/neat-python.git
To build the C++ simulation, build all the submodules, and install evogym, run the following command:
python setup.py install
If you encounter the error "Could NOT find GLEW (missing: GLEW_INCLUDE_DIRS GLEW_LIBRARIES)", run
sudo apt install libglew-dev
cd to the examples folder and run the following script:
python modularevogym_test.py
The code of modularevogym_test.py
import gym
import evogym.envs
from evogym import sample_robot
import numpy as np
from evogym.utils import MODULAR_ENV_NAMES

if __name__ == '__main__':
    # Setting
    mode = "modular"
    body_size = (5,5)
    for env_name in MODULAR_ENV_NAMES:
        print("MODULAR ENV TEST: ", env_name)
        body, connections = sample_robot(body_size)
        # ModularEvoGym is compatible with EvolutionGym, if you just want to use EvoGym
        env = gym.make(env_name, body=body)
        # If you want to use ModularEvoGym, add mode='modular' and env_id=env_name
        env = gym.make(env_name, body=body, mode='modular', env_id=env_name)
        obs = env.reset()
        # Just Test: Update the original env
        new_body, new_connections = sample_robot(body_size)
        env.update(body=new_body, connections=new_connections)
        obs = env.reset()
        # Rollout
        while True:
            if mode == 'modular':
                action = np.random.uniform(low=-1.0, high=1.0, size=body_size[0]*body_size[1])
            else:
                action = env.action_space.sample()
            ob, reward, done, info = env.step(action)
            # env.render()
            if done:
                break
        env.close()
    print("Done!")
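In modular mode the test script samples one action value per voxel of the design space (`body_size[0]*body_size[1]` entries), regardless of how many actuators the sampled robot has. One plausible reading, which is an assumption and not confirmed by the text, is that only the entries belonging to actuator voxels influence the simulation. The sketch below illustrates that idea with hypothetical helpers and an assumed row-by-row flattening order.

```python
import numpy as np

# Material codes of the two actuator types listed earlier in this README.
ACTUATOR_TYPES = (3, 4)


def actuator_mask(body):
    """Boolean mask over the flattened grid marking actuator voxels.

    Assumption: voxels are flattened row by row.
    """
    return np.isin(np.asarray(body), ACTUATOR_TYPES).reshape(-1)


def select_actuator_actions(body, flat_action):
    """Keep only the action entries that belong to actuator voxels."""
    mask = actuator_mask(body)
    flat_action = np.asarray(flat_action, dtype=float)
    if flat_action.shape != mask.shape:
        raise ValueError("expected one action value per voxel of the design space")
    return flat_action[mask]


# A 2x3 body with three actuators (two horizontal, one vertical).
body = np.array([[3, 1, 0],
                 [2, 4, 3]])
flat_action = np.linspace(-1.0, 1.0, body.size)   # one value per voxel
actuator_actions = select_actuator_actions(body, flat_action)
print(actuator_actions)
```

A fixed-size action vector of this kind keeps the interface identical across robots with different numbers of actuators.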
We provide a universal Transformer-based controller which can handle incompatible state-action spaces. This controller can be trained by many popular RL methods (e.g., SAC, PPO and DDPG).
A learnable position embedding is applied after the linear projection layer. The local observation, the output feature of the Transformer and the task-related observation are concatenated before passing them through a decoder. Here, the control policy outputs the mean
To optimize the control of a predefined robot, cd to the examples folder and run the following script:
Use self-attention
python run_transformer_ppo.py
Remove self-attention
python run_transformer_ppo.py --ac_type fc
All logs are stored in examples/saved_data/, and you can find some trained models in examples/visual.
To visualize the training process:
python simple_plotter.py
To evaluate the trained controller:
python simple_evaluate.py
To make a gif:
python simple_gif.py
Note: To get a smaller gif, make sure that gifsicle is installed on your system. On Ubuntu, you can run the following command:
sudo apt-get install -y gifsicle
The Transformer-based controller works well for robots with beneficial homogeneous tissue structures. When the robot is moving, groups of voxels expand or compress simultaneously, a behaviour related to Muscle Synergy (more visual results).
A muscle synergy is the activation of a group of muscles to contribute to a particular movement, thus reducing the dimensionality of muscle control. A single muscle can be part of multiple muscle synergies, and a single synergy can activate various muscles. Different ways of grouping muscles into synergies can be found in this reference literature.
Self-attention offers better interpretability than a multilayer perceptron. Because we use only one Transformer encoder layer, we can visualize the attention matrix generated when a single input state passes through the attention layer. The above figure shows attention matrices produced by the control policy network. The color of each attention score indicates the strength of the compatibility between inputs and helps interpret what is driving the current behaviour. When the robot's front foot (voxel
Generalization can be further enhanced by modularization, due to the success of modeling dynamic structures via self-attention. Our Transformer-based controller is able to handle incompatible state-action spaces, so it is possible to pre-train a powerful policy.
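To make the idea of one policy serving robots of different sizes concrete, here is a minimal NumPy sketch of single-head self-attention over a sequence of per-voxel observations. It is an illustration under stated assumptions, not the repository's network: the class `TinyVoxelController`, the observation size of 6, the model width, and the random (untrained) weights are all hypothetical, and the task-related observation mentioned earlier is omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TinyVoxelController:
    """Single-head self-attention over per-voxel observations (sketch only)."""

    def __init__(self, obs_dim, model_dim=16, max_voxels=49, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.w_in = rng.normal(scale=s, size=(obs_dim, model_dim))    # linear projection
        self.pos = rng.normal(scale=s, size=(max_voxels, model_dim))  # position embedding
        self.w_q = rng.normal(scale=s, size=(model_dim, model_dim))
        self.w_k = rng.normal(scale=s, size=(model_dim, model_dim))
        self.w_v = rng.normal(scale=s, size=(model_dim, model_dim))
        self.w_out = rng.normal(scale=s, size=(obs_dim + model_dim, 1))  # decoder

    def __call__(self, local_obs):
        n = local_obs.shape[0]                       # number of voxels in this robot
        tokens = local_obs @ self.w_in + self.pos[:n]
        q, k, v = tokens @ self.w_q, tokens @ self.w_k, tokens @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # (n, n) attention matrix
        features = attn @ v
        # Concatenate local observations with attention features before decoding.
        decoder_in = np.concatenate([local_obs, features], axis=-1)
        mean_action = np.tanh(decoder_in @ self.w_out).squeeze(-1)
        return mean_action, attn


controller = TinyVoxelController(obs_dim=6)
rng = np.random.default_rng(1)
small_mean, small_attn = controller(rng.normal(size=(9, 6)))    # 3x3 robot
large_mean, large_attn = controller(rng.normal(size=(25, 6)))   # 5x5 robot
print(small_attn.shape, large_attn.shape)
```

The same weights process both a 9-voxel and a 25-voxel sequence, and the returned attention matrix is the kind of quantity that can be plotted as a heat map for interpretation.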
To train some randomly generated robots to walk, cd to the examples folder and run the following script:
python run_transformer_ppo_multi.py
Example Walker morphologies to be controlled:
Two logs are stored in examples/saved_data/Walker-v0 and examples/saved_data/Pusher-v0, respectively.
Learning curves:
A robot's ability to interact with the environment depends both on its brain (control policy) and body (morphology), which are inherently coupled. With the help of ModularEvoGym, we can learn how to co-design a modular soft robot.
Unlike traditional, expensive bi-level optimization methods, which maintain a population of design prototypes (left in the above figure), we optimize the design and control simultaneously via reinforcement learning. This is enabled by unifying the two processes into a single MDP and by using our proposed NCA-based design policy and Transformer-based control policy (right in the above figure).
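The single-MDP view can be sketched as one episode with two stages: a few design steps that modify the body, followed by control steps in the resulting robot. The code below is a toy illustration only. `ToyEnv`, `toy_design_policy`, `toy_control_policy` and the zero reward during the design stage are assumptions introduced here, not the ModularEvoGym or CuCo API.

```python
import numpy as np


class ToyEnv:
    """Hypothetical stand-in for a simulator; not the ModularEvoGym API."""

    def __init__(self, body, horizon=5):
        self.body = np.asarray(body)
        self.horizon = horizon
        self.t = 0

    def _obs(self):
        return self.body.reshape(-1).astype(float)

    def reset(self):
        self.t = 0
        return self._obs()

    def step(self, action):
        self.t += 1
        reward = float(np.mean(action))        # placeholder reward
        done = self.t >= self.horizon
        return self._obs(), reward, done


def toy_design_policy(body):
    """Placeholder design action: turn the first empty voxel into a soft voxel."""
    new_body = body.copy()
    empty = np.flatnonzero(new_body == 0)
    if empty.size:
        new_body.flat[empty[0]] = 2
    return new_body


def toy_control_policy(obs):
    """Placeholder control action: a constant signal per voxel."""
    return np.full(obs.shape, 0.5)


def co_design_episode(design_policy, control_policy, make_env, seed_body, design_steps):
    """Roll out one episode: design stage first, then control stage."""
    transitions = []
    body = seed_body
    for _ in range(design_steps):
        new_body = design_policy(body)
        # Assumption: no environment reward is given while designing.
        transitions.append(("design", body, new_body, 0.0))
        body = new_body
    env = make_env(body)
    obs, done = env.reset(), False
    while not done:
        action = control_policy(obs)
        next_obs, reward, done = env.step(action)
        transitions.append(("control", obs, action, reward))
        obs = next_obs
    return body, transitions


seed_body = np.zeros((3, 3), dtype=int)
seed_body[1, 1] = 1
final_body, transitions = co_design_episode(
    toy_design_policy, toy_control_policy, ToyEnv, seed_body, design_steps=3)
print(final_body)
```

Storing both stages in one trajectory is what allows a single RL algorithm to update the design policy and the control policy from the same episode return.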
Specifically, we maintain a universal design policy
At the beginning of each episode, the design policy
To co-design a Thrower robot, cd to the examples folder and run the following script:
python run_cuco.py --env Thrower-v0 --target_size 5 --rl_only --train_iters 3000
Logs are stored in examples/saved_data/Thrower-v0.
Let's take a look at the robots designed by our method:
Robots designed by evolution-based methods (Genetic Algorithm & Bayesian Optimization) from Evolution Gym.
We have shown that it is possible to co-design robots via end-to-end RL. It's really interesting, however, given a big modular design space (e.g.,
A simple implementation of CuCo can be found in /examples/CuCo and logs are stored in /examples/saved_data.
Run the following code for training:
python run_cuco.py --env Walker-v0 --target_size 7 --train_iters 1000
Remove the curriculum mechanism:
python run_cuco.py --env Walker-v0 --target_size 7 --rl_only --train_iters 3000
Learning curves:
@inproceedings{
wang2023curriculumbased,
title={Curriculum-based Co-design of Morphology and Control of Voxel-based Soft Robots},
author={Yuxing Wang and Shuang Wu and Haobo Fu and QIANG FU and Tiantian Zhang and Yongzhe Chang and Xueqian Wang},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=r9fX833CsuN}
}
[1] Jagdeep Bhatia, Holly Jackson, Yunsheng Tian, Jie Xu, and Wojciech Matusik. Evolution gym: A large-scale benchmark for evolving soft robots. In NeurIPS, 2021.
[2] Agrim Gupta, Linxi (Jim) Fan, Surya Ganguli, and Li Fei-Fei. Metamorph: Learning universal controllers with transformers. ArXiv, abs/2203.11931, 2022.
[3] Vitaly Kurin, Maximilian Igl, Tim Rocktäschel, Wendelin Boehmer, and Shimon Whiteson. My body is a cage: the role of morphology in graph-based incompatible control. ArXiv, abs/2010.01856, 2021.