This repository contains unofficial baselines for ManiSkill (more specifically, version 0.5.3). These baselines are heavily tuned, so they generally give you better sample efficiency and performance than the official ones.
- Install all dependencies via `mamba` or `conda` by running the following commands:

  ```
  mamba env create -f environment.yml
  mamba activate ms
  ```

  Note: `mamba` is a drop-in replacement for `conda`. Feel free to use `conda` if you prefer it.
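  Optionally, run a quick sanity check afterwards (this is only a suggestion, not part of the official setup):

  ```
  # Optional: verify that ManiSkill2 imports inside the activated environment
  python -c "import mani_skill2"
  ```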
- Download and link the necessary assets for ManiSkill:

  ```
  python -m mani_skill2.utils.download_asset all        # if you need the assets for all tasks
  python -m mani_skill2.utils.download_asset ${ENV_ID}  # if you only need the assets for one task
  ```

  This downloads the assets to `./data`. You may move them to any location. Then, add the following line to your `~/.bashrc` or `~/.zshrc`:

  ```
  export MS2_ASSET_DIR=<path>/<to>/<data>
  ```

  and restart your terminal.
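  For example, assuming the assets were moved to `~/maniskill_data` (a hypothetical location):

  ```
  # Hypothetical path; point MS2_ASSET_DIR at wherever you placed the downloaded data
  echo 'export MS2_ASSET_DIR=$HOME/maniskill_data' >> ~/.bashrc
  source ~/.bashrc
  ```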
This repo implements the following baselines:
- SAC (state)
- PPO (state)
- SAC (RGBD, a few examples)
- PPO (RGBD, a few examples)
- Diffusion Policy (state, a few examples)
- Diffusion Policy (RGBD, a few examples)
- MPC
Task | SAC (state) | SAC (RGBD) | PPO (state) | PPO (RGBD) | Diffusion Policy (state) | Diffusion Policy (RGBD) |
---|---|---|---|---|---|---|
PickCube | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
StackCube | ✅ | ❌ | ✅ | ✅ | | |
PickSingleYCB | ✅ | ✅ | | | | |
PickSingleEGAD | ✅ | ✅ | | | | |
PickClutterYCB | ✅ | | | | | |
PegInsertionSide | ✅ | ❌ | ✅ | ❌ | | |
TurnFaucet | ✅ | ✅ | | | | |
PlugCharger | ❌ | | | | | |
PandaAvoidObstacles | ❌ | ❌ | | | | |
OpenCabinetDrawer | ✅ | | | | | |
OpenCabinetDoor | ✅ | | | | | |
MoveBucket | ✅ | ❌ | | | | |
PushChair | | | | | | |
- ✅ = works well
- ⚠️ = works, but there is still room for improvement
- ❌ = doesn't work at all
- blank = not tested yet
The following commands should be run under the repo root dir.

SAC, state observation:

```
python rl/sac_state.py --env-id PickCube-v1 --total-timesteps 500_000
python rl/sac_state.py --env-id StackCube-v1 --total-timesteps 5_000_000
python rl/sac_state.py --env-id PickSingleYCB-v1 --total-timesteps 5_000_000
python rl/sac_state.py --env-id PickSingleEGAD-v1 --total-timesteps 2_000_000
python rl/sac_state.py --env-id PickClutterYCB-v1 --total-timesteps 15_000_000
python rl/sac_state.py --env-id PegInsertionSide-v1 --total-timesteps 10_000_000 --gamma 0.9 --control-mode pd_ee_delta_pose
python rl/sac_state.py --env-id TurnFaucet-v0 --total-timesteps 20_000_000 --gamma 0.95 --control-mode pd_ee_delta_pose
python rl/sac_state.py --env-id PlugCharger-v0 --total-timesteps 15_000_000 --control-mode pd_ee_delta_pose
python rl/sac_state.py --env-id OpenCabinetDrawer_unified-v1 --total-timesteps 3_000_000 --gamma 0.95 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel
python rl/sac_state.py --env-id OpenCabinetDoor_unified-v1 --total-timesteps 5_000_000 --gamma 0.95 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel
python rl/sac_state.py --env-id MoveBucket_unified-v1 --total-timesteps 80_000_000 --gamma 0.9 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel --eval-freq 500_000 --log-freq 20_000
python rl/sac_state.py --env-id PushChair_unified-v1 --total-timesteps 20_000_000 --gamma 0.9 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel --eval-freq 500_000 --log-freq 20_000
```

SAC, RGBD observation:

```
python rl/sac_rgbd.py --env-id PickCube-v1 --total-timesteps 500_000
```
Notes:
- If you want to use Weights and Biases (`wandb`) to track learning progress, add `--track` to your commands.
- You can tune `--num-envs` to get better speed; see the example below.
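For example, either flag can be appended to any of the commands above; the values below are only placeholders:

```
python rl/sac_state.py --env-id PickCube-v1 --total-timesteps 500_000 --track --num-envs 16
```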
PPO, state observation:

```
python rl/ppo_state.py --env-id PickCube-v1 --total-timesteps 3_000_000
python rl/ppo_state.py --env-id PickSingleYCB-v1 --total-timesteps 50_000_000 --gamma 0.9 --utd 0.025
python rl/ppo_state.py --env-id PickSingleEGAD-v1 --total-timesteps 5_000_000 --utd 0.025
python rl/ppo_state.py --env-id PickClutterYCB-v1 --total-timesteps 50_000_000
python rl/ppo_state.py --env-id TurnFaucet-v0 --total-timesteps 20_000_000 --gamma 0.99 --utd 0.025 --control-mode pd_ee_delta_pose
python rl/ppo_state.py --env-id OpenCabinetDrawer_unified-v1 --total-timesteps 30_000_000 --gamma 0.95 --utd 0.025 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel --eval-freq 500_000 --log-freq 20_000
python rl/ppo_state.py --env-id OpenCabinetDoor_unified-v1 --total-timesteps 50_000_000 --gamma 0.95 --utd 0.025 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel --eval-freq 500_000 --log-freq 20_000
python rl/ppo_state.py --env-id PushChair_unified-v1 --total-timesteps 20_000_000 --gamma 0.8 --bootstrap-at-done truncated --control-mode base_pd_joint_vel_arm_pd_joint_vel --eval-freq 500_000 --log-freq 20_000
```

PPO, RGBD observation:

```
python rl/ppo_rgbd.py --env-id PickCube-v1 --total-timesteps 5_000_000
```
Notes:
- PPO usually yields worse sample efficiency compared to SAC.
Diffusion Policy, state observation:

```
python bc/diffusion_unet.py --env-id PegInsertionSide-v0 --demo-path PATH_TO_MS2_OFFICIAL_DEMO
```

Diffusion Policy, RGBD observation:

```
python bc/diffusion_unet_rgbd.py --env-id StackCube-v0 --demo-path PATH_TO_MS2_OFFICIAL_DEMO
```
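If you do not have the official demonstrations yet, they can typically be fetched with ManiSkill2's demo download utility; the command below assumes ManiSkill2 0.5.x still ships the `mani_skill2.utils.download_demo` module, so check your installed version. The downloaded trajectories may also need to be replayed/converted into the observation and control modes expected by the BC scripts (ManiSkill2's `replay_trajectory` tool can help with this).

```
# Assumed ManiSkill2 utility for downloading the official demonstrations of one task
python -m mani_skill2.utils.download_demo ${ENV_ID}
```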
This codebase is built upon the CleanRL repository.
This project is licensed under the MIT License; see the `LICENSE` file for details. Note that the repository relies on third-party code, which is subject to its own licenses.