This repository contains a reference PyTorch implementation of the paper:
Lessons from Learning to Spin “Pens”
Jun Wang*,
Ying Yuan*,
Haichuan Che*,
Haozhi Qi*,
Yi Ma,
Jitendra Malik,
Xiaolong Wang
[Website]
[Paper]
See installation instructions.
Our pen spinning method contains the following four steps.
- Learn an oracle policy with privileged information, point clouds, and tactile sensor output using RL in simulation.
- Learn a student policy using the rollout of the oracle policy, also in simulation.
- Replay trajectories generated by the oracle policy on a real robot, with the initial state distribution matched. Successful trajectories are collected and failures are discarded.
- Fine-tune the student policy from step 2 on the successful real-world trajectories.
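Step 3 amounts to filtering replays by a success label. The snippet below is a minimal sketch of that filter, not the repository's code: the `Replay` container, its `succeeded` flag, and `keep_successes` are hypothetical names introduced for illustration, and how success is judged on the real robot is not specified here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Replay:
    """One open-loop replay on the robot (hypothetical container)."""
    qpos: List[Sequence[float]]     # joint positions, one entry per step
    action: List[Sequence[float]]   # actions, one entry per step
    succeeded: bool                 # success label assigned after the replay


def keep_successes(replays: List[Replay]) -> Tuple[List[Replay], float]:
    """Return the successful replays and the overall success rate."""
    kept = [r for r in replays if r.succeeded]
    rate = len(kept) / len(replays) if replays else 0.0
    return kept, rate
```

Only the `kept` list would be passed on to fine-tuning in step 4; the success rate is returned as a convenience for logging.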
The following sections only provide example scripts for our method. For baselines, check out baselines.
cd outputs/AllegroHandHora
gdown 1LCRFE6lvKSUDPpUfEATOmpDUPDbB7n8d
unzip demo.zip -d ./
cd ../../
scripts/vis_teacher.sh demo
To train an oracle policy
# 0 is the GPU id
# 42 is the experiment seed
scripts/train_teacher.sh 0 42 output_name
After training your oracle policy, you can visualize it as follows:
scripts/vis_teacher.sh output_name
In this section, we train a proprioceptive student policy by distilling from our trained oracle policy.
Note that we train the student policy on teacher rollouts, in contrast to the DAgger-style distillation used in previous works.
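The difference between the two data-collection schemes can be seen in a toy example. This is a sketch under simplifying assumptions (integer states, dynamics `next_state = state + action`, hand-written policies), not the repository's training code; `collect` and its arguments are hypothetical names. In both schemes the teacher provides the labels; what changes is whose action drives the environment.

```python
from typing import Callable, List, Tuple

State = int
Action = int
Policy = Callable[[State], Action]


def collect(teacher: Policy, student: Policy, start: State,
            horizon: int, dagger: bool) -> List[Tuple[State, Action]]:
    """Return (state, teacher_action) training pairs.

    dagger=False: the teacher's action is executed (teacher rollout).
    dagger=True:  the student's action is executed; the teacher only labels.
    """
    data = []
    state = start
    for _ in range(horizon):
        label = teacher(state)
        data.append((state, label))
        executed = student(state) if dagger else label
        state = state + executed  # toy dynamics
    return data


def teacher(state: State) -> Action:
    return 1   # always moves right


def student(state: State) -> Action:
    return -1  # an untrained student that moves the wrong way


teacher_data = collect(teacher, student, start=0, horizon=3, dagger=False)
dagger_data = collect(teacher, student, start=0, horizon=3, dagger=True)
print(teacher_data)  # [(0, 1), (1, 1), (2, 1)]
print(dagger_data)   # [(0, 1), (-1, 1), (-2, 1)]
```

With teacher rollouts, the student only sees states the teacher itself visits; with DAgger, it sees states reached by its own (possibly poor) actions.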
scripts/train_student_sim.sh train.ppo.is_demon=True train.demon_path=ORACLE_CHECKPOINT_PATH
We have provided a reference teacher checkpoint in Google Drive.
To generate open-loop replay data for the student policy
python real/robot_controller/teacher_replay.py --data-collect --exp=0 --replay_data_dir=REPLAY_DATA_DIR
where REPLAY_DATA_DIR is the directory in which to save the replay data.
Then process the replay data.
To fine-tune the student policy
scripts/finetune_ppo.sh --real-dataset-folder=REAL_DATA_PATH --checkpoint-path=YOUR_CHECKPOINTPATH
Please download the real reference data from Google Drive.
Real data:
real_data.h5 is an HDF5 file that contains the following keys:
- replay_demon_{idx}: the idx-th replay demonstration data
  - qpos: the current qpos of the robot
  - action: the delta action applied to the robot
  - current_target_qpos: the target qpos of the robot
real_data_full.h5 is the full version of real_data.h5 and contains the following keys:
- replay_demon_{idx}: the idx-th replay demonstration data
  - qpos: the current qpos of the robot
  - action: the delta action applied to the robot
  - current_target_qpos: the target qpos of the robot
  - rgb_ori: the original RGB image
  - rgb_c2d: the RGB image after camera2depth image processing
  - depth: the depth image
  - pc: the point cloud
  - obj_ends: the positions of the object ends
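To check what a downloaded file actually contains, a small helper can list each demonstration and the shape of every entry in it. This is a sketch, assuming each `replay_demon_{idx}` key is a group whose children are the arrays listed above; `summarize_replays` is a hypothetical helper, not part of this repository. It only relies on the mapping interface (`keys()` and indexing), so it works on an open `h5py.File` as well as on plain dictionaries.

```python
from typing import Any, Dict, Mapping, Tuple


def summarize_replays(root: Mapping[str, Any],
                      prefix: str = "replay_demon_"
                      ) -> Dict[str, Dict[str, Tuple[int, ...]]]:
    """Map each demonstration name to {entry name: array shape}."""
    summary: Dict[str, Dict[str, Tuple[int, ...]]] = {}
    for name in root.keys():
        if not name.startswith(prefix):
            continue  # skip anything that is not a demonstration group
        group = root[name]
        summary[name] = {
            # entries without a .shape attribute are reported as ()
            key: tuple(getattr(group[key], "shape", ()))
            for key in group.keys()
        }
    return summary


# With h5py installed (an assumption), usage would look like:
#   import h5py
#   with h5py.File("real_data.h5", "r") as f:
#       for name, entries in summarize_replays(f).items():
#           print(name, entries)
```

For real_data_full.h5 the same call should additionally report the image, depth, point-cloud, and `obj_ends` entries, if the layout matches the description above.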
Note: This repository is built on Hora and IsaacGymEnvs.
If you find PenSpin or this codebase helpful in your research, please consider citing:
@inproceedings{wang2024penspin,
author={Wang, Jun and Yuan, Ying and Che, Haichuan and Qi, Haozhi and Ma, Yi and Malik, Jitendra and Wang, Xiaolong},
title={Lessons from Learning to Spin “Pens”},
booktitle={CoRL},
year={2024}
}