Data Scaling Laws in Imitation Learning for Robotic Manipulation

[Project Page] [Paper] [Models] [Processed Dataset] [Raw GoPro Videos]


Fanqi Lin1,2,3*, Yingdong Hu1,2,3*, Pingyue Sheng1, Chuan Wen1,2,3, Jiacheng You1, Yang Gao1,2,3

1Tsinghua University, 2Shanghai Qi Zhi Institute, 3Shanghai Artificial Intelligence Laboratory

* indicates equal contributions

🛠️ Installation

See the UMI repository for installation.

📷 Data

We release data for all four of our tasks: pour water, arrange mouse, fold towel, and unplug charger. You can view or download all raw GoPro videos from this link, and generate the dataset for training by running:

bash run_slam.sh && bash run_generate_dataset.sh

Alternatively, we provide the processed dataset here, ready for direct use in training.

You can visualize the dataset with a simple script:

python visualize_dataset.py
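
If you prefer to poke at the data programmatically, the sketch below shows one way to inspect it, assuming the processed dataset follows a UMI-style zarr replay-buffer layout (a data group of per-step arrays plus episode metadata); the path and key names here are placeholders, not guaranteed by the repo:

```python
# Minimal sketch for inspecting the processed dataset, assuming it follows a
# UMI-style zarr replay-buffer layout (a `data` group of per-step arrays plus
# episode metadata). The path and key names are placeholders.
import zarr

DATASET_PATH = "data/pour_water/dataset.zarr"  # hypothetical location

root = zarr.open(DATASET_PATH, mode="r")

# Shapes and dtypes of the per-step arrays (camera frames, end-effector
# poses, gripper widths, ...), if the layout matches.
if "data" in root:
    for name, arr in root["data"].arrays():
        print(f"{name:30s} shape={arr.shape} dtype={arr.dtype}")

# Episode boundaries, if present, tell you how many demonstrations are stored.
if "meta" in root and "episode_ends" in root["meta"]:
    episode_ends = root["meta"]["episode_ends"][:]
    print(f"{len(episode_ends)} episodes, {episode_ends[-1]} total steps")
```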

🦾 Real-World Evaluation

For the hardware setup, please refer to the UMI repo (note: we remove the mirror from the gripper, see link).

For each task, we release a policy trained on data collected from 32 unique environment-object pairs, with 50 demonstrations per environment. These policies generalize well to new environments and objects. You can download them from link and run real-world evaluation using:

bash eval.sh

The temporal_agg parameter in eval.sh enables the temporal ensemble strategy described in our paper, which produces smoother robot actions.
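
For context, temporal ensembling (in the spirit of ACT) averages the overlapping action-chunk predictions made at successive timesteps, so the executed action changes gradually. The numpy sketch below illustrates the general idea only; it is not this repository's implementation, and chunk_size, m, and the buffer layout are illustrative:

```python
# Minimal numpy sketch of temporal ensembling over action chunks (the idea
# behind temporal_agg); illustrative only, not this repository's implementation.
import numpy as np

chunk_size = 16    # actions predicted per policy call (illustrative)
action_dim = 10    # e.g. end-effector pose + gripper width (illustrative)
m = 0.1            # exponential weighting factor (illustrative)
max_steps = 200    # rollout length (illustrative)

# all_preds[s, t] stores the action that the chunk predicted at step s
# proposes for step t (only filled for s <= t < s + chunk_size).
all_preds = np.zeros((max_steps, max_steps + chunk_size, action_dim))

def ensembled_action(t, chunk):
    """Blend every chunk prediction that covers timestep t (chunk: (chunk_size, action_dim))."""
    all_preds[t, t:t + chunk_size] = chunk
    covering = all_preds[max(0, t - chunk_size + 1):t + 1, t]  # oldest prediction first
    weights = np.exp(-m * np.arange(len(covering)))            # earlier predictions weighted highest
    weights /= weights.sum()
    return (weights[:, None] * covering).sum(axis=0)
```

At execution time you would call ensembled_action(t, chunk) once per control step with the latest policy output and send the returned action to the robot.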

Additionally, you can use the -j parameter to reset the robot arm to a fixed initial position (make sure the initial joint configuration specified in example/eval_robots_config.yaml is safe for your robot!).

📊 Reproducing Data Scaling Laws

After downloading the processed dataset, you can train a policy by running:

cd train_scripts && bash <task_name>.sh

For multi-GPU training, configure your setup with accelerate config, then replace python with accelerate launch in the <task_name>.sh script. Additionally, you can speed up training without sacrificing policy performance by adding the --mixed_precision 'bf16' argument.

Note that for the pour_water and unplug_charger tasks, we use one additional step of observation history for policy training and inference.
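
Concretely, this means the policy for those two tasks conditions on the previous and the current observation rather than the current observation alone. A minimal, illustrative sketch of maintaining such an observation window at inference time (names and padding strategy are assumptions, not the repo's code):

```python
# Minimal, illustrative sketch of keeping one extra step of observation
# history for the policy input; not the repository's code.
from collections import deque

obs_horizon = 2                       # 2 = current observation + one step of history
obs_window = deque(maxlen=obs_horizon)

def get_policy_input(new_obs):
    obs_window.append(new_obs)
    while len(obs_window) < obs_horizon:      # pad at the start of a rollout
        obs_window.appendleft(obs_window[0])
    return list(obs_window)                   # [obs_{t-1}, obs_t]
```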

The current parameters in the <task_name>.sh scripts correspond to our released models, but you can customize training:

  • Use policy.obs_encoder.model_name to specify the vision encoder for the diffusion policy. Other options include vit_base_patch14_dinov2.lvd142m (DINOv2 ViT-Base) and vit_large_patch14_clip_224.openai (CLIP ViT-Large); these are timm model identifiers (see the loading sketch below the scaling-curve note).
  • To adjust the number of training environment-object pairs (up to a maximum of 32), modify task.dataset.dataset_idx. You can change the proportion of demonstrations used by adjusting task.dataset.use_ratio within the range (0, 1]. Training policies on data from different numbers of environment-object pairs, using 100% of the demonstrations, produces scaling curves similar to the following:

The curve (third column) shows that the policy’s ability to generalize to new environments and objects scales approximately as a power law with the number of training environment-object pairs.
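
As a rough illustration of what policy.obs_encoder.model_name selects, the sketch below loads one of the encoders listed above as a headless feature extractor with timm; the surrounding wiring is an assumption for illustration, not the repository's observation encoder:

```python
# Illustrative sketch: load a timm ViT as a headless feature extractor, the
# kind of model policy.obs_encoder.model_name refers to. Not the repo's code.
import timm
import torch

model_name = "vit_base_patch14_dinov2.lvd142m"  # or "vit_large_patch14_clip_224.openai"
encoder = timm.create_model(model_name, pretrained=True, num_classes=0)  # drop the classifier head
encoder.eval()

# The checkpoint's expected preprocessing (input size, normalization); apply
# `transform` to real camera frames (PIL images) before encoding.
data_cfg = timm.data.resolve_model_data_config(encoder)
transform = timm.data.create_transform(**data_cfg, is_training=False)

with torch.no_grad():
    dummy = torch.zeros(1, 3, *data_cfg["input_size"][1:])  # placeholder image batch
    features = encoder(dummy)
print(features.shape)  # (1, feature_dim), e.g. 768 for ViT-Base
```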

🙏 Acknowledgement

We thank the authors of UMI for sharing their codebase.