
diffusion_policy_quadrotor

This repository provides a demonstration of imitation learning using a diffusion policy for quadrotor control. The implementation is adapted from the official Diffusion Policy repository, with the additional feature of a CLF-CBF-QP controller to improve the safety of the generated trajectory.

Result

The control task is to drive the quadrotor from the initial position (0, 0) to the goal position (5, 5) without colliding with the obstacles. The animation shows the denoising process of the diffusion policy predicting the future trajectory, followed by the quadrotor executing the actions.


Usage

The notebook demo.ipynb demonstrates a closed-loop simulation using the diffusion policy controller for quadrotor collision avoidance. You can also run it in Colab.

The training script is provided as train.ipynb.
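For reference, the closed loop in demo.ipynb follows a standard receding-horizon pattern: observe, sample a short action sequence from the policy, execute part of it, then replan. The sketch below illustrates only that structure; `policy` and `simulate_step` are placeholder stand-ins, not the notebook's actual API.

```python
# Hypothetical outline of a receding-horizon rollout; `policy` and
# `simulate_step` are stand-ins, not this repo's actual functions.
import numpy as np

def policy(obs_history, obstacle_grid):
    """Placeholder: the diffusion policy denoises N future actions here."""
    return np.tile(obs_history[-1], (8, 1))  # dummy: hold the last observation

def simulate_step(obs, action, dt=0.1):
    """Placeholder single-integrator dynamics following the commanded velocity."""
    pos, vel = obs[:2], action[2:]
    return np.concatenate([pos + dt * vel, vel])

obs = np.zeros(4)                     # (x, y, vx, vy), starting at (0, 0)
obs_history = [obs, obs]              # rolling buffer of recent observations
obstacle_grid = np.zeros((7, 7))      # obstacle encoding fed to the policy

for _ in range(25):                   # receding-horizon loop
    actions = policy(np.asarray(obs_history[-2:]), obstacle_grid)
    for action in actions[:4]:        # execute a few steps, then replan
        obs = simulate_step(obs, action)
        obs_history.append(obs)
```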

Dependencies

The program was developed and tested in the following environment.

  • Python 3.10
  • torch==2.2.1
  • jax==0.4.26
  • jaxlib==0.4.26
  • diffusers==0.27.2
  • torchvision==0.14.1
  • gdown (to download pre-trained weights)
  • joblib (used to load the training data)

Diffusion policy

The policy takes as input 1) the latest N steps of observations $o_t$ (position and velocity) and 2) an encoding of the obstacle information $O_{BST}$ (a flattened 7x7 grid with obstacle radii as values). The output is N steps of actions $a_t$ (future positions and velocities).
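A rough sketch of this interface is shown below. The stub network, horizon, and shapes are illustrative assumptions (the repo's actual model is a conditional U-Net); only the diffusers scheduler calls are real library API.

```python
# Illustrative sketch of the policy's input/output interface.
import torch
from diffusers import DDPMScheduler

N, obs_dim = 2, 4                    # horizon steps and (x, y, vx, vy) per step

class StubNoisePredNet(torch.nn.Module):
    """Stand-in for the policy's conditional noise-prediction network."""
    def __init__(self, act_dim, cond_dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(act_dim + cond_dim + 1, 128),
            torch.nn.Mish(),
            torch.nn.Linear(128, act_dim),
        )

    def forward(self, sample, t, cond):
        t_feat = torch.full((sample.shape[0], 1), float(t))
        out = self.net(torch.cat([sample.flatten(1), cond, t_feat], dim=-1))
        return out.view_as(sample)

obs_hist = torch.zeros(N, obs_dim)   # latest N observations o_t
grid = torch.zeros(7, 7)             # 7x7 obstacle grid with radii as values
cond = torch.cat([obs_hist.flatten(), grid.flatten()]).unsqueeze(0)

net = StubNoisePredNet(act_dim=N * obs_dim, cond_dim=cond.shape[-1])
scheduler = DDPMScheduler(num_train_timesteps=100)  # step count is an assumption
scheduler.set_timesteps(100)

action = torch.randn(1, N, obs_dim)  # start from Gaussian noise
for t in scheduler.timesteps:        # iterative denoising into actions a_t
    noise_pred = net(action, t, cond)
    action = scheduler.step(noise_pred, t, action).prev_sample
```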


*The quadrotor icon is from Flaticon.

Deviations from the original implementation

  • A linear layer is added before the Mish activation in the condition encoder of ConditionalResidualBlock1D. This prevents the activation from truncating large negative values in the normalized observation (see the first sketch after this list).
  • A CLF-CBF-QP controller is implemented and can be used to modify the noisy actions during the denoising process of the policy; it is disabled by default (see the second sketch after this list).
  • A model finetuned for single-step inference is used by default.
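To illustrate the first change, the sketch below contrasts an original-style condition encoder with the modified one. Layer sizes are illustrative assumptions; the actual change lives inside ConditionalResidualBlock1D.

```python
# Sketch of the condition-encoder change; sizes are illustrative.
import torch.nn as nn

cond_dim, channels = 57, 256

# Original-style encoder applies Mish directly to the conditioning input.
# Mish(x) = x * tanh(softplus(x)) maps large negative inputs toward zero,
# so negative values in the normalized observation get truncated.
original = nn.Sequential(nn.Mish(), nn.Linear(cond_dim, channels))

# Modified encoder: a Linear layer first lets the network rescale the
# normalized observation before the activation is applied.
modified = nn.Sequential(
    nn.Linear(cond_dim, cond_dim),
    nn.Mish(),
    nn.Linear(cond_dim, channels),
)
```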

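The second change is harder to condense, but a much-simplified stand-in conveys the idea: with a single CBF constraint on the velocity command, the QP admits a closed-form projection. The function below is hypothetical and ignores the CLF part; the repo's actual controller and its interface may differ.

```python
# Simplified stand-in for the CLF-CBF-QP safety filter (CBF part only).
import numpy as np

def cbf_filter(p, v_des, p_obs, r, alpha=1.0):
    """Minimally modify v_des so that h(p) = ||p - p_obs||^2 - r^2 stays >= 0.

    Solves  min ||v - v_des||^2  s.t.  2 (p - p_obs) . v >= -alpha * h(p),
    whose single-constraint solution is a closed-form projection.
    """
    h = np.dot(p - p_obs, p - p_obs) - r**2
    a = 2.0 * (p - p_obs)            # gradient of h with respect to p
    b = -alpha * h
    if a @ v_des >= b:               # nominal command already satisfies the CBF
        return v_des
    return v_des + (b - a @ v_des) / (a @ a) * a  # project onto the constraint

# During denoising, such a filter could be applied to the velocity part of each
# intermediate action sample before the next scheduler step:
v_safe = cbf_filter(p=np.array([1.0, 1.0]), v_des=np.array([1.0, 0.0]),
                    p_obs=np.array([2.0, 1.0]), r=0.5)
```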


Learning note

Failure case: the diffusion policy controller fails to extrapolate beyond the training data

Figure: A failure case of the controller.

  • The left figure shows a trajectory from the training data.
  • The middle figure shows the closed-loop simulation result when the controller starts from the SAME initial position as in the training data.
  • The right figure shows the closed-loop simulation result when the controller starts from a DIFFERENT initial position, which results in a trajectory that collides with an obstacle.


Refer to learning_note.md for other notes.
