This repo is the official implementation of "α-MDF: An Attention-based Multimodal Differentiable Filter for Robot State Estimation" by Xiao Liu, Yifan Zhou, Shuhei Ikemoto, and Heni Ben Amor. The project website is here.
Differentiable Filters are recursive Bayesian estimators that derive the state transition and measurement models from data alone. Their data-driven nature eschews the need for explicit analytical models, while keeping the algorithmic components of the filtering process intact. As a result, the gain mechanism -- a critical component of the filtering process -- remains non-differentiable and cannot be adjusted to the specific nature of the task or context. In this paper, we propose an attention-based Multimodal Differentiable Filter ($\alpha$-MDF), which replaces the conventional Kalman gain with a learned attention-based update over multimodal latent representations.
- Attention Gain: Our approach uses an attention-based strategy that replaces the conventional Kalman gain in the measurement update step, as depicted by the colored blocks in the figure above. The gain mechanism is learned and updates the current state based on multimodal observations (a minimal sketch follows this list).
- Latent Space Filtering: Our proposed differentiable framework operates in a latent space, learning high-dimensional representations of the system dynamics and capturing intricate nonlinear relationships. This approach proves particularly advantageous for highly nonlinear systems.
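Below is a minimal sketch of the attention-gain idea in PyTorch. The module names, dimensions, and the use of scaled dot-product attention are illustrative assumptions for exposition, not the repository's exact architecture.

```python
import torch
import torch.nn as nn

class AttentionGain(nn.Module):
    """Toy attention-based measurement update (illustrative only).

    Replaces a fixed Kalman gain with attention weights computed between the
    latent prior state (query) and latent observations from several
    modalities (keys/values).
    """

    def __init__(self, latent_dim: int):
        super().__init__()
        self.query = nn.Linear(latent_dim, latent_dim)
        self.key = nn.Linear(latent_dim, latent_dim)
        self.value = nn.Linear(latent_dim, latent_dim)
        self.scale = latent_dim ** 0.5

    def forward(self, prior_state, latent_obs):
        # prior_state: (batch, latent_dim)                  -- predicted latent state
        # latent_obs:  (batch, num_modalities, latent_dim)  -- encoded observations
        q = self.query(prior_state).unsqueeze(1)                          # (B, 1, D)
        k = self.key(latent_obs)                                          # (B, M, D)
        v = self.value(latent_obs)                                        # (B, M, D)
        attn = torch.softmax(q @ k.transpose(1, 2) / self.scale, dim=-1)  # (B, 1, M)
        update = (attn @ v).squeeze(1)                                    # (B, D)
        return prior_state + update, attn.squeeze(1)  # posterior state, attention weights


# Example usage with random tensors
gain = AttentionGain(latent_dim=64)
state = torch.randn(8, 64)    # batch of prior latent states
obs = torch.randn(8, 3, 64)   # 3 modalities, e.g., [RGB, Depth, Joints]
posterior, weights = gain(state, obs)
```

Here the returned attention weights play the role of the Kalman gain: they determine how strongly each modality's latent observation corrects the prior state.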
Empirical evaluations:
$\alpha$-MDF achieves significant reductions in state estimation errors, demonstrating nearly 4-fold improvements compared to state-of-the-art sensor fusion strategies in multimodal manipulation tasks. Furthermore, $\alpha$-MDF accurately models the non-linear dynamics of soft robots, consistently surpassing differentiable filter baselines by up to 45%.
We provide an implementation using PyTorch. Clone the repo with `git clone https://github.com/ir-lab/alpha-MDF.git`; there are then two options for running the code.
Install PyTorch and then set up the environment using `pip install -r requirements.txt`. Make sure the corresponding libraries and dependencies are installed in your local environment, i.e., we use PyTorch 1.8.0 with CUDA 11.1.
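A quick sanity check that the expected PyTorch/CUDA combination (1.8.0 with CUDA 11.1, as noted above) is active in your environment:

```python
import torch

# We use PyTorch 1.8.0 with CUDA 11.1; adjust expectations if your setup differs.
print(torch.__version__)          # expect something like '1.8.0+cu111'
print(torch.version.cuda)         # expect '11.1'
print(torch.cuda.is_available())  # True if a visible GPU can be used
```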
For training or testing, go to `./latent_space` and then run
`python train.py --config ./config/xxx.yaml`
Edit the `conf.sh` file to set the environment variables used to start the docker containers.
IMAGE_TAG= # unique tag to be used for the docker image.
CONTAINER_NAME=UR5 # name of the docker container.
DATASET_PATH=/home/xiao/datasets/ # Dataset path on the host machine.
CUDA_VISIBLE_DEVICES=0 # comma-separated list of GPUs to set visible.
Build the docker image by running `./build.sh`.
Create or modify a YAML file in `./latent_space/config/xxx.yaml`, and set the mode parameter to select the training or testing routine.
mode:
  mode: 'train' # 'train' | 'test'
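For reference, a minimal sketch of how such a config might be read and dispatched on; this is illustrative and assumes the nested `mode` key shown above, not necessarily the repository's exact loader:

```python
import yaml

def load_mode(config_path: str) -> str:
    """Read the train/test mode from a config file such as ./config/xxx.yaml."""
    with open(config_path, "r") as f:
        cfg = yaml.safe_load(f)
    mode = cfg["mode"]["mode"]  # nested key as in the snippet above
    assert mode in ("train", "test"), f"unexpected mode: {mode}"
    return mode

if __name__ == "__main__":
    print(load_mode("./config/xxx.yaml"))
```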
Run the training and test script using the bash file `./run_filter.sh $CONFIG_FILE`, where `$CONFIG_FILE` is the path to the config file, e.g.
`./run_filter.sh ./config/xxx.yaml`
View the logs with `docker logs -f $CONTAINER_NAME`.
To open TensorBoard, copy the link from the logs of the TensorBoard container into a browser:
`docker logs -f $CONTAINER_NAME-tensorboard`
We conduct a series of experiments to evaluate the efficacy of the $\alpha$-MDF framework. The experiments address the following questions:
- Can the $\alpha$-MDF framework generalize across various tasks?
- To what extent does the new filtering mechanism improve state tracking performance when compared to the current state-of-the-art?
- How does the use of multiple modalities compare to a subset of modalities for state estimation with differentiable filters?
We use the sim2real UR5 dataset for this multimodal manipulation task.
Left: manipulation in a simulated environment with modalities [RGB, Depth, Joints]. The attention maps indicate the attention weights assigned to each modality during model inference; regions in blue correspond to low attention values, while those in red indicate high attention values. Right: real-time predicted joint angle trajectories.
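A minimal sketch of rendering attention weights like these as a blue-to-red map; the array shape, modality names, and colormap are assumptions rather than the repository's plotting code:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical attention weights: rows = time steps, columns = modalities.
attn = np.random.rand(50, 3)  # e.g., [RGB, Depth, Joints]

plt.imshow(attn.T, aspect="auto", cmap="coolwarm", vmin=0.0, vmax=1.0)
plt.yticks(range(3), ["RGB", "Depth", "Joints"])
plt.xlabel("time step")
plt.colorbar(label="attention weight (blue = low, red = high)")
plt.savefig("attention_map.png", dpi=150)
```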
This experiment involves implementing the $\alpha$-MDF framework on a tensegrity (soft) robot.
Left: Tensegrity robot with modalities [RGB, Depth, IMUs]. The attention maps are colored as in the figure above (blue = low attention, red = high attention). Right: real-time predicted XYZ trajectories.
KITTI odometry dataset: https://www.cvlibs.net/datasets/kitti/eval_odometry.php (see the pose-loading sketch after this list).
Sim2real UR5 dataset: https://www.dropbox.com/sh/qgd3hc9iu1tb1cd/AABDfyYLyGpso605-19kbOhCa?dl=0 (contact Yifan: yzhou298@asu.edu).
The tensegrity robot dataset is available upon request (contact Dr. Ikemoto: ikemoto@brain.kyutech.ac.jp).
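For the KITTI odometry data, here is a minimal sketch of reading the benchmark's ground-truth pose files (each line of poses/XX.txt holds 12 floats forming a row-major 3x4 transform); the file path below is a placeholder:

```python
import numpy as np

def load_kitti_poses(path: str) -> np.ndarray:
    """Load KITTI odometry ground-truth poses as (N, 3, 4) matrices.

    Each line of poses/XX.txt holds 12 floats: a row-major 3x4 transform
    from the camera frame at time t to the first camera frame.
    """
    poses = np.loadtxt(path)        # shape (N, 12)
    return poses.reshape(-1, 3, 4)

# Example (placeholder path):
# poses = load_kitti_poses("/home/xiao/datasets/kitti/poses/00.txt")
# xyz = poses[:, :, 3]  # translation component of each pose
```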
- Please cite the paper if you use any materials from this repo. Thanks!
@inproceedings{liu2023alpha,
title = {$\alpha$-MDF: An Attention-based Multimodal Differentiable Filter for Robot State Estimation},
author = {Liu, Xiao and Zhou, Yifan and Ikemoto, Shuhei and Amor, Heni Ben},
booktitle = {7th Annual Conference on Robot Learning},
year = {2023}
}