
Baseline Model: Task-aware Motion Fulfillment (TaMF)

Xinyu Zhan* · Lixin Yang* · Yifei Zhao · Kangrui Mao · Hanlin Xu
Zenan Lin · Kailin Li · Cewu Lu

CVPR 2024


This repo contains the training and evaluation code for TaMF models on the OakInk2 dataset. TaMF targets the generation of hand motion sequences that fulfill given object trajectories, conditioned on task descriptions.


⚠️ This work uses object_raw models, which are aligned and downsampled from the objects' raw scans.

Get Started

  1. Set up the dataset files.

    Download the tarballs from Hugging Face. You will need the preview-version annotation tarball for all sequences, the object_raw tarball, the object_repair tarball, and the program tarball. Organize these files as follows:

    data
    |-- data
    |   `-- scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS
    |-- anno_preview
    |   `-- scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS.pkl
    |-- object_raw
    |-- object_repair
    `-- program
    

    Refer to oakink2_toolkit for more details. A quick sanity check of this layout is sketched after this list.

  2. Set up the environment.

    1. Create a virtual environment with Python 3.10. This can be done with either conda or the built-in venv module.

      1. conda approach

        conda create -p ./.conda python=3.10
        conda activate ./.conda
      2. venv approach. First use pyenv or another tool to install a Python interpreter of version 3.10. Here 3.10.14 is used as an example:

        pyenv install 3.10.14
        pyenv shell 3.10.14

        Then create a virtual environment:

        python -m venv .venv --prompt oakink2_tamf
        . .venv/bin/activate
    2. Install the dependencies.

      Make sure all bundled dependencies (git submodules) are checked out:

      git submodule update --init --recursive --progress

      Use pip to install the packages:

      pip install -r requirements.dist.txt
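
    Before moving on, you can verify the dataset layout from step 1. Below is a minimal sketch of such a check; the script is not part of this repo, and it only assumes the data root ./data and the subdirectory names shown in the tree above.

    ```
    # layout_check.py -- hypothetical sanity check, not part of this repo; it only
    # assumes the data root "data/" and the subdirectory names from step 1 above.
    from pathlib import Path

    REQUIRED = ["data", "anno_preview", "object_raw", "object_repair", "program"]

    def check_layout(root: str = "data") -> None:
        root_path = Path(root)
        missing = [name for name in REQUIRED if not (root_path / name).is_dir()]
        if missing:
            raise FileNotFoundError(f"missing subdirectories under {root_path}: {missing}")
        # each sequence directory should have a matching annotation pickle
        for seq_dir in sorted((root_path / "data").iterdir()):
            anno = root_path / "anno_preview" / f"{seq_dir.name}.pkl"
            if not anno.is_file():
                print(f"warning: no annotation found for {seq_dir.name}")

    if __name__ == "__main__":
        check_layout()
        print("dataset layout looks good")
    ```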

Train

  1. Download the MANO model (version v1.2) and place the files at asset/mano_v1_2.

     The directory structure should be like:
     ```
     asset
     `-- mano_v1_2
         `-- models
             |-- MANO_LEFT.pkl
             `-- MANO_RIGHT.pkl
     ```
    
  2. Download object embeddings and sampled point clouds.

    Untar the tarballs into common. The directory structure should be like:

    common
    |-- common/retrieve_obj_embedding/main/embedding
    `-- common/retrieve_obj_pointcloud/main/pointcloud
    

    Download the GrabNet assets. Untar the tarballs into asset.

    asset
    `-- grabnet
    

    These assets are from https://github.com/otaheri/GrabNet and https://github.com/oakink/OakInk-Grasp-Generation.

  3. Save the cache dict for each split.

    Train:

    python -m script.save_cache_dict --data.process_range ?(file:asset/split/train.txt) --data.split_name train --commit

    Val:

    python -m script.save_cache_dict --data.process_range ?(file:asset/split/val.txt) --data.split_name val --commit

    Test:

    python -m script.save_cache_dict --data.process_range ?(file:asset/split/test.txt) --data.split_name test --commit

    All Dataset:

    python -m script.save_cache_dict --data.process_range ?(file:asset/split/all.txt) --data.split_name all --commit
  4. Train MF-MDM G.

    bash script/train.sh
  5. Sample from MF-MDM G to build the data cache that will be used in MF-MDM R training (press y to proceed). A small helper for locating the run directory and checkpoint is sketched after this list.

    ./script/sample.sh train common/train/main__<timestamp here>/save/model_0099.pt arch_mdm_l__0099
    ./script/sample.sh val common/train/main__<timestamp here>/save/model_0399.pt arch_mdm_l__0399
    ./script/sample.sh test common/train/main__<timestamp here>/save/model_0399.pt arch_mdm_l__0399
  6. Train MF-MDM R.

    bash script/train_refine.sh
  7. Sample from MF-MDM R.

    ./script/sample_refine.sh test common/train/refine__<timestamp>/save/model_0399.pt arch_mdm_l__0399
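
The <timestamp here> placeholders above refer to training run directories under common/train/, as used in the sample commands. A small, hypothetical helper (not part of this repo) for locating the newest run directory and its latest checkpoint might look like the sketch below; it only assumes the common/train/main__<timestamp>/save/model_XXXX.pt naming that appears in the commands above.

```
# find_ckpt.py -- hypothetical helper, not part of this repo; it assumes
# checkpoints live under common/train/<prefix><timestamp>/save/model_XXXX.pt
# as in the sample commands above.
from pathlib import Path

def latest_checkpoint(train_root: str = "common/train", prefix: str = "main__") -> Path:
    runs = sorted(p for p in Path(train_root).glob(f"{prefix}*") if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no {prefix}* run directories under {train_root}")
    ckpts = sorted((runs[-1] / "save").glob("model_*.pt"))
    if not ckpts:
        raise FileNotFoundError(f"no checkpoints in {runs[-1] / 'save'}")
    return ckpts[-1]

if __name__ == "__main__":
    # pass prefix="refine__" to pick up MF-MDM R runs instead
    print(latest_checkpoint())
```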

Evaluation

  1. (Optional) Download the pretrained model weights for MF-MDM G and MF-MDM R. The directory structure should be like:

    common
    |-- common/train/main__remastered/save
    `-- common/train/refine__remastered/save

  2. Download the feature-extraction model weights for FID computation in the paper. You can also use your own feature-extraction model if you prefer. The directory structure should be like:

    common
    `-- common/train/encoder__fid_1/save

  3. Sample from MF-MDM R. You can repeat this multiple times with different cache copies for evaluation.

    ./script/sample_refine.sh test common/train/refine__remastered/save/model_0399.pt arch_mdm_l__0399
  4. Evaluate.

    Contact Ratio, CR:

    python -m script.compute_score.compute_score_cr

    Solid Intersection Volume, SIV:

    python -m script.compute_score.compute_score_siv

    Power Spectrum Kullback-Leibler divergence of Joints, PSKL-J:

    python -m script.compute_score.compute_score_psklj

    FID (see the reference sketch after this list):

    python -m script.compute_score.compute_score_fid --cfg config/arch_encoder.yml --debug.encoder_checkpoint_filepath common/train/encoder__fid_1/save/model_0399.pt 

    Point --sample_refine_filepath to different saved sample trajectories to run the evaluation multiple times.

  5. (Optional) Visualize.

    python -m script.debug.debug_refine_sample --debug.model_weight_filepath xxx.pt
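
For orientation, FID here compares feature statistics of real and sampled motion computed with the feature-extraction model from step 2. A minimal sketch of the standard Fréchet distance between Gaussians fitted to two feature sets, FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^{1/2}), is given below; the repo's compute_score_fid script may differ in feature extraction and numerical details.

```
# fid_sketch.py -- reference sketch of the standard Frechet distance, not the
# repo's implementation; compute_score_fid may differ in details.
import numpy as np
from scipy import linalg

def frechet_distance(feat_real: np.ndarray, feat_gen: np.ndarray) -> float:
    """feat_*: (N, D) feature matrices produced by the feature-extraction model."""
    mu_r, mu_g = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_g = np.cov(feat_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    diff = mu_r - mu_g
    # discard small imaginary parts introduced by numerical error in sqrtm
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean.real))
```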

Citation

If you find the OakInk2 dataset or the OakInk2-TAMF repo useful for your research, please consider citing us:

@InProceedings{Zhan_2024_CVPR,
    author    = {Zhan, Xinyu and Yang, Lixin and Zhao, Yifei and Mao, Kangrui and Xu, Hanlin and Lin, Zenan and Li, Kailin and Lu, Cewu},
    title     = {{OAKINK2}: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {445-456}
}
Our TaMF model is based on the Motion Diffusion Model (MDM); please also cite:
@inproceedings{
    tevet2023human,
    title={Human Motion Diffusion Model},
    author={Guy Tevet and Sigal Raab and Brian Gordon and Yoni Shafir and Daniel Cohen-or and Amit Haim Bermano},
    booktitle={The Eleventh International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=SJ1kSyO2jwu}
}