
Baseline Model: Task-aware Motion Fulfillment (TaMF)

Xinyu Zhan* · Lixin Yang* · Yifei Zhao · Kangrui Mao · Hanlin Xu
Zenan Lin · Kailin Li · Cewu Lu

CVPR 2024


This repo contains the training and evaluation code for TaMF models on the OakInk2 dataset. TaMF targets the generation of hand motion sequences that fulfill given object trajectories, conditioned on task descriptions.


⚠️ This work uses object_raw models, which are aligned and downsampled from the objects' raw scans.

Get Started

  1. Setup dataset files.

    Download tarballs from Hugging Face. You will need the preview-version annotation tarball for all sequences, the object_raw tarball, the object_repair tarball, and the program tarball. Organize these files as follows:

    data
    |-- data
    |   `-- scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS
    |-- anno_preview
    |   `-- scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS.pkl
    |-- object_raw
    |-- object_repair
    `-- program
    

    Refer to oakink2_toolkit for more details.
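
    A minimal untar sketch, assuming each tarball unpacks into its own top-level directory; the archive names below are placeholders, so substitute the actual filenames from the Hugging Face release:

    ```
    # placeholder archive names -- substitute the actual tarball filenames you downloaded
    mkdir -p data
    tar -xf anno_preview.tar   -C data   # per-sequence annotation .pkl files
    tar -xf object_raw.tar     -C data   # aligned + downsampled raw object scans
    tar -xf object_repair.tar  -C data   # repaired object models
    tar -xf program.tar        -C data   # task program files
    ```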

  2. Set up the environment.

    1. Create a virtual environment with Python 3.10. This can be done with either conda or the built-in venv module.

      1. conda approach

        conda create -p ./.conda python=3.10
        conda activate ./.conda
      2. venv approach. First use pyenv or another tool to install a Python interpreter of version 3.10. Here 3.10.14 is used as an example:

        pyenv install 3.10.14
        pyenv shell 3.10.14

        Then create a virtual environment:

        python -m venv .venv --prompt oakink2_tamf
        . .venv/bin/activate
    2. Install the dependencies.

      Make sure all bundled dependencies (git submodules) are present:

      git submodule update --init --recursive --progress

      Use pip to install the packages:

      pip install -r requirements.dist.txt
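
      To sanity-check the installation afterwards (the torch import is an assumption about the pinned dependencies; skip that line if PyTorch is not among them):

      ```
      python --version                                    # should report Python 3.10.x
      python -c "import torch; print(torch.__version__)"  # assumption: torch is in requirements.dist.txt
      ```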

Train

  1. Download the MANO model (version v1.2) and place the files at asset/mano_v1_2.

     The directory structure should be like:
     ```
     asset
     `-- mano_v1_2
         `-- models
             |-- MANO_LEFT.pkl
             `-- MANO_RIGHT.pkl
     ```
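
     One way to place the files, assuming the archive downloaded from the MANO website is named mano_v1_2.zip and unpacks to a mano_v1_2 directory:

     ```
     mkdir -p asset
     unzip mano_v1_2.zip -d asset/       # assumption: archive name and internal layout
     ls asset/mano_v1_2/models           # expect MANO_LEFT.pkl and MANO_RIGHT.pkl
     ```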
    
  2. Download object embeddings and sampled point clouds.

    Untar the tarballs into common. The directory structure should be like:

    common
    |-- common/retrieve_obj_embedding/main/embedding
    `-- common/retrieve_obj_pointcloud/main/pointcloud
    

    Download grabnet assets. Untar the tarballs into asset.

    asset
    `-- grabnet
    

    These assets are from https://github.com/otaheri/GrabNet and https://github.com/oakink/OakInk-Grasp-Generation.
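
    For example (archive names are placeholders; each archive is assumed to unpack to the paths shown above):

    ```
    # placeholder archive names -- substitute the actual tarball filenames you downloaded
    mkdir -p common asset
    tar -xf obj_embedding.tar   -C common   # -> common/retrieve_obj_embedding/main/embedding
    tar -xf obj_pointcloud.tar  -C common   # -> common/retrieve_obj_pointcloud/main/pointcloud
    tar -xf grabnet_asset.tar   -C asset    # -> asset/grabnet
    ```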

  3. Save cache dict for each split.

    Train:

    python -m script.save_cache_dict --data.process_range ?(file:asset/split/train.txt) --data.split_name train --commit

    Val:

    python -m script.save_cache_dict --data.process_range ?(file:asset/split/val.txt) --data.split_name val --commit

    Test:

    python -m script.save_cache_dict --data.process_range ?(file:asset/split/test.txt) --data.split_name test --commit

    All Dataset:

    python -m script.save_cache_dict --data.process_range ?(file:asset/split/all.txt) --data.split_name all --commit
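
    The four commands above can also be run as a single loop; the ?(...) argument is quoted here so the shell passes it through literally:

    ```
    for split in train val test all; do
        python -m script.save_cache_dict \
            --data.process_range "?(file:asset/split/${split}.txt)" \
            --data.split_name "${split}" \
            --commit
    done
    ```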
  4. Train MF-MDM G.

    bash script/train.sh
  5. Sample from MF-MDM G to build the data cache used for MF-MDM R training (press y to proceed).

    ./script/sample.sh train common/train/main__<timestamp here>/save/model_0099.pt arch_mdm_l__0099
    ./script/sample.sh val common/train/main__<timestamp here>/save/model_0399.pt arch_mdm_l__0399
    ./script/sample.sh test common/train/main__<timestamp here>/save/model_0399.pt arch_mdm_l__0399
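
    For example, with a hypothetical run directory (substitute the actual timestamped directory created by script/train.sh under common/train):

    ```
    RUN_DIR=common/train/main__2024-06-01-12-00-00   # hypothetical timestamp -- use your own run directory
    ./script/sample.sh train "${RUN_DIR}/save/model_0099.pt" arch_mdm_l__0099
    ./script/sample.sh val   "${RUN_DIR}/save/model_0399.pt" arch_mdm_l__0399
    ./script/sample.sh test  "${RUN_DIR}/save/model_0399.pt" arch_mdm_l__0399
    ```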
  6. Train MF-MDM R.

    bash script/train_refine.sh
  7. Sample from MF-MDM R.

    ./script/sample_refine.sh test common/train/refine__<timestamp>/save/model_0399.pt arch_mdm_l__0399

Evaluation

  1. (Optional) Download the pretrained model weights for MF-MDM G and MF-MDM R. The directory structure should be like:

     common
     |-- common/train/main__remastered/save
     `-- common/train/refine__remastered/save

  2. Download the feature-extraction model weights for FID computation in the paper. You could also use your own feature-extraction model if you would like to. The directory structure should be like:

     common
     `-- common/train/encoder__fid_1/save

  3. Sample from MF-MDM R. You can repeat this multiple times from different cache copies for evaluation.

    ./script/sample_refine.sh test common/train/refine__remastered/save/model_0399.pt arch_mdm_l__0399
  4. Evaluate.

    Contact Ratio, CR:

    python -m script.compute_score.compute_score_cr

    Solid Intersection Volume, SIV:

    python -m script.compute_score.compute_score_siv

    Power Spectrum Kullback-Leibler divergence of Joints, PSKL-J:

    python -m script.compute_score.compute_score_psklj

    FID:

    python -m script.compute_score.compute_score_fid --cfg config/arch_encoder.yml --debug.encoder_checkpoint_filepath common/train/encoder__fid_1/save/model_0399.pt 

    Point --sample_refine_filepath at different saved sample caches to run the evaluation multiple times, as sketched below.
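
    A sketch of evaluating several sample caches in one pass, assuming each compute_score script accepts --sample_refine_filepath as the note above suggests; the cache paths are hypothetical:

    ```
    # hypothetical cache locations -- point these at your own sample_refine outputs
    for SAMPLE in path/to/sample_cache_1 path/to/sample_cache_2; do
        python -m script.compute_score.compute_score_cr    --sample_refine_filepath "${SAMPLE}"
        python -m script.compute_score.compute_score_siv   --sample_refine_filepath "${SAMPLE}"
        python -m script.compute_score.compute_score_psklj --sample_refine_filepath "${SAMPLE}"
    done
    ```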

  5. (Optional) Visualize.

    python -m script.debug.debug_refine_sample --debug.model_weight_filepath xxx.pt

Citation

If you find the OakInk2 dataset or the OakInk2-TaMF repo useful for your research, please consider citing:

@InProceedings{Zhan_2024_CVPR,
    author    = {Zhan, Xinyu and Yang, Lixin and Zhao, Yifei and Mao, Kangrui and Xu, Hanlin and Lin, Zenan and Li, Kailin and Lu, Cewu},
    title     = {{OAKINK2}: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {445-456}
}
Our TaMF model is based on the Motion Diffusion Model (MDM); please also cite:

@inproceedings{
    tevet2023human,
    title={Human Motion Diffusion Model},
    author={Guy Tevet and Sigal Raab and Brian Gordon and Yoni Shafir and Daniel Cohen-or and Amit Haim Bermano},
    booktitle={The Eleventh International Conference on Learning Representations },
    year={2023},
    url={https://openreview.net/forum?id=SJ1kSyO2jwu}
}
