Xinyu Zhan* · Lixin Yang* · Yifei Zhao · Kangrui Mao · Hanlin Xu · Zenan Lin · Kailin Li · Cewu Lu†
This repo contains the training and evaluation code for TaMF models on the OakInk2 dataset. TaMF targets the generation of hand motion sequences that can fulfill given object trajectories, conditioned on task descriptions.
- Set up the dataset files.

  Download the tarballs from huggingface. You will need the preview-version annotation tarball for all sequences, the `object_raw` tarball, the `object_repair` tarball and the `program` tarball (the `object_raw` models are aligned and downsampled from the objects' raw scans). Organize these files as follows:

  ```
  data
  |-- data
  |   `-- scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS
  |-- anno_preview
  |   `-- scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS.pkl
  |-- object_raw
  |-- object_repair
  `-- program
  ```

  Refer to oakink2_toolkit for more details.
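  For reference, a minimal extraction sketch for this layout (the tarball file names below are hypothetical; substitute the names of the files you actually downloaded):

  ```
  # extract each downloaded tarball under the dataset root (here: ./data)
  mkdir -p data
  for tarball in anno_preview.tar object_raw.tar object_repair.tar program.tar; do
      tar -xf "${tarball}" -C data/
  done
  ```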
- Set up the environment.

  - Create a virtual environment with Python 3.10. This can be done with either `conda` or the Python package `venv`.

    - `conda` approach:

      ```
      conda create -p ./.conda python=3.10
      conda activate ./.conda
      ```

    - `venv` approach:

      First use `pyenv` or another tool to install a Python interpreter of version 3.10. Here 3.10.14 is used as an example:

      ```
      pyenv install 3.10.14
      pyenv shell 3.10.14
      ```

      Then create a virtual environment:

      ```
      python -m venv .venv --prompt oakink2_tamf
      . .venv/bin/activate
      ```
  - Install the dependencies.

    Make sure all bundled dependencies are there:

    ```
    git submodule update --init --recursive --progress
    ```

    Use `pip` to install the packages:

    ```
    pip install -r requirements.dist.txt
    ```
- Download the MANO model (version v1.2) and place the files at `asset/mano_v1_2`.

  The directory structure should be like:

  ```
  asset
  `-- mano_v1_2
      `-- models
          |-- MANO_LEFT.pkl
          `-- MANO_RIGHT.pkl
  ```
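  A quick sanity check that the files are in place (a sketch using the paths from the tree above):

  ```
  # both MANO pickles must exist before training
  for f in asset/mano_v1_2/models/MANO_LEFT.pkl asset/mano_v1_2/models/MANO_RIGHT.pkl; do
      [ -f "${f}" ] && echo "ok: ${f}" || echo "missing: ${f}"
  done
  ```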
- Download the object embeddings and sampled point clouds.

  Untar the tarballs into `common`. The directory structure should be like:

  ```
  common
  |-- common/retrieve_obj_embedding/main/embedding
  `-- common/retrieve_obj_pointcloud/main/pointcloud
  ```
- Download the grabnet assets. Untar the tarballs into `asset`:

  ```
  asset
  `-- grabnet
  ```

  These assets are from https://github.com/otaheri/GrabNet and https://github.com/oakink/OakInk-Grasp-Generation.
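  For reference, a minimal extraction sketch covering this step and the previous one (the tarball names are placeholders for the files you downloaded):

  ```
  # untar object embeddings / point clouds into common/, grabnet assets into asset/
  # if a tarball already contains the common/ or asset/ prefix, extract at the
  # repo root instead: tar -xf <tarball> -C .
  mkdir -p common asset
  tar -xf retrieve_obj_embedding.tar -C common/
  tar -xf retrieve_obj_pointcloud.tar -C common/
  tar -xf grabnet.tar -C asset/
  ```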
- Save the cache dict for each split.

  Train:

  ```
  python -m script.save_cache_dict --data.process_range '?(file:asset/split/train.txt)' --data.split_name train --commit
  ```

  Val:

  ```
  python -m script.save_cache_dict --data.process_range '?(file:asset/split/val.txt)' --data.split_name val --commit
  ```

  Test:

  ```
  python -m script.save_cache_dict --data.process_range '?(file:asset/split/test.txt)' --data.split_name test --commit
  ```

  All Dataset:

  ```
  python -m script.save_cache_dict --data.process_range '?(file:asset/split/all.txt)' --data.split_name all --commit
  ```
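  The four invocations differ only in the split name, so they can also be driven from a small loop (a sketch; same flags as above):

  ```
  # quoting keeps the shell from expanding the ?(...) pattern itself
  for split in train val test all; do
      python -m script.save_cache_dict \
          --data.process_range "?(file:asset/split/${split}.txt)" \
          --data.split_name "${split}" --commit
  done
  ```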
- Train `MF-MDM G`.

  ```
  bash script/train.sh
  ```
- Sample from `MF-MDM G` to generate the data cache used in `MF-MDM R` training (press y to proceed):

  ```
  ./script/sample.sh train common/train/main__<timestamp here>/save/model_0099.pt arch_mdm_l__0099
  ./script/sample.sh val common/train/main__<timestamp here>/save/model_0399.pt arch_mdm_l__0399
  ./script/sample.sh test common/train/main__<timestamp here>/save/model_0399.pt arch_mdm_l__0399
  ```
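  `<timestamp here>` is the run directory created by the training script. If only one run exists, it can be resolved with a glob (a sketch, assuming a single `main__*` run directory):

  ```
  # pick the (single) training run directory
  ckpt_dir=$(ls -d common/train/main__*/ | head -n 1)
  ./script/sample.sh train "${ckpt_dir}save/model_0099.pt" arch_mdm_l__0099
  ./script/sample.sh val "${ckpt_dir}save/model_0399.pt" arch_mdm_l__0399
  ./script/sample.sh test "${ckpt_dir}save/model_0399.pt" arch_mdm_l__0399
  ```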
- Train `MF-MDM R`.

  ```
  bash script/train_refine.sh
  ```
- Sample from `MF-MDM R`.

  ```
  ./script/sample_refine.sh test common/train/refine__<timestamp>/save/model_0399.pt arch_mdm_l__0399
  ```
- (Optional) Download the pretrained model weights for `MF-MDM G` and `MF-MDM R`. The directory structure should be like:

  ```
  common
  |-- common/train/main__remastered/save
  `-- common/train/refine__remastered/save
  ```
- Download the feature-extraction model weights for FID computation in the paper. You could also use your own feature-extraction model if you would like to. The directory structure should be like:

  ```
  common
  `-- common/train/encoder__fid_1/save
  ```
- Sample from `MF-MDM R`. You can do it multiple times from different cache copies for evaluation:

  ```
  ./script/sample_refine.sh test common/train/refine__remastered/save/model_0399.pt arch_mdm_l__0399
  ```
- Evaluate.

  Contact Ratio (CR):

  ```
  python -m script.compute_score.compute_score_cr
  ```

  Solid Intersection Volume (SIV):

  ```
  python -m script.compute_score.compute_score_siv
  ```

  Power Spectrum Kullback-Leibler divergence of Joints (PSKL-J):

  ```
  python -m script.compute_score.compute_score_psklj
  ```

  FID:

  ```
  python -m script.compute_score.compute_score_fid --cfg config/arch_encoder.yml --debug.encoder_checkpoint_filepath common/train/encoder__fid_1/save/model_0399.pt
  ```

  Point `--sample_refine_filepath` to different saved sampled trajectories to run the evaluation multiple times.
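  For example, to score several independently sampled trajectory sets (a sketch; the sample file paths below are hypothetical and should point at your own saved samples):

  ```
  # evaluate each saved sample copy in turn; paths are placeholders
  for sample in common/sample/refine__run0.pt common/sample/refine__run1.pt; do
      python -m script.compute_score.compute_score_cr --sample_refine_filepath "${sample}"
  done
  ```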
- (Optional) Visualize.

  ```
  python -m script.debug.debug_refine_sample --debug.model_weight_filepath xxx.pt
  ```
If you find the OakInk2 dataset or the OakInk2-TAMF repo useful for your research, please consider citing us:

```
@InProceedings{Zhan_2024_CVPR,
    author    = {Zhan, Xinyu and Yang, Lixin and Zhao, Yifei and Mao, Kangrui and Xu, Hanlin and Lin, Zenan and Li, Kailin and Lu, Cewu},
    title     = {{OAKINK2}: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {445-456}
}
```
Our TaMF model is based on the Motion Diffusion Model (MDM); please also cite:

```
@inproceedings{tevet2023human,
    title     = {Human Motion Diffusion Model},
    author    = {Guy Tevet and Sigal Raab and Brian Gordon and Yoni Shafir and Daniel Cohen-or and Amit Haim Bermano},
    booktitle = {The Eleventh International Conference on Learning Representations},
    year      = {2023},
    url       = {https://openreview.net/forum?id=SJ1kSyO2jwu}
}
```