
HO-Cap Toolkit


The HO-Cap Toolkit is a Python package that provides evaluation and visualization tools for the HO-Cap dataset.

HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction

Jikai Wang, Qifan Zhang, Yu-Wei Chao, Bowen Wen, Xiaohu Guo, Yu Xiang

[ arXiv ] [ Project page ]

hocap-demo-video


News

  • 2024-06-24: The HO-Cap dataset is released! Please check the project page for more details.

BibTeX Citation

If HO-Cap helps your research, please consider citing the following:

@misc{wang2024hocap,
      title={HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction}, 
      author={Jikai Wang and Qifan Zhang and Yu-Wei Chao and Bowen Wen and Xiaohu Guo and Yu Xiang},
      year={2024},
      eprint={2406.06843},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

The HO-Cap Toolkit is released under the GNU General Public License v3.0.

Installation

This code is tested with Python 3.10 and CUDA 11.8 on Ubuntu 20.04. Make sure CUDA 11.8 is installed on your system before running the code.

  1. Clone the HO-Cap repository from GitHub.

    git clone --recursive git@github.com:IRVLUTD/HO-Cap.git
  2. Change the current directory to the cloned repository.

    cd HO-Cap
  3. Create conda environment

    conda create -n hocap-toolkit python=3.10
  4. Activate conda environment

    conda activate hocap-toolkit
  5. Install hocap-toolkit package and dependencies

    # Install dependencies
    python -m pip install --no-cache-dir -r requirements.txt
    
    # Build meshsdf_loss
    bash build.sh
    
    # Install hocap-toolkit
    python -m pip install -e .
  6. Download models for external libraries

    bash download_models.sh
    
  7. Download the MANO models and code (mano_v1_2.zip) from the MANO website and place the extracted .pkl files under the config/ManoModels directory. The directory should look like this:

    ./config/ManoModels
    ├── MANO_LEFT.pkl
    └── MANO_RIGHT.pkl
    
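Before moving on to the dataset, it can be worth confirming that PyTorch sees the CUDA 11.8 toolkit required above. The one-liner below is only a quick sanity check and assumes PyTorch was installed via requirements.txt:

    python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

If this prints False for CUDA availability, revisit the CUDA installation before proceeding.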

Download the HO-Cap Dataset

  1. Run the command below to download the whole dataset:

    python dataset_downloader.py --subject_id all
    
  2. Or you can download the dataset for a specific subject:

    python dataset_downloader.py --subject_id 1
    
  3. The downloaded .zip files will be extracted to the ./data directory, which should then look like this (a short sketch for listing the downloaded sequences follows this list):

    ./data
    ├── calibration
    ├── models
    ├── subject_1
    │   ├── 20231025_165502
    │   ├── ...
    ├── ...
    └── subject_9
       ├── 20231027_123403
       ├── ...
    
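With this layout in place, the downloaded sequences can be enumerated with nothing more than pathlib. The snippet below is a small sketch that only assumes the ./data structure shown above:

    from pathlib import Path

    data_root = Path("./data")

    # Each subject_* folder contains one folder per recorded sequence,
    # named by its capture timestamp (e.g. 20231025_165502).
    for subject_dir in sorted(data_root.glob("subject_*")):
        sequences = sorted(p.name for p in subject_dir.iterdir() if p.is_dir())
        print(f"{subject_dir.name}: {len(sequences)} sequences, e.g. {sequences[:2]}")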

Loading Dataset and Visualizing Samples

  1. The example below shows how to visualize the pose annotations of one frame:

    python examples/sequence_pose_viewer.py

    sequence_pose_viewer

  2. The example below shows how to visualize a sequence with the interactive 3D viewer:

    python examples/sequence_3d_viewer.py

    sequence_3d_viewer

  3. The example below shows how to render a sequence offline:

    python examples/sequence_renderer.py

    This renders the color image and segmentation map for every frame in the sequence. The rendered images are saved in the <sequence_folder>/renders/ directory (a short sketch for reading them back follows this list).

    sequence_renderer_color sequence_renderer_mask
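
The rendered frames can then be inspected with any image library. The snippet below is a rough sketch for reading them back; the exact file names written by sequence_renderer.py may differ, so treat the glob pattern as an assumption:

    import glob

    import cv2  # pip install opencv-python

    # Path to one sequence's renders folder; adjust to your local layout.
    renders_dir = "./data/subject_1/20231025_165502/renders"

    # The *.png pattern is an assumption -- check the renderer's actual output names.
    for image_path in sorted(glob.glob(f"{renders_dir}/*.png"))[:5]:
        image = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)
        print(image_path, None if image is None else image.shape)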

Evaluation

HO-Cap provides benchmark evaluations for three tasks:

  • Hand Pose Estimation (A2J-Transformer [1] and HaMeR [2])
  • Novel Object Pose Estimation (MegaPose [3] and FoundationPose [4])
  • Novel Object Detection (CNOS [5] and GroundingDINO [6])

Run the command below to download the example evaluation results:

python config/benchmarks/benchmark_downloader.py

If your results are saved in the same format, the evaluation scripts below can be used to evaluate them.

Hand Pose Estimation Evaluation

  • Evaluate the hand pose estimation performance:

    python examples/evaluate_hand_pose.py

    You should see the following output (a generic sketch of how PCK and MPJPE are computed follows below):

    PCK (0.05)  PCK (0.10)  PCK (0.15)  PCK (0.20)  MPJPE (mm)
     45.319048   81.247619   91.357143   95.080952   25.657379
    
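For reference, PCK and MPJPE are standard hand-keypoint metrics. The sketch below shows how they are typically computed from predicted and ground-truth 3D joints; it is a generic illustration, not the exact code in evaluate_hand_pose.py, and the threshold convention (absolute vs. normalized) is an assumption:

    import numpy as np

    def mpjpe_mm(pred, gt):
        """Mean per-joint position error in millimetres (inputs in metres, shape (N, 21, 3))."""
        return np.linalg.norm(pred - gt, axis=-1).mean() * 1000.0

    def pck(pred, gt, threshold):
        """Percentage of joints whose Euclidean error is below `threshold` (same unit as inputs)."""
        errors = np.linalg.norm(pred - gt, axis=-1)
        return 100.0 * (errors < threshold).mean()

    # Toy example: ground truth perturbed by ~1 cm noise.
    gt = np.random.rand(100, 21, 3)
    pred = gt + np.random.normal(scale=0.01, size=gt.shape)
    print(f"MPJPE: {mpjpe_mm(pred, gt):.2f} mm, PCK@2cm: {pck(pred, gt, 0.02):.2f} %")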

Novel Object Pose Estimation Evaluation

  • Evaluate the novel object pose estimation performance:

    python examples/evaluate_object_pose.py

    You should see the following output (a generic sketch of the ADD and ADD-S metrics follows the table):

    Object_ID  ADD-S_err (cm)  ADD_err (cm)  ADD-S_AUC (%)  ADD_AUC (%)
        G01_1        0.622285      0.931847      95.251779    93.088153
        G01_2        1.722639      2.864552      88.236088    82.951038
        G01_3        3.603058      5.267925      80.363333    74.809918
        G01_4        3.319628      5.182604      81.892213    73.259688
        G02_1        2.876358      4.932917      83.108740    71.551933
        G02_2        2.311827      4.164094      85.415819    73.653125
        G02_3        2.053942      4.038427      86.666730    73.781861
        G02_4        2.156008      4.216609      85.868099    72.308455
        G04_1        2.291773      4.423770      84.896350    70.877876
        G04_2        2.277173      4.526859      84.796541    69.969442
        G04_3        2.262719      4.480607      84.811976    70.126703
        G04_4        2.187466      4.335308      85.241063    71.009475
        G05_1        2.202152      4.406457      85.158656    70.094175
        G05_2        2.150769      4.311178      85.284022    70.394463
        G05_3        2.101135      4.209764      85.459741    70.812713
        G05_4        2.049368      4.321723      85.748722    69.201963
        G07_1        2.239657      4.499831      84.288352    68.425880
        G07_2        2.283744      4.585382      84.192769    68.369226
        G07_3        2.289358      4.521216      84.392293    69.088029
        G07_4        2.453944      4.659746      83.901788    69.095688
        G09_1        2.335954      4.383290      84.421006    70.399909
        G09_2        2.207153      4.117222      84.960095    71.813927
        G09_3        2.335119      4.363489      84.739485    70.545486
        G09_4        2.314741      4.390959      84.742636    69.967545
        G10_1        2.287382      4.345581      84.872734    70.169253
        G10_2        2.292289      4.354261      84.920001    70.067050
        G10_3        2.286696      4.332340      84.864717    70.138265
        G10_4        2.350560      4.466639      84.312511    69.109810
        G11_1        2.478856      4.630755      83.580471    68.318521
        G11_2        2.517070      4.716128      83.381718    67.764666
        G11_3        2.497892      4.752518      83.509188    67.267398
        G11_4        2.608370      4.907855      82.847013    66.485662
        G15_1        2.607319      4.912701      82.787732    66.344681
        G15_2        2.604308      4.916133      82.790136    66.274095
        G15_3        2.603031      4.916675      82.782173    66.238405
        G15_4        2.629115      4.932682      82.644975    66.187657
        G16_1        2.606751      4.876389      82.686423    66.579694
        G16_2        2.583274      4.851990      82.732962    66.555754
        G16_3        2.636666      4.903458      82.405020    66.285514
        G16_4        2.613952      4.858562      82.467323    66.479288
        G18_1        2.623657      4.922163      82.487034    66.229327
        G18_2        2.623725      4.909405      82.459508    66.320043
        G18_3        2.605120      4.869260      82.583889    66.595389
        G18_4        2.582878      4.822793      82.745806    66.909936
        G19_1        2.579643      4.815924      82.741131    66.929992
        G19_2        2.594446      4.834087      82.630870    66.835297
        G19_3        2.589485      4.847906      82.652686    66.650070
        G19_4        2.598538      4.853894      82.662542    66.699528
        G20_1        2.590124      4.950461      82.710792    65.838859
        G20_2        2.572236      4.932026      82.833246    65.916376
        G20_3        2.542719      4.877217      83.028086    66.275407
        G20_4        2.576188      4.990698      82.730561    65.337352
        G21_1        2.563550      4.973498      82.796708    65.404425
        G21_2        2.556220      4.961612      82.823936    65.445065
        G21_3        2.588855      4.998793      82.592185    65.252844
        G21_4        2.608319      5.020533      82.438422    65.145589
        G22_1        2.584527      4.989324      82.588827    65.342481
        G22_2        2.635756      5.087002      82.387761    64.790779
        G22_3        2.643167      5.106887      82.358116    64.694632
        G22_4        2.680397      5.162142      82.094643    64.501227
      Average        2.680397      5.162142      83.829502    68.882950
    
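For reference, ADD and ADD-S are the standard object pose error metrics reported above: ADD averages point-to-point distances between the model transformed by the predicted and ground-truth poses, while ADD-S uses the closest-point distance to handle symmetric objects. The sketch below is a generic illustration of both, not the exact code in evaluate_object_pose.py:

    import numpy as np
    from scipy.spatial import cKDTree

    def transform(points, R, t):
        """Apply a rigid transform (R: 3x3, t: (3,)) to (N, 3) model points."""
        return points @ R.T + t

    def add_error(points, R_pred, t_pred, R_gt, t_gt):
        """ADD: mean distance between corresponding transformed model points."""
        pred = transform(points, R_pred, t_pred)
        gt = transform(points, R_gt, t_gt)
        return np.linalg.norm(pred - gt, axis=1).mean()

    def adds_error(points, R_pred, t_pred, R_gt, t_gt):
        """ADD-S: mean distance from each predicted point to its nearest ground-truth point."""
        pred = transform(points, R_pred, t_pred)
        gt = transform(points, R_gt, t_gt)
        nearest_dist, _ = cKDTree(gt).query(pred, k=1)
        return nearest_dist.mean()

    # Toy example: identity ground-truth pose vs. a 1 cm translation error.
    pts = np.random.rand(500, 3)
    print(add_error(pts, np.eye(3), np.array([0.01, 0.0, 0.0]), np.eye(3), np.zeros(3)))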

Novel Object Detection Evaluation

  • Evaluate the novel object detection performance:

    python examples/evaluate_object_detection.py

    You should see the following output (a sketch of running the same COCO-style evaluation with pycocotools follows below):

     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.253
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.279
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.248
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.016
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.276
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.249
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.549
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.549
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.549
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.138
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.562
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.388
    AP: 0.253 | AP_50: 0.279 | AP_75: 0.248 | AP_s: 0.016 | AP_m: 0.276 | AP_l: 0.249
    
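The AP/AR table above follows the standard COCO detection protocol, so results stored in COCO format can be summarized the same way with pycocotools. The snippet below is a minimal sketch; the two JSON file names are placeholders rather than files shipped with the toolkit, and iouType would be "segm" for mask-based evaluation:

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    # Placeholder paths: COCO-format ground-truth annotations and detection results.
    coco_gt = COCO("gt_annotations.json")
    coco_dt = coco_gt.loadRes("detection_results.json")

    evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()  # prints the AP/AR table in the format shown above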

Footnotes

  1. A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image

  2. Reconstructing Hands in 3D with Transformers

  3. MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare

  4. FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

  5. CNOS: A Strong Baseline for CAD-based Novel Object Segmentation

  6. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection