Paper link: https://arxiv.org/abs/2308.09098
Project page: https://ttaoretw.github.io/imgeonet/
| Dataset | mAP@0.25 | mAP@0.5 | Log |
|---|---|---|---|
| ScanNet | 54.57 | 28.94 | link |
| ScanNet200 | 22.38 | 9.67 | link |
| ARKitScenes | 59.82 | 42.76 | link |
Performance may vary slightly depending on the number of GPUs.
# Create conda virtual environment
conda create -n imgeonet python=3.8
conda activate imgeonet
# Clone repo
git clone https://github.com/ttaoREtw/ImGeoNet.git
cd ImGeoNet
# Setup virtual environment
bash script/0_install_env.sh
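If the script finishes without errors, a quick import check can confirm the environment is usable (a minimal sanity check; the exact package versions depend on what `script/0_install_env.sh` installs):

```bash
# Sanity check: confirm the core dependencies are importable.
# Exact versions depend on what script/0_install_env.sh installs.
conda activate imgeonet
python -c "import torch; print('torch', torch.__version__, 'cuda available:', torch.cuda.is_available())"
python -c "import mmdet3d; print('mmdet3d', mmdet3d.__version__)"
```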
Download the ScanNet data and link its `scans` folder under `data/scannet`.
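For example, if the raw ScanNet release was downloaded to `/path/to/scannet` (a placeholder path, not part of this repo), the link could be created like this:

```bash
# Link the raw ScanNet scans into the location the scripts expect.
# /path/to/scannet is a placeholder for wherever ScanNet was downloaded.
mkdir -p data/scannet
ln -s /path/to/scannet/scans data/scannet/scans
```

After the link is in place, run the preprocessing scripts: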
# Warning: this step requires a lot of disk space
# Extract frame data: rgb, depth, intrinsic, pose, axis matrix
bash script/1a_extract_scannet_posed_data.sh
# Process ScanNet as in VoteNet
bash script/1b_preproc_scannet_data.sh
# Convert to mmdet3d's format
bash script/1c_convert_scannet_data.sh
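After the conversion step, the mmdet3d-style annotation files should exist; a quick listing can confirm it succeeded (the `data/scannet` location and `.pkl` naming follow mmdetection3d's usual layout and are assumptions here, so adjust the path if the scripts place output elsewhere):

```bash
# Confirm the converted annotation files exist.
# data/scannet and the .pkl naming follow mmdetection3d's usual layout;
# the exact paths produced by the scripts above may differ.
ls -lh data/scannet/*.pkl
```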
# Download ARKitScenes data - 3D detection part
bash script/2a_download_arkit.sh
# Extract frame data: rgb, depth, intrinsic, pose
bash script/2b_preproc_arkit_data.sh
# Convert to mmdet3d's format
bash script/2c_convert_arkit_data.sh
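ARKitScenes is sizeable, so it can be worth checking disk usage and confirming that converted annotation files were produced (the `data` paths are assumptions; adjust to wherever the scripts place their output):

```bash
# ARKitScenes is large: check how much space the data occupies
# and confirm the converted annotations exist.
# Paths are assumptions and may differ from the scripts' actual output layout.
du -sh data/* 2>/dev/null
find data -name "*.pkl"
```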
# Train on ScanNet, ScanNet200, and ARKitScenes
bash script/3_train_imgeonet.sh
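The training script presumably wraps mmdetection3d's standard entry points; if you want to launch a single run directly with a chosen GPU count, mmdetection3d's usual distributed helper looks like this (a sketch, not necessarily how `3_train_imgeonet.sh` launches training; `$config` is a placeholder for an ImGeoNet config under `configs/`):

```bash
cd mmdetection3d
# Launch training with mmdetection3d's standard distributed helper.
# $config is a placeholder for an ImGeoNet config under configs/;
# 4 is an example GPU count.
bash tools/dist_train.sh $config 4
```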
cd mmdetection3d
# config in `configs/`
# checkpoint in `work_dir/.../latest.pth`
python tools/test.py $config $checkpoint --eval mAP
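To also keep the raw detections for later analysis, mmdetection3d's standard `tools/test.py` accepts an `--out` flag (flag behavior follows upstream mmdetection3d; check the version pinned by this repo):

```bash
# Evaluate and additionally dump raw detection results to a pickle file.
# --out follows mmdetection3d's standard tools/test.py interface.
python tools/test.py $config $checkpoint --eval mAP --out work_dir/results.pkl
```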
@inproceedings{tu2023imgeonet,
title = {ImGeoNet: Image-induced Geometry-aware Voxel Representation for Multi-view 3D Object Detection},
author = {Tu, Tao and Chuang, Shun-Po and Liu, Yu-Lun and Sun, Cheng and Zhang, Ke and Roy, Donna and Kuo, Cheng-Hao and Sun, Min},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year = {2023},
}
This project is built upon various open-source projects. If your work uses components derived from them, please consider citing those projects as well.