In this work, we investigate whether HD maps can improve 3D object detection. To this end, we run several experiments that augment the CenterPoint 3D object detection model with HD maps from the nuScenes dataset.
Our main theme is to let extra information flow from the HD maps into the center head or the bounding-box head of CenterPoint. In other words, we leave the 3D backbone of CenterPoint unchanged and study how maps can improve the detection outputs (i.e., bounding box, orientation, etc.).
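As an illustration of the idea, here is a minimal sketch of one possible fusion scheme: rasterized map layers are concatenated channel-wise with the bird-eye-view features before they reach a detection head. This is an assumption made for illustration and not necessarily the design used in this repository; the function name `concat_map_channels`, the nested-list tensor layout, and the drivable-area mask are all hypothetical.

```python
def concat_map_channels(bev_features, map_channels):
    """Concatenate two (C, H, W) nested-list tensors along the channel axis.

    Hypothetical helper: assumes the map has already been rasterized onto
    the same H x W grid as the backbone's bird-eye-view output.
    """
    if not bev_features or not map_channels:
        raise ValueError("both inputs need at least one channel")
    height, width = len(bev_features[0]), len(bev_features[0][0])
    for channel in bev_features + map_channels:
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("all channels must share the same H x W grid")
    # A head consuming this tensor would need C_bev + C_map input channels.
    return bev_features + map_channels


# Toy input: 2 backbone channels plus 1 hypothetical drivable-area mask on a 2x3 grid.
bev = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]],
]
drivable_mask = [[[1, 1, 0], [0, 1, 0]]]

fused = concat_map_channels(bev, drivable_mask)
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```

In a real model the same operation would be a tensor concatenation along the channel dimension, with the first convolution of the head widened to accept the extra channels.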
Our code is forked from the CenterPoint repository and modified to incorporate our changes. The setup is the same as for CenterPoint, except that the data directory must additionally contain HD maps generated and stored in the format specified in our script.
For more details, refer to our project report.
Note: This repo has been moved (partially) from our internal private repository to serve as documentation. For more info, please reach out over email.
Below is the README for the original CenterPoint project. Please refer to the original CenterPoint repository for any details.
3D Object Detection and Tracking using center points in the bird-eye view.
Center-based 3D Object Detection and Tracking,
Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl,
arXiv technical report (arXiv 2006.11275)
@article{yin2020center,
title={Center-based 3D Object Detection and Tracking},
author={Yin, Tianwei and Zhou, Xingyi and Kr{\"a}henb{\"u}hl, Philipp},
journal={arXiv:2006.11275},
year={2020},
}
[2020-08-10] NEW: We now support vehicle detection on Waymo with SOTA performance. Please stay tuned for more updates in the fall.
Any questions or discussion are welcome!
Tianwei Yin yintianwei@utexas.edu Xingyi Zhou zhouxy@cs.utexas.edu
Three-dimensional objects are commonly represented as 3D boxes in a point cloud. This representation mimics the well-studied image-based 2D bounding-box detection, but comes with additional challenges. Objects in a 3D world do not follow any particular orientation, and box-based detectors have difficulties enumerating all orientations or fitting an axis-aligned bounding box to rotated objects. In this paper, we instead propose to represent, detect, and track 3D objects as points. We use a keypoint detector to find centers of objects, and simply regress to other attributes, including 3D size, 3D orientation, and velocity. In our center-based framework, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective. On the nuScenes dataset, our point-based representation performs 3-4 mAP higher than the box-based counterparts for 3D detection, and 6 AMOTA higher for 3D tracking. Our real-time model runs end-to-end 3D detection and tracking at 30 FPS with 54.2 AMOTA and 48.3 mAP, while the best single model achieves 60.3 mAP for 3D detection and 63.8 AMOTA for 3D tracking.
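To make "greedy closest-point matching" concrete, here is a minimal sketch of one common variant: all track/detection pairs within a distance gate are sorted by center distance and accepted in that order, skipping any pair whose track or detection is already taken. The function name `greedy_match`, the 2D centers, and the `max_dist` gate are assumptions for illustration; the repository's tracker may order candidates differently and may first shift detection centers using the predicted velocity.

```python
import math


def greedy_match(track_centers, det_centers, max_dist):
    """Greedily pair detections with tracks by ascending center distance.

    Returns a list of (det_index, track_index) pairs. Detections and
    tracks with no partner closer than max_dist are left unmatched.
    """
    candidates = []
    for d, (dx, dy) in enumerate(det_centers):
        for t, (tx, ty) in enumerate(track_centers):
            dist = math.hypot(dx - tx, dy - ty)
            if dist <= max_dist:
                candidates.append((dist, d, t))
    candidates.sort()  # closest pairs are considered first

    used_dets, used_tracks, matches = set(), set(), []
    for _, d, t in candidates:
        if d in used_dets or t in used_tracks:
            continue  # one of the two is already matched
        used_dets.add(d)
        used_tracks.add(t)
        matches.append((d, t))
    return matches


# Toy frame: two existing tracks, three new detections (the last one is far away).
tracks = [(0.0, 0.0), (10.0, 0.0)]
dets = [(9.5, 0.2), (0.3, -0.1), (30.0, 30.0)]
matches = greedy_match(tracks, dets, max_dist=2.0)
print(sorted(matches))  # [(0, 1), (1, 0)]
```

The unmatched third detection would typically start a new track, and tracks that stay unmatched for several frames would be dropped.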
- Simple: Two-sentence method summary: We use a standard 3D point cloud encoder with a few convolutional layers in the head to produce a bird-eye-view heatmap and other dense regression outputs, including the offset to centers in the previous frame. Detection is a simple local peak extraction, and tracking is a closest-distance matching.
- Fast: Our PointPillars model runs at 30 FPS with 48.3 mAP and 54.2 AMOTA for simultaneous 3D detection and tracking on the nuScenes dataset.
- Accurate: Our best single model achieves 60.3 mAP and 67.3 NDS on the nuScenes detection test set.
- Extensible: A simple baseline into which you can plug your own backbone and novel algorithms.
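The "local peak extraction" step mentioned in the summary can be sketched as follows. This plain-Python version keeps every heatmap cell that exceeds a score threshold and is at least as large as its 8 neighbours, which is the same test that a 3x3 max-pooling comparison performs in CenterNet-style detectors. The function name `local_peaks`, the threshold value, and the toy heatmap are illustrative assumptions; the repository's actual post-processing may differ.

```python
def local_peaks(heatmap, threshold):
    """Return (row, col, score) for cells that are local maxima above threshold.

    A cell is kept if its score is >= threshold and >= every neighbour in
    its 3x3 window. Results are sorted by descending score.
    """
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(height):
        for c in range(width):
            score = heatmap[r][c]
            if score < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    return sorted(peaks, key=lambda p: -p[2])


# Toy bird-eye-view heatmap with two object centers.
heat = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.2, 0.1],
]
peaks = local_peaks(heat, threshold=0.5)
print(peaks)  # [(1, 1, 0.9), (2, 3, 0.6)]
```

Each surviving peak would then be combined with the dense regression outputs at the same cell (size, orientation, velocity) to form a full 3D box.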
Model | Split | mAP | NDS | FPS |
---|---|---|---|---|
PointPillars-512 | Val | 48.3 | 59.1 | 30.3 |
VoxelNet-1024 | Val | 55.4 | 63.8 | 14.5 |
VoxelNet-1440_dcn_flip | Val | 59.1 | 67.1 | 2.2 |
VoxelNet-1440_dcn_flip | Test | 60.3 | 67.3 | 2.2 |
Model | Split | Tracking time | Total time | AMOTA ↑ | AMOTP ↓ |
---|---|---|---|---|---|
CenterPoint_pillar_512 | val | 1ms | 34ms | 54.2 | 0.680 |
CenterPoint_voxel_1024 | val | 1ms | 70ms | 62.6 | 0.630 |
CenterPoint_voxel_1440_dcn_flip | val | 1ms | 451ms | 65.9 | 0.567 |
CenterPoint_voxel_1440_dcn_flip | test | 1ms | 451ms | 63.8 | 0.555 |
All results are tested on a Titan Xp GPU with batch size 1. More models and details can be found in MODEL_ZOO.md.
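The two tables can be cross-checked with a short calculation. The README does not state how the FPS column was computed; the sketch below only shows that the reported FPS values are numerically consistent with the detection-only latency, i.e. the total time minus the tracking time. The function name and the dictionary keys are illustrative shorthand.

```python
def fps_from_latency_ms(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


# (total_ms, tracking_ms) copied from the val rows of the tracking table.
timings = {
    "pillar_512": (34, 1),
    "voxel_1024": (70, 1),
    "voxel_1440_dcn_flip": (451, 1),
}

detection_fps = {
    name: round(fps_from_latency_ms(total - tracking), 1)
    for name, (total, tracking) in timings.items()
}
print(detection_fps)
# {'pillar_512': 30.3, 'voxel_1024': 14.5, 'voxel_1440_dcn_flip': 2.2}
```

With the 1 ms tracking step included, the end-to-end rate of the PointPillars model is 1000 / 34, or roughly 29.4 FPS, which matches the "30 FPS" figure quoted for simultaneous detection and tracking.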
- AFDet: another work inspired by CenterNet that achieves good performance on the KITTI and Waymo datasets.
We provide a demo with PointPillars model for 3D object detection on the nuScenes dataset.
# basic python libraries
conda create --name centerpoint python=3.6
conda activate centerpoint
conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0 -c pytorch
git clone https://github.com/tianweiy/CenterPoint.git
cd CenterPoint
pip install -r requirements.txt
# add CenterPoint to PYTHONPATH by adding the following line to ~/.bashrc (change the path accordingly)
export PYTHONPATH="${PYTHONPATH}:PATH_TO_CENTERPOINT"
First download the model (by default, centerpoint_pillar_512) from the Model Zoo and put it in work_dirs/centerpoint_pillar_512_demo.
We provide a driving sequence clip from the nuScenes dataset. Download the folder and put it in the main directory.
Then run the demo with python tools/demo.py. If everything is set up correctly, you will see an output video in which red boxes are ground-truth objects and blue boxes are predictions.
For more advanced usage, please refer to INSTALL to set up more libraries needed for distributed training and sparse convolution.
Please refer to GETTING_START to prepare the data, then follow the instructions there to reproduce our detection and tracking results. All detection configurations are included in configs, and we provide the scripts for all tracking experiments in tracking_scripts. The pretrained models, logs, and each model's prediction files are provided in MODEL_ZOO.md.
CenterPoint is released under the MIT license (see LICENSE). It is developed based on a forked version of det3d. We also incorporate a large amount of code from CenterNet and CenterTrack. See the NOTICE for details. Note that the nuScenes dataset is free of charge for non-commercial activities. Please contact the nuScenes team for commercial usage.
This project would not be possible without multiple great open-source codebases. We list some notable examples below.
CenterPoint is deeply influenced by the following projects. Please consider citing the relevant papers.
@article{zhu2019classbalanced,
title={Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection},
author={Zhu, Benjin and Jiang, Zhengkai and Zhou, Xiangxin and Li, Zeming and Yu, Gang},
journal={arXiv:1908.09492},
year={2019}
}
@article{lang2019pillar,
title={PointPillars: Fast Encoders for Object Detection From Point Clouds},
journal={CVPR},
author={Lang, Alex H. and Vora, Sourabh and Caesar, Holger and Zhou, Lubing and Yang, Jiong and Beijbom, Oscar},
year={2019},
}
@article{zhou2018voxelnet,
title={VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection},
journal={CVPR},
author={Zhou, Yin and Tuzel, Oncel},
year={2018},
}
@article{yan2018second,
title={Second: Sparsely embedded convolutional detection},
author={Yan, Yan and Mao, Yuxing and Li, Bo},
journal={Sensors},
year={2018},
}
@article{zhou2019objects,
title={Objects as Points},
author={Zhou, Xingyi and Wang, Dequan and Kr{\"a}henb{\"u}hl, Philipp},
journal={arXiv:1904.07850},
year={2019}
}
@article{zhou2020tracking,
title={Tracking Objects as Points},
author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
journal={arXiv:2004.01177},
year={2020}
}