This repository contains a partial implementation of the CVPR 2018 paper 'Multi-Level Fusion based 3D Object Detection from Monocular Images' by Xu and Chen, built on the excellent open-source codebase maskrcnn-benchmark. Note that the depth estimation module (MonoDepth) used in the original paper is replaced here with a more accurate one (PSMNet), which generates disparity maps from stereo images.
- `Box3dList` is a structure that holds a set of 3D bounding boxes for one image, represented as an Nx7 'ry-lwhxyz' tensor. It also contains utility methods for performing geometric transformations on the bounding boxes (such as scaling and flipping); a minimal sketch follows this list.
- `box3d_head` is a building block analogous to `roi_heads/box_head` in 2D, implemented to regress the size, dimension, and rotation of 3D bounding boxes.
- `orientation_coder` implements MultiBin as an independent module, split into encode and decode like `modeling/box_coder` (see the sketch after this list).
- `kittiDataset` is a subclass of `torch.utils.data.Dataset` that supports loading and evaluating the KITTI dataset. It also provides utility functions to convert KITTI annotations to JSON format, in case you want to train a 2D object detector on KITTI with the powerful annotation-processing tools in `cocoapi/PythonAPI/pycocotools`.
- `Tensorboard` support for training visualization.
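For concreteness, here is a minimal sketch of what a `Box3dList`-style container can look like under the Nx7 'ry-lwhxyz' layout described above. The class and method names mirror `BoxList` from maskrcnn-benchmark, but the details (in particular the flip convention) are assumptions, not the repository's exact code:

```python
import math
import torch

FLIP_LEFT_RIGHT = 0  # flip method flag, mirroring BoxList's convention


class Box3dList:
    """Holds N 3D boxes for one image as an Nx7 'ry-lwhxyz' tensor:
    (yaw ry, length, width, height, center x, y, z) in camera coordinates."""

    def __init__(self, box3d, image_size):
        box3d = torch.as_tensor(box3d, dtype=torch.float32)
        assert box3d.dim() == 2 and box3d.size(-1) == 7, "expected an Nx7 tensor"
        self.box3d = box3d
        self.size = image_size  # (width, height), kept for symmetry with BoxList

    def transpose(self, method):
        # Horizontal flip: one common KITTI convention mirrors the x
        # coordinate and maps the yaw angle ry to pi - ry (an assumption here).
        assert method == FLIP_LEFT_RIGHT, "only horizontal flip is sketched"
        flipped = self.box3d.clone()
        flipped[:, 0] = math.pi - flipped[:, 0]  # ry
        flipped[:, 4] = -flipped[:, 4]           # x
        return Box3dList(flipped, self.size)

    def __len__(self):
        return self.box3d.size(0)
```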
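And here is a hedged sketch of a MultiBin-style orientation coder with the same encode/decode split as `modeling/box_coder`. For brevity it uses non-overlapping bins (the original MultiBin formulation uses overlapping ones), and the class and argument names are illustrative:

```python
import math
import torch


class OrientationCoder:
    """Encodes an angle as (bin index, cos/sin residual to the bin center)
    and decodes it back; a simplified, non-overlapping MultiBin variant."""

    def __init__(self, num_bins=2):
        self.num_bins = num_bins
        self.bin_size = 2 * math.pi / num_bins

    def encode(self, angles):
        # angles: float tensor of shape [N], in radians
        angles = angles % (2 * math.pi)
        bins = (angles / self.bin_size).long().clamp(max=self.num_bins - 1)
        centers = (bins.float() + 0.5) * self.bin_size
        residuals = angles - centers
        return bins, torch.stack([residuals.cos(), residuals.sin()], dim=-1)

    def decode(self, bins, residuals):
        # Invert encode: bin center plus the angle recovered from (cos, sin).
        centers = (bins.float() + 0.5) * self.bin_size
        offsets = torch.atan2(residuals[..., 1], residuals[..., 0])
        return (centers + offsets) % (2 * math.pi)
```

At inference time the head would pick the bin with the highest predicted confidence and feed its index and residual to `decode`.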
For environment requirements and compilation, please refer to maskrcnn-benchmark; its INSTALL.md has the installation instructions.
Clone the repository:

```bash
git clone https://github.com/abbyxxn/maskrcnn-benchmark-3d
```
To train on the KITTI object detection dataset:
- Download the KITTI object detection dataset (images, calib, and label files) and place it in your home folder at `~/kitti/object`.
- Follow Chen et al. here to split the dataset into train, val, and test sets.
- Note that you should download the color images of the object dataset rather than the temporally preceding frames from the KITTI website, because the 3D annotations are only provided for single-frame images.
- Make sure the files are arranged in the following structure:
```
kitti
  object
    testing
    training
      calib
      image_2
      label_2
      depth
    ImageSets
      train.txt
      val.txt
      trainval.txt
    annotations
      kitti_train_car_gt_roidb.pkl
      kitti_val_car_gt_roidb.pkl
```
We recommend symlinking the path to the KITTI dataset to `datasets/` as follows:

```bash
# symlink the kitti dataset
cd ~/github/maskrcnn-benchmark-3d
mkdir -p datasets/kitti
ln -s /path_to_kitti_dataset/object datasets/kitti/object
```
You can also configure your own paths to the datasets: simply modify `maskrcnn_benchmark/config/paths_catalog.py` to point to the location where your dataset is stored. Alternatively, create a new `paths_catalog.py` file that implements the same two classes and pass it as the `PATHS_CATALOG` config argument during training; a sketch of such an entry follows.
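As a hedged sketch, a KITTI entry in a custom `paths_catalog.py` could look like the following. The `DatasetCatalog.get` structure (`factory`/`args`) matches maskrcnn-benchmark, but the `KittiDataset` factory name and its argument names are assumptions; check this repository's dataset class for the arguments it actually takes:

```python
import os


class DatasetCatalog(object):
    DATA_DIR = "datasets"
    DATASETS = {
        # assumed layout, matching the symlinked datasets/kitti/object above
        "kitti_train": {"data_dir": "kitti/object", "split": "train"},
        "kitti_val": {"data_dir": "kitti/object", "split": "val"},
    }

    @staticmethod
    def get(name):
        if "kitti" in name:
            attrs = DatasetCatalog.DATASETS[name]
            return dict(
                factory="KittiDataset",  # assumed factory name
                args=dict(
                    data_dir=os.path.join(DatasetCatalog.DATA_DIR, attrs["data_dir"]),
                    split=attrs["split"],
                ),
            )
        raise RuntimeError("Dataset not available: {}".format(name))
```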
Most of the configuration files that we provide assume that we are running on 8 GPUs. In order to be able to run it on fewer GPUs, there are a few possibilities:
1. Run the following without modifications:

```bash
python /path_to_maskrcnn_benchmark/tools/train_net.py --config-file "/path/to/config/file.yaml"
```
2. Modify the cfg parameters. Here is an example for Mask R-CNN R-50 FPN with the 1x schedule on a single GPU; following the linear scaling rule, the batch size and learning rate are divided by 8 and the schedule is stretched by 8x relative to the 8-GPU defaults:

```bash
python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1
```
We use `torch.distributed.launch` internally to launch multi-GPU training. This PyTorch utility spawns as many Python processes as the number of GPUs we want to use, and each Python process drives a single GPU.

```bash
export NGPUS=8
python -m torch.distributed.launch --nproc_per_node=$NGPUS /path_to_maskrcnn_benchmark/tools/train_net.py --config-file "path/to/config/file.yaml"
```
- Xuan Xiong

This project is based on maskrcnn-benchmark. Thanks to Francisco Massa and his colleagues for their great work!
This project is licensed under the MIT License - see the LICENSE.md file for details.