This is a fast and concise implementation of Faster R-CNN with TensorFlow 2, based on endernewton's TensorFlow 1 implementation and other works.
The purpose and features of this repository:
- Reproduction of the original paper *Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks* with superior performance.
- Concise, flat, and straightforward model architecture with abundant comments.
- Extended from TF1 to TF2, supporting both Eager execution and Graph execution.
- Fast CUDA Non-Maximum Suppression (NMS) implementation with a Cython wrapper.
- A good starting point for further exploration of the classic two-stage object detection architecture.
Source | GPU | mAP | Inference Speed
---|---|---|---
original paper | K40 | 0.699 | 5 fps
this repo | 2080Ti | 0.7051 | ~15 fps
To stay comparable with the original setup, the result shown here is initialized from the VGG16 ImageNet pretrained model, trained on the Pascal VOC2007 `trainval` set, and evaluated on the `test` set. Inference may need a few warm-up runs to reach the best speed (see the timing sketch below).
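
If you benchmark inference yourself, excluding the first few runs gives a fairer number. A minimal sketch of such a measurement; `run_inference` is a hypothetical callable wrapping the detector, not a name from this repo:

```python
import time

def benchmark(run_inference, image, warmup=10, runs=50):
    """Time steady-state inference, excluding warm-up iterations.

    `run_inference` is a hypothetical callable wrapping the detector;
    the first calls are slower due to graph tracing and CUDA context setup.
    """
    for _ in range(warmup):
        run_inference(image)
    start = time.perf_counter()
    for _ in range(runs):
        run_inference(image)
    elapsed = time.perf_counter() - start
    return runs / elapsed  # frames per second

# Example (hypothetical): fps = benchmark(detector_fn, sample_image)
```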
- Clone this repository

  ```
  git clone <repository URL>
  cd tf2-faster-rcnn
  ```
- Install dependencies in a virtual env (anaconda)

  This project is trained and tested under Python 3.7, TensorFlow v2.2 and cudatoolkit 10.1.

  ```
  conda create -n tf2 tensorflow-gpu cudatoolkit=10.1
  conda activate tf2
  conda install numpy opencv cython scipy lxml
  ```
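
  To quickly confirm that the new environment sees the GPU (an optional sanity check, not part of the original instructions):

  ```python
  # Confirm the TensorFlow version and GPU visibility inside the `tf2` env.
  import tensorflow as tf

  print(tf.__version__)                          # expect 2.2.x
  print(tf.config.list_physical_devices('GPU'))  # expect at least one GPU entry
  ```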
- Build CUDA NMS with the Cython wrapper. This implementation uses CUDA NMS by default.

  i. Check your GPU's compute capability and modify `-arch=sm_86` in `build.bat`. `sm_86` is used here as an example (`-Xptxas="-v"` is optional and shows register and shared-memory usage):

  ```
  nvcc -lib -O3 -arch=sm_86 -Xptxas="-v" -o nms_kernel.lib gpu_nms.cu
  ```

  ii. Run the following commands:

  ```
  cd model/utils/nms/gpu
  ./build.bat
  ```

  Compared with the pure Python NMS and the Cython implementation, the GPU version is ~15x and ~12x faster respectively, with 6000 bboxes as input. Unlike the original 1-D version, this CUDA NMS kernel expands the IoU calculation into 2-D and parallelizes it across CUDA threads (a pure-Python reference sketch follows below).
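
  For reference, a minimal pure-Python/NumPy greedy NMS is sketched below; the pairwise IoU computation inside the loop is the part the CUDA kernel spreads across threads. This is an illustrative sketch, not the code shipped in `model/utils/nms`:

  ```python
  import numpy as np

  def nms_reference(boxes, scores, iou_threshold=0.7):
      """Greedy NMS; `boxes` is (N, 4) as (x1, y1, x2, y2), `scores` is (N,)."""
      order = scores.argsort()[::-1]  # indices sorted by descending score
      keep = []
      while order.size > 0:
          i = order[0]
          keep.append(i)
          rest = order[1:]
          # Intersection of the top-scoring box with every remaining box.
          xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
          yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
          xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
          yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
          inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
          area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
          areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
          iou = inter / (area_i + areas - inter)
          # Keep only boxes whose overlap with the kept box is below the threshold.
          order = rest[iou <= iou_threshold]
      return np.array(keep, dtype=np.int64)
  ```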
- Build the Cython CPU NMS (optional)

  ```
  cd model/utils/nms/cpu
  python build.py build_ext --inplace
  ```
- Prepare the Pascal VOC2007 dataset. Download VOCdevkit, VOCtrainval, VOCtest and extract them all into `data/VOCdevkit`:

  - VOCdevkit
    - VOC2007
    - VOCcode
    - ...
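
  A quick sanity check that the standard VOC2007 sub-directories ended up in the expected place (an optional helper, not part of the original instructions):

  ```python
  import os

  # Standard VOC2007 sub-directories that should exist after extraction.
  for sub in ('Annotations', 'ImageSets/Main', 'JPEGImages'):
      path = os.path.join('data/VOCdevkit/VOC2007', sub)
      print(path, '->', 'ok' if os.path.isdir(path) else 'MISSING')
  ```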
- Download the VGG16 ImageNet pretrained model (optional). If you want to initialize training with the ImageNet pretrained model and train from scratch, download vgg16 to `model/vgg16.h5`.

- Download the model weights trained on the Pascal VOC2007 `trainval` dataset to `model/ckpt/` from Google Drive.
- Start training

  ```
  python train.py
  ```

  Optional arguments:

  - `--scratch`: Use the ImageNet pretrained vgg16 to train the model from scratch.
  - `--epoch <n>`: n = the number of epochs. Each epoch trains over the `trainval` dataset once.
  - `--record_all`: Include kernel and gradient information in the tensorboard summary.
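
  For example, to train from the ImageNet pretrained VGG16 for 20 epochs (the epoch count here is only an illustration):

  ```
  python train.py --scratch --epoch 20
  ```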
  Training results can be viewed with tensorboard while training:

  ```
  tensorboard --logdir model/logs --bind_all
  ```

  The first time you start training, it will take some time to set up the dataset (preprocessing). After preprocessing, the directory will contain `data/cache/voc_07_train_gt_roidb.pkl` and `data/cache/voc_07_test_gt_roidb.pkl`.
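
  If you want to peek at the preprocessed cache, it can be loaded with `pickle`; assuming the usual roidb layout (a pickled list of per-image annotation dicts), this prints the number of images:

  ```python
  import pickle

  # Load the cached ground-truth roidb produced by preprocessing.
  with open('data/cache/voc_07_train_gt_roidb.pkl', 'rb') as f:
      roidb = pickle.load(f)

  print(type(roidb))  # usually a list of per-image annotation dicts
  print(len(roidb))   # number of images in the trainval split
  ```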
- Test with the VOC2007 `test` dataset:

  ```
  python test.py -i voc07_test
  ```
- Test with your own images: put the images under `demo/images`, then run

  ```
  python test.py -i demo
  ```
- Optional arguments
  - `--unpop`: run the test without popping up result images in a new window.
  - `--save`: save result images to `demo/results`.
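
  For example, to run on the demo images without opening any windows and keep the rendered outputs (just combining the two flags above):

  ```
  python test.py -i demo --unpop --save
  ```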
Examples:
image source: Internet
This project is built upon the following great works:
Please check out the model structure below before digging into details. ;)
Also feel free to submit pull requests or open an issue if you have any problems!