Mask RCNN

Performance-focused implementation of Mask RCNN, based on the Tensorpack implementation. The original paper: Mask R-CNN.

Overview

This implementation of Mask RCNN is focused on increasing training throughput without sacrificing accuracy. We do this by training with a batch size greater than 1 per GPU, using FP16, and adding two custom TF ops.
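
As a rough illustration of the FP16 half of this (a minimal sketch, not the repo's actual code), the common TF 1.x mixed-precision pattern keeps FP32 master weights, exposes FP16 casts to the graph, and applies static loss scaling so small gradients don't underflow in half precision:

```python
import tensorflow as tf

def fp16_getter(getter, name, dtype=None, *args, **kwargs):
    """Keep master weights in FP32 but hand FP16 casts to the graph."""
    if dtype == tf.float16:
        var = getter(name, *args, dtype=tf.float32, **kwargs)
        return tf.cast(var, tf.float16)
    return getter(name, *args, dtype=dtype, **kwargs)

with tf.variable_scope("model", custom_getter=fp16_getter):
    images = tf.placeholder(tf.float16, [None, 4], name="images")
    w = tf.get_variable("w", [4, 1], dtype=tf.float16)
    logits = tf.matmul(images, w)  # FP16 compute

loss = tf.reduce_mean(tf.cast(logits, tf.float32) ** 2)

# Static loss scaling: scale the loss up before computing gradients,
# then scale the gradients back down before applying them.
loss_scale = 128.0
opt = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
grads_and_vars = opt.compute_gradients(loss * loss_scale)
grads_and_vars = [(g / loss_scale, v) for g, v in grads_and_vars if g is not None]
train_op = opt.apply_gradients(grads_and_vars)
```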

Status

We refer to training on N GPUs (V100s in our experiments) with a per-GPU batch size of M as NxM training; e.g., 32x4 means 32 GPUs with 4 images each, a global batch of 128 images.

Training converges to target accuracy for configurations from 8x1 up to 32x4. Training throughput is substantially improved over the original Tensorpack code.

A pre-built Docker image is available on DockerHub as armandmcqueen/tensorpack-mask-rcnn:master-latest. It is automatically built on each commit to master.

Notes

  • Running this codebase requires a custom TF binary, available under GitHub releases (it includes the custom ops and a fix for a bug introduced in TF 1.13); a quick sanity check is sketched after this list
  • We give some details on the codebase and the optimizations in CODEBASE.md
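
Since an off-the-shelf TensorFlow will be missing the custom ops, a pre-flight check along these lines can fail fast. The exact version string of the custom build is an assumption here; check the GitHub release notes for the real one:

```python
import tensorflow as tf

# The custom build from this repo's GitHub releases is based on TF 1.13
# (it bundles the two custom ops and the bug fix mentioned above).
# NOTE: the exact version string is an assumption; adjust to match the release.
assert tf.__version__.startswith("1.13"), (
    "Expected the custom 1.13-based TensorFlow build, got %s" % tf.__version__
)
print("TensorFlow %s looks right" % tf.__version__)
```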

To launch training

Training in a container is recommended.

  • To train with docker, refer to Docker
  • To train with Amazon EKS, refer to EKS

Training results

The results below were obtained on P3dn.24xl instances using EKS. 12-epoch training:

| Num_GPUs x Images_Per_GPU | Training time | Box mAP | Mask mAP |
| --- | --- | --- | --- |
| 8x4 | 5.09h | 37.47% | 34.45% |
| 16x4 | 3.11h | 37.41% | 34.47% |
| 32x4 | 1.94h | 37.20% | 34.25% |

24-epoch training:

| Num_GPUs x Images_Per_GPU | Training time | Box mAP | Mask mAP |
| --- | --- | --- | --- |
| 8x4 | 9.78h | 38.25% | 35.08% |
| 16x4 | 5.60h | 38.44% | 35.18% |
| 32x4 | 3.33h | 38.33% | 35.12% |
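
To put the tables in perspective, the script below (plain Python, using only the figures reported above) computes speedup and scaling efficiency relative to the 8x4 baseline:

```python
# Speedup and scaling efficiency vs. the 8x4 baseline, straight from the tables.
times_12ep = {"8x4": 5.09, "16x4": 3.11, "32x4": 1.94}  # hours
times_24ep = {"8x4": 9.78, "16x4": 5.60, "32x4": 3.33}

def scaling(times, baseline="8x4"):
    base_gpus = int(baseline.split("x")[0])
    for cfg, hours in times.items():
        gpus = int(cfg.split("x")[0])
        speedup = times[baseline] / hours
        ideal = gpus / base_gpus
        print("%5s: %.2fx speedup, %3.0f%% of linear scaling"
              % (cfg, speedup, 100.0 * speedup / ideal))

scaling(times_12ep)  # 32x4 is ~2.6x faster than 8x4 (~66% of linear)
scaling(times_24ep)  # 32x4 is ~2.9x faster than 8x4 (~73% of linear)
```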

Tensorpack fork point

Forked from the excellent Tensorpack repo at commit a9dce5b220dca34b15122a9329ba9ff055e8edc6.
