Flexible 3D Object Detection (Flex3D-bbox)

Introduction

Building on "Real-Time Seamless Single Shot 6D Object Pose Prediction" (CVPR 2018) by Bugra Tekin, Sudipta N. Sinha, and Pascal Fua [Paper], this implementation extends the original framework to better address everyday object detection tasks.

"We've integrated fine-tuning across diverse datasets, ranging from custom-labeled data to standard benchmarks, with seamless conversion into a unified labeling format. The system supports multiple input types, including images, videos, and even webcam feeds, and is optimized for robust multi-object and multi-class inference. These enhancements make the method highly adaptable and effective for a wide range of real-world applications."

(Demo video)

Key Features

  • Diverse datasets: Supports parcel3d, AIHUB, and other manually labeled custom datasets.
  • Streamlined pipeline: Omits the reprojection step used in the original framework.
  • Inference code: Provides easy-to-use inference scripts for images, videos, and webcam feeds.
  • Multi-object inference: Detects multiple objects and classes simultaneously.
  • Anchor boxes: Improves detection accuracy through anchor-based prediction (see the sketch below).
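
The anchor mechanism follows a YOLOv2-style box parameterization. The exact tensor layout in this repository may differ, so treat the following as a minimal sketch of how anchor-based box decoding typically works; the grid size, anchor values, and function name are illustrative assumptions, not the repository's actual get_region_boxes code.

    # Minimal sketch of YOLOv2-style anchor decoding (illustrative only;
    # not the repository's actual implementation).
    import numpy as np

    def decode_box(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h, grid_size=13):
        """Map raw network outputs to a normalized box center and size."""
        def sigmoid(v):
            return 1.0 / (1.0 + np.exp(-v))

        # Center is offset from the responsible grid cell, normalized to [0, 1].
        cx = (grid_x + sigmoid(tx)) / grid_size
        cy = (grid_y + sigmoid(ty)) / grid_size
        # Width/height scale the anchor prior (anchors are in grid-cell units).
        w = anchor_w * np.exp(tw) / grid_size
        h = anchor_h * np.exp(th) / grid_size
        return cx, cy, w, h

    # Example: raw outputs for the cell at (6, 4) using the anchor (1.9, 2.5).
    print(decode_box(0.2, -0.1, 0.3, 0.1, grid_x=6, grid_y=4, anchor_w=1.9, anchor_h=2.5))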

1. Download the Repository

Download the repository including the necessary datasets:

git clone https://github.com/jungarden/Flex3D-bbox.git

2. System Environment

Ensure your environment meets the following requirements:

  • Python: 3.6
  • CUDA: 11.1
  • cuDNN: 8
  • Docker image: nvidia/cuda:11.1.1-cudnn8-devel-ubuntu20.04

Install the required libraries as follows:

  • PyTorch:

    pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
  • OpenCV:

    pip install opencv-python
    # Alternatively, install a specific version:
    # pip install opencv-contrib-python==4.1.0.25
  • Scipy:

    pip install scipy==1.2.0
  • Pillow:

    pip install pillow==8.2.0
  • tqdm:

    pip install tqdm==4.64.1
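
After installing, a quick check such as the following (a minimal sketch using only the packages listed above, not repository code) can confirm that PyTorch sees the GPU and that the expected library versions are in place:

    # Quick environment sanity check (illustrative; not part of the repository).
    import torch
    import torchvision
    import cv2
    import scipy
    import PIL

    print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
    print("torchvision:", torchvision.__version__)
    print("opencv:", cv2.__version__)
    print("scipy:", scipy.__version__)
    print("pillow:", PIL.__version__)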

3. Parsing

Before training, make sure your dataset labels are correctly formatted by running the making_txt_labels.py script. It parses your dataset's labeling information and converts it into the unified format required for training. Select the labeling method that matches your dataset, whether it is manually labeled or follows the AIHUB dataset format.

python3 making_txt_labels.py
  • glove00 folder structure (see the example image in the repository)
  • glove00.data (see the example image in the repository)
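
The unified label format is not spelled out in this README. In the original singleshotpose codebase, each label line holds the class id, nine projected keypoints (centroid plus the eight 3D box corners) normalized by image size, and the x/y extents. Assuming this repository keeps a similar convention (an assumption, not something this README confirms), a conversion step could be sketched along these lines; the input coordinates below are purely hypothetical:

    # Hypothetical sketch of writing one label .txt file per image in a
    # singleshotpose-style format (class id + 9 normalized keypoints + extents).
    # The field layout is an assumption for illustration only.

    def write_label(txt_path, class_id, keypoints_px, img_w, img_h):
        """keypoints_px: 9 (x, y) pixel coordinates, centroid first, then 8 corners."""
        xs = [x / img_w for x, _ in keypoints_px]
        ys = [y / img_h for _, y in keypoints_px]
        values = [class_id]
        for x, y in zip(xs, ys):
            values += [x, y]
        values += [max(xs) - min(xs), max(ys) - min(ys)]  # normalized x/y extents
        with open(txt_path, "w") as f:
            f.write(" ".join(f"{v:.6f}" if isinstance(v, float) else str(v) for v in values) + "\n")

    # Example usage with made-up pixel coordinates for a 640x480 image.
    pts = [(320, 240)] + [(300 + 5 * i, 220 + 4 * i) for i in range(8)]
    write_label("000001.txt", 0, pts, 640, 480)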

4-1. Training (Multi-Object)

To train the model on multiple objects across datasets, use the following command:

python3 train_multi.py \
--datacfg data/occlusion.data \
--modelcfg cfg/yolo-pose-multi.cfg \
--initweightfile cfg/darknet19_448.conv.23 \
--pretrain_num_epochs 15

4-2. Training (Finetuning)

For finetuning on a custom dataset, run:

python3 train.py \
--datacfg data/trainbox.data \
--modelcfg cfg/yolo-pose.cfg \
--initweightfile backup/parcel3d/model.weights \
--pretrain_num_epochs 5

5. Inference

To perform inference on a video file, execute:

python3 inference.py \
--datacfg data/occlusion.data \
--modelcfg cfg/yolo-pose-multi.cfg \
--initweightfile backup_multi/model.weights \
--file video.mp4
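
The introduction notes that images, videos, and webcam feeds are supported. The frame-handling part of such a pipeline typically looks like the generic OpenCV loop below; this is not the repository's inference.py, and run_model is a placeholder for the actual detector call:

    # Generic OpenCV capture loop (illustrative; not the repository's inference.py).
    import cv2

    def run_model(frame):
        # Placeholder for the actual network forward pass; returns nothing here.
        return []

    source = "video.mp4"   # or 0 for the default webcam
    cap = cv2.VideoCapture(source)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        detections = run_model(frame)          # run the detector on the current frame
        cv2.imshow("Flex3D-bbox inference", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to stop
            break
    cap.release()
    cv2.destroyAllWindows()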

6. Results

Below is an example of multi-class detection results:

(Example image and demo video)

7. References

  • Bugra Tekin, Sudipta N. Sinha, Pascal Fua, "Real-Time Seamless Single Shot 6D Object Pose Prediction," CVPR 2018.

Additional Information

System Architecture

  • Repository Structure:
    • baseline/: Single object detection
    • multi/: Multi-object detection
    • dataset/: Contains various datasets
    • utils/: Contains utility functions (e.g., get_anchors.py)

Code Modifications

  • train.py: Removed camera internal parameters, rotation matrices, and reprojection variables.
  • utils.py: Added build_target_anchors so anchors are considered in single-object detection (baseline), and modified get_region_boxes to account for anchors.
  • image.py & dataset.py: Updated paths for custom datasets.
  • yolo-pose.cfg: Adjusted the number of filters to match the anchors and classes.
  • inference.py: Added visualization of bounding boxes and class labels (see the sketch below).
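
As a rough illustration of what this visualization involves (not the repository's actual inference.py; the corner ordering and colors are assumptions), drawing a projected 3D box from its eight 2D corner points with OpenCV can look like this:

    # Illustrative OpenCV drawing of a projected 3D bounding box; the corner
    # ordering (bottom face first, then top face) is an assumption.
    import cv2
    import numpy as np

    def draw_3d_box(frame, corners_2d, label, color=(0, 255, 0)):
        """corners_2d: 8 (x, y) pixel points; first 4 = bottom face, last 4 = top face."""
        pts = [tuple(map(int, p)) for p in corners_2d]
        for i in range(4):
            cv2.line(frame, pts[i], pts[(i + 1) % 4], color, 2)          # bottom face
            cv2.line(frame, pts[4 + i], pts[4 + (i + 1) % 4], color, 2)  # top face
            cv2.line(frame, pts[i], pts[4 + i], color, 2)                # vertical edges
        cv2.putText(frame, label, pts[0], cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
        return frame

    # Example usage with made-up corner coordinates on a blank frame.
    frame = np.zeros((480, 640, 3), dtype=np.uint8)
    corners = [(200, 300), (400, 300), (400, 420), (200, 420),
               (220, 200), (380, 200), (380, 310), (220, 310)]
    cv2.imwrite("box_demo.png", draw_3d_box(frame, corners, "box"))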
