3D object detection problem

This project was part of my internship at ELEKS. The task was to predict 3D bounding boxes for objects in photos. I chose the Google Objectron dataset as my primary data source. It provides several object classes; for simplicity I used only one, the cup class. I also trained the model only on photos containing a single cup, omitting pictures with two or more cups.

Google's annotations provide the bounding box coordinates for each frame. Each box is described by 9 points in a 3D coordinate system (the 8 corners of the box and 1 point at its center), so 27 values in total.
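To make the 9-point layout concrete, here is a minimal sketch of how such a box maps to a flat 27-value target and back. The helper names are hypothetical, the box is axis-aligned for simplicity, and the point order (center first, then corners) is an assumption rather than something taken from the Objectron annotation format.

```python
from itertools import product


def box_keypoints(center, size):
    """Return 9 (x, y, z) points: the box center followed by its 8 corners.

    Assumes an axis-aligned box; real annotations may be rotated.
    """
    cx, cy, cz = center
    hx, hy, hz = (s / 2.0 for s in size)  # half-extents along each axis
    points = [(cx, cy, cz)]
    # Every combination of -1/+1 along x, y, z gives one corner.
    for sx, sy, sz in product((-1, 1), repeat=3):
        points.append((cx + sx * hx, cy + sy * hy, cz + sz * hz))
    return points


def flatten_keypoints(points):
    """Turn 9 (x, y, z) points into a flat list of 27 values."""
    return [coord for point in points for coord in point]


def unflatten_keypoints(values):
    """Turn a flat list of 27 values back into 9 (x, y, z) points."""
    if len(values) != 27:
        raise ValueError("expected 27 values (9 points x 3 coordinates)")
    return [tuple(values[i:i + 3]) for i in range(0, 27, 3)]
```

A model that regresses the box can output the flat list directly, and `unflatten_keypoints` recovers the points for drawing or evaluation.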

First, I built a simple custom model that predicts these 27 values, to get a baseline.

This model reached an MSE loss of about 0.1805.
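For reference, the mean squared error over one 27-value prediction can be sketched as below. This assumes the loss is a plain average of squared differences over all coordinates; the original does not say how the coordinates were scaled or whether the loss was reduced differently.

```python
def mean_squared_error(predicted, target):
    """Average squared difference between two equal-length value lists."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not predicted:
        raise ValueError("inputs must not be empty")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return sum(squared_errors) / len(squared_errors)
```

Under that assumption, the square root of the loss gives a typical per-coordinate error in the units of the annotations: an MSE of 0.1805 corresponds to a root mean squared error of roughly 0.42.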

Then I tried a VGG16 model with frozen convolutional layers.
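A setup of this kind could look roughly like the sketch below. It assumes TensorFlow/Keras, which the original does not name; the function name, the 224x224 input size, the pooling layer and the 256-unit hidden layer are all hypothetical choices, not the contents of `VGG16_model.py`.

```python
from tensorflow import keras


def build_vgg16_regressor(input_shape=(224, 224, 3), weights="imagenet"):
    """VGG16 convolutional base (frozen) with a small 27-value regression head."""
    backbone = keras.applications.VGG16(
        include_top=False,  # drop the ImageNet classifier layers
        weights=weights,    # pass None to skip downloading pretrained weights
        input_shape=input_shape,
    )
    backbone.trainable = False  # freeze all convolutional layers

    inputs = keras.Input(shape=input_shape)
    x = backbone(inputs)
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dense(256, activation="relu")(x)  # hypothetical head size
    outputs = keras.layers.Dense(27)(x)  # 9 points x 3 coordinates, linear output

    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mse")
    return model
```

With the base frozen, only the two dense layers are updated during training, so the pretrained convolutional features are reused as they are.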

The result is much better than the baseline: the MSE is 0.022.

Then I unfroze the weights and trained the model again.
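In Keras terms (again an assumption about the framework), unfreezing amounts to marking the layers trainable and recompiling before the second training run. The helper name and the small learning rate below are hypothetical; a low rate is a common precaution when fine-tuning pretrained weights, and the original does not say which rate was used.

```python
from tensorflow import keras


def unfreeze_for_fine_tuning(model, learning_rate=1e-5):
    """Mark every layer trainable and recompile so the change takes effect."""
    for layer in model.layers:
        layer.trainable = True  # also applies to layers nested inside sub-models
    # Recompiling is needed for the new trainable state to be used in training.
    model.compile(optimizer=keras.optimizers.Adam(learning_rate), loss="mse")
    return model
```

After this call, `model.fit` updates the convolutional weights as well as the regression head.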

Unfortunately, it didn't help a bit.

In the end, I got something like this

