gnat-vasylevych/3d_object_positioning

3D object detection problem

This project was part of my internship at ELEKS. The task was to predict 3D bounding boxes for objects in photos. I chose Google's Objectron dataset as my primary data. It covers several object classes; for simplicity I used only one, the cup class. I also trained the model only on photos containing a single cup, omitting pictures with two or more cups.


Google's annotations provide the bounding box coordinates for each frame. Each box is described by 9 points in a 3D coordinate system (the 8 corners of the box and 1 point at its center), so 27 values in total.
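To make the 9 x 3 = 27 layout concrete, here is a minimal sketch of turning one frame's keypoints into a flat regression target. The unit cube, the variable names, and the "center first" ordering are assumptions for illustration, not values taken from the dataset.

```python
import numpy as np

# Hypothetical keypoints for one frame: the 8 corners of a unit cube
# centered at the origin, plus the box center (assumed to come first).
corners = np.array([[x, y, z]
                    for x in (-0.5, 0.5)
                    for y in (-0.5, 0.5)
                    for z in (-0.5, 0.5)])
center = corners.mean(axis=0, keepdims=True)   # shape (1, 3)
keypoints = np.vstack([center, corners])       # shape (9, 3)

# Flatten to the 27-value vector a regression model would predict.
target = keypoints.reshape(-1)                 # shape (27,)

# A prediction can be reshaped back to 9 points for drawing the box.
restored = target.reshape(9, 3)
```

Keeping the reshape reversible means the same 27 numbers can serve both as the training target and as drawable box points.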

First, I built a simple custom model that predicts all 27 values to establish a baseline.

first_custom_model

This model reached an MSE loss of about 0.1805.
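For reference, the MSE here is the squared difference between predicted and annotated values, averaged over all 27 outputs and all samples. The sketch below uses a made-up toy batch; the numbers are illustrative and unrelated to the results reported in this README.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error averaged over all samples and all outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Toy batch (assumed data): 2 samples x 27 values,
# with every prediction off by exactly 0.1.
y_true = np.zeros((2, 27))
y_pred = np.full((2, 27), 0.1)
loss = mse(y_true, y_pred)   # approximately 0.01
```

Because the error is squared, the value depends on how the coordinates are scaled, so losses are only comparable between models trained on the same targets.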

Then I tried a VGG16 model with frozen convolutional layers.

VGG16_froze_weights

The result is much better than the baseline: the MSE is 0.022.
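A frozen-backbone setup of this kind might look roughly like the sketch below. It assumes Keras (TensorFlow); this README does not state which framework, input size, head layers, or optimizer were actually used, so `build_frozen_vgg16`, the 224 x 224 input, the pooling layer, and the 256-unit dense layer are all illustrative choices.

```python
import tensorflow as tf

def build_frozen_vgg16(input_shape=(224, 224, 3), weights="imagenet"):
    """VGG16 convolutional base (frozen) with a small 27-value regression head."""
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # freeze all convolutional layers

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)  # assumed head size
    outputs = tf.keras.layers.Dense(27)(x)                # 9 points x 3 coordinates

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model
```

With the base frozen, only the two dense layers are updated during training, which keeps the number of trainable parameters small.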

Then I unfroze weights and trained the model again.

VGG16_unfroze_weights_15epochs

Unfortunately, it did not help at all.
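For context, unfreezing in Keras usually means marking the layers trainable again and recompiling before further training. The helper below is a hypothetical sketch under that assumption. The lower learning rate is a common fine-tuning practice, not a setting taken from this project.

```python
import tensorflow as tf

def unfreeze_and_recompile(model, learning_rate=1e-5):
    """Mark every layer trainable and recompile so the change takes effect."""
    for layer in model.layers:
        layer.trainable = True  # also applies to layers nested inside a sub-model
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mse",
    )
    return model
```

Recompiling after changing `trainable` matters because Keras fixes the set of trainable weights at compile time.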


In the end, I got results like these:

photo

photo2
