SemanticSingleViewReconstruction

3D Semantic Scene Reconstruction from a Single Viewport

Maximilian Denninger and Rudolph Triebel

Accepted paper at IMPROVE 2023.

Overview

data overview image

Abstract

We introduce a novel method for semantic volumetric reconstructions from a single RGB image. To overcome the problem of semantically reconstructing regions in 3D that are occluded in the 2D image, we propose to combine both in an implicit encoding. By relying on a headless autoencoder, we are able to encode semantic categories and implicit TSDF values into a compressed latent representation. A second network then uses these as a reconstruction target and learns to convert color images into these latent representations, which get decoded after inference. Additionally, we introduce a novel loss-shaping technique for this implicit representation. In our experiments on the realistic Replica benchmark dataset, we achieve a full reconstruction of a scene that is better than current methods, both visually and in terms of quantitative measures, while only using synthetic data during training. On top of that, we evaluate our approach on color images recorded in the wild.

Network overview

network overview image
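To make the two-stage idea from the abstract more concrete, the following is a minimal PyTorch-style sketch: a 3D autoencoder compresses TSDF values plus semantic labels into a latent code, and a second network maps a single RGB image to that code. The layer layout, the latent size, the class count, and the 32^3 voxel resolution are illustrative assumptions for this example only, not the architecture or values used in the paper or this repository.

import torch
import torch.nn as nn

NUM_CLASSES = 12         # assumed number of semantic categories
LATENT_DIM = 512         # assumed size of the compressed latent representation
IN_CH = 1 + NUM_CLASSES  # one TSDF channel plus one-hot semantic channels


class SceneAutoencoder(nn.Module):
    """Compresses a 32^3 TSDF + semantics volume into a latent code and
    decodes it back (the role of the headless autoencoder)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(IN_CH, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32^3 -> 16^3
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.ReLU(),     # 16^3 -> 8^3
            nn.Flatten(),
            nn.Linear(64 * 8 * 8 * 8, LATENT_DIM),
        )
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 64 * 8 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8, 8)),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 8^3 -> 16^3
            nn.ConvTranspose3d(32, IN_CH, 4, stride=2, padding=1),          # 16^3 -> 32^3
        )

    def forward(self, voxels):
        latent = self.encoder(voxels)
        return self.decoder(latent), latent


class ImageToLatent(nn.Module):
    """Second network: maps a single RGB image to the latent code that the
    autoencoder's decoder turns back into TSDF values and semantics."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, LATENT_DIM),
        )

    def forward(self, rgb):
        return self.backbone(rgb)


# Inference path: RGB image -> latent code -> decoded TSDF + semantics.
autoencoder = SceneAutoencoder()
image_net = ImageToLatent()
rgb = torch.rand(1, 3, 256, 256)
latent = image_net(rgb)
scene = autoencoder.decoder(latent)  # shape (1, 1 + NUM_CLASSES, 32, 32, 32)

During training, the latent codes produced by the autoencoder serve as the regression target for the image network; at inference time only the image network and the decoder are needed, as in the last lines of the sketch.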

Content description

This repository contains the models used to reproduce the main results presented in the paper. We also include the code to generate the data and train the models.

Quick start

If you just want to test this method on your own images, only a few steps are necessary:

Head over to the Setup section, set up the conda environment, start the server, and wait for the prediction.
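As a rough illustration of that workflow, here is a hypothetical client-side sketch in Python. The server address, port, and response layout are assumptions made up for this example and are not this repository's actual interface; follow the Setup section for the real scripts.

import requests

# Assumed address of the locally running prediction server (hypothetical).
SERVER_URL = "http://localhost:8000/predict"

# Send one RGB image and wait for the prediction to finish.
with open("my_image.jpg", "rb") as f:
    response = requests.post(SERVER_URL, files={"image": f})
response.raise_for_status()

# Assumed response layout: a path to the reconstructed, semantically
# labeled scene produced by the server.
print("Prediction stored at:", response.json()["output_path"])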

Citation

If you find our work useful, please cite us with:

@inproceedings{denninger2023,
  title={3D Semantic Scene Reconstruction from a Single Viewport},
  author={Denninger, Maximilian and Triebel, Rudolph},
  booktitle={Proceedings of the 3rd International Conference on Image Processing and Vision Engineering (IMPROVE)},
  year={2023}
}

Train your own network

Everything you need to retrain these methods on your own data is provided in this repository. Before you can start training, you need to generate the data; this process is almost completely automated. For this, head over to the data generation section. After you have generated the data needed for the network you want to retrain, head over to the documentation for that specific network.

Be aware that the data generation takes roughly 15,000 GPU hours and needs around 15 TB of storage space.
