
Investigate mapping of articulations from the image space to the latent space using neural networks.


R-Haecker/latent-representations-of-articulations


Representation of Three-Dimensional Objects using Neural Networks

The code provided in this repository was created during my bachelor thesis in the Computer Vision research group at the Heidelberg Collaboratory for Image Processing.
Below you can find a brief description of this work. For all details, have a look here.

Abstract

In this thesis we investigate the mapping from the image space to the latent space using neural networks. We focus on image data sets specifically created to display three-dimensional objects with exactly labeled articulations. Using a variational autoencoder in combination with a discriminative network, the aim is to extract and investigate information about articulations from images. This enables us to explore and compare how specific articulation parameters are mapped onto the latent space. The main contribution of this work is the comparison between natural interpolations of articulations and different interpolations in the latent space. Furthermore, we investigate whether a metric loss improves the model and how a discriminator helps expand the latent space around observations.

Model

Our model consists of a Variational Autoencoder (VAE) and an adversarial discriminator.
The architecture used for the Variational Autoencoder is shown in the following figure.

[Figure: architecture of the Variational Autoencoder]
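As a generic illustration of how the encoder of a VAE turns its outputs into a latent code, here is a minimal NumPy sketch of the standard reparameterization trick. It assumes the encoder predicts a mean and a log-variance per latent dimension; the batch values are hypothetical placeholders, and only the latent size of 128 comes from this work.

```python
import numpy as np

LATENT_DIM = 128  # latent size used in this work


def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps with eps ~ N(0, I).

    In an autodiff framework, writing the sample this way keeps z differentiable
    with respect to mu and log_var.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


rng = np.random.default_rng(0)
# Hypothetical encoder outputs for a batch of 4 images.
mu = np.zeros((4, LATENT_DIM))
log_var = np.zeros((4, LATENT_DIM))
z = reparameterize(mu, log_var, rng)
print(z.shape)  # (4, 128)
```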

Discriminator Loss
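A minimal sketch, assuming the standard binary cross-entropy GAN loss in which the discriminator outputs the probability that an image is real; the exact loss used in this work may differ, and the function names are hypothetical.

```python
import numpy as np


def binary_cross_entropy(p, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities p and 0/1 targets."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def discriminator_loss(d_real, d_fake):
    """Hypothetical discriminator loss.

    d_real: discriminator probabilities for real images (target 1)
    d_fake: discriminator probabilities for "false" images from the VAE (target 0)
    """
    return binary_cross_entropy(d_real, np.ones_like(d_real)) + binary_cross_entropy(
        d_fake, np.zeros_like(d_fake)
    )


# A discriminator that cannot tell real from false outputs 0.5 everywhere: loss = 2 ln 2.
print(discriminator_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5])))
```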

VAE Loss
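A minimal sketch of a common VAE-GAN objective, assuming a squared-error reconstruction term, the KL divergence to a standard normal prior and a non-saturating adversarial term; the weights `beta` and `gamma` are hypothetical, and the exact terms used in this work may differ.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims, averaged over the batch."""
    return float(np.mean(-0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)))


def vae_loss(x, x_rec, mu, log_var, d_fake, beta=1.0, gamma=1.0, eps=1e-7):
    """Hypothetical VAE-GAN loss: reconstruction + beta * KL + gamma * adversarial term."""
    # Squared error per image, summed over all pixel axes, averaged over the batch.
    rec = float(np.mean(np.sum((x - x_rec) ** 2, axis=tuple(range(1, x.ndim)))))
    kl = kl_to_standard_normal(mu, log_var)
    # Non-saturating adversarial term: the VAE is rewarded when the discriminator
    # rates its "false" images as real (d_fake close to 1).
    adv = float(-np.mean(np.log(np.clip(d_fake, eps, 1.0))))
    return rec + beta * kl + gamma * adv
```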

Data sets

We use data sets created with tools from this repository.

1) Phi Data set

This data set consists of 10,000 samples containing only one articulation of two cuboids, while the parameter Phi is varied as shown below. Hence, the articulation is rotated horizontally without any other changes throughout the whole data set.

[Figure: samples from the Phi data set with varying Phi]

2) Varied Data set

This data set consists of 500,000 samples displaying between two and four cuboids. To create this diversified data set, we vary the articulation parameters Phi, Theta and Lambda and allow different angles between the cuboids, but not different cuboid scales. To introduce even more complexity, we vary not only the articulation parameters but also include appearance and lighting parameters in the parameter space. This allows different colors, directional lights, up to four spotlights and four point lights with different settings to be chosen randomly for every image, resulting in examples such as the following.

[Figure: example images from the Varied data set]
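The random parameter sampling described above could be sketched as follows. Only the number of cuboids (two to four), the fixed cuboid scales and the light counts come from the description (assuming "up to" also applies to the point lights); every range, field name and the `sample_scene` helper are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """Hypothetical container for the parameters of one rendered image."""
    n_cuboids: int
    phi: float
    theta: float
    lam: float
    joint_angles: list  # angles between neighboring cuboids
    colors: list        # one RGB triple per cuboid
    directional_light: bool
    n_spotlights: int
    n_point_lights: int


def sample_scene(rng: random.Random) -> SceneParams:
    # Cuboid scales are deliberately not sampled, matching the description.
    n = rng.randint(2, 4)  # two to four cuboids; randint is inclusive
    return SceneParams(
        n_cuboids=n,
        phi=rng.uniform(0.0, 360.0),   # hypothetical range (degrees)
        theta=rng.uniform(0.0, 90.0),  # hypothetical range (degrees)
        lam=rng.uniform(0.0, 1.0),     # hypothetical range
        joint_angles=[rng.uniform(-90.0, 90.0) for _ in range(n - 1)],  # hypothetical
        colors=[(rng.random(), rng.random(), rng.random()) for _ in range(n)],
        directional_light=rng.random() < 0.5,
        n_spotlights=rng.randint(0, 4),    # up to four spotlights
        n_point_lights=rng.randint(0, 4),  # up to four point lights (assumed)
    )


scene = sample_scene(random.Random(42))
print(scene.n_cuboids, len(scene.joint_angles))
```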

Experiments

Updating the Discriminator

updating both networks,
     at every training step
inferior network,
     comparing the discriminator output of the "false" image in the VAE loss and in the discriminator loss, and updating only the inferior network
probabilistic inferior network,
     the inferior network method, but with randomness introduced into the decision
accuracy threshold,
     calculating the accuracy of the discriminator predictions and updating the discriminator only below a given accuracy threshold
reducing learning rate,
     improving the model by reducing the learning rate before the end of training
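The update rules listed above can be sketched as a single decision function. This is a sketch under assumptions: the method names, the 0.8 threshold and the way randomness enters the probabilistic variant (choosing the discriminator with a probability equal to its share of the two losses) are hypothetical and not taken from the code in this repository.

```python
import random


def discriminator_accuracy(d_real, d_fake):
    """Fraction of correct predictions, using 0.5 as the decision boundary."""
    correct = sum(p >= 0.5 for p in d_real) + sum(p < 0.5 for p in d_fake)
    return correct / (len(d_real) + len(d_fake))


def choose_updates(method, d_fake_loss, vae_adv_loss, d_acc=None, threshold=0.8, rng=None):
    """Return (update_vae, update_discriminator) for one training step (hypothetical).

    d_fake_loss and vae_adv_loss are both computed from the discriminator output
    on the "false" image: the discriminator loss term and the VAE's adversarial term.
    """
    if rng is None:
        rng = random.Random()
    if method == "both":
        return True, True
    if method == "inferior":
        # The network with the higher loss on the "false" image is the inferior one.
        d_inferior = d_fake_loss > vae_adv_loss
        return not d_inferior, d_inferior
    if method == "probabilistic_inferior":
        # Same idea, but the discriminator is chosen with probability equal to its loss share.
        p = d_fake_loss / (d_fake_loss + vae_adv_loss)
        d_chosen = rng.random() < p
        return not d_chosen, d_chosen
    if method == "accuracy_threshold":
        # The VAE is always updated; the discriminator only while its accuracy is low.
        return True, d_acc < threshold
    raise ValueError(f"unknown method: {method}")
```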

The following table shows the FID scores on the Varied data set for networks trained with different methods of updating the discriminator during training (lower is better). The rows indicate whether the learning rate was reduced before the end of training.

reducing learning rate | inferior network | probabilistic inferior network | both networks | accuracy threshold
without                | 50.53            | 36.39                          | 73.51         | 39.23
with                   | 52.57            | 29.14                          | 53.55         | 28.09
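For reference, the FID is the Fréchet distance between two Gaussians fitted to image features of real and generated images: ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The following NumPy sketch computes it from precomputed feature arrays. In the usual setup these features come from an Inception-v3 network; this sketch simply assumes they are given.

```python
import numpy as np


def _sqrtm_psd(a):
    """Matrix square root of a symmetric positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    # Tr((S1 S2)^(1/2)) equals Tr((S1^(1/2) S2 S1^(1/2))^(1/2)); the inner matrix is
    # symmetric PSD, so its eigenvalues are real and non-negative up to rounding.
    eig = np.linalg.eigvalsh(s1_half @ sigma2 @ s1_half)
    tr_covmean = np.sum(np.sqrt(np.clip(eig, 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(feat_real, feat_fake):
    """feat_*: arrays of shape (n_images, n_features) from a fixed feature extractor."""
    mu1, mu2 = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    s1 = np.cov(feat_real, rowvar=False)
    s2 = np.cov(feat_fake, rowvar=False)
    return frechet_distance(mu1, s1, mu2, s2)
```

Shifting every feature of one set by 1.0 leaves the covariances unchanged, so the FID then equals the squared distance of the means, i.e. the number of feature dimensions.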

Sampling from latent space

We apply Principal Component Analysis (PCA) to the latent representations of the Phi validation data set and sample along the principal component (PC) with the highest eigenvalue.
Decoding these latent samples yields the following images.

[Figure: images decoded from samples along the first principal component]
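A minimal NumPy sketch of this PC sampling, assuming the latent codes of the validation set are stacked in an array of shape (n_images, 128) and that samples are spaced evenly between -2 and +2 standard deviations along the first PC (the spacing and the `first_pc_samples` name are hypothetical). The resulting latent vectors would then be passed through the decoder.

```python
import numpy as np


def first_pc_samples(latents, n_samples=9, n_std=2.0):
    """Latent vectors evenly spaced along the first principal component.

    latents: array of shape (n_images, latent_dim), e.g. encoder means of the validation set.
    """
    mean = latents.mean(axis=0)
    centered = latents - mean
    # SVD of the centered data: rows of vt are the principal directions, sorted by
    # decreasing singular value, i.e. by decreasing covariance eigenvalue s**2 / (n - 1).
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]
    std1 = s[0] / np.sqrt(len(latents) - 1)  # standard deviation along pc1
    coeffs = np.linspace(-n_std, n_std, n_samples) * std1
    return mean + coeffs[:, None] * pc1[None, :]


rng = np.random.default_rng(0)
latents = rng.standard_normal((1000, 128))  # placeholder for real latent codes
samples = first_pc_samples(latents)
print(samples.shape)  # (9, 128)
```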

Furthermore, we used Uniform Manifold Approximation and Projection (UMAP) for dimension reduction to visualize the 128-dimensional latent space in a two-dimensional plot. In the following figure we compare the latent samples along the PC with the latent representations of the images from the data set.

[Figure: UMAP projection of the PC samples and of the latent representations of the data set]

Adding Metric Loss

We added a metric triplet loss in the hope of obtaining a better embedding of the parameter Phi in the latent space. Repeating the experiments above yields the following figures.

[Figures: PC sampling and UMAP projection with the added metric loss]
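A minimal sketch of a triplet margin loss on latent codes, assuming squared Euclidean distances and a hypothetical margin of 1.0. How triplets are chosen is also an assumption here, e.g. a positive whose Phi is closer to the anchor's Phi than the negative's.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss on batches of latent codes, shape (batch, latent_dim).

    Pulls each anchor towards its positive and pushes it away from its negative
    until the squared distances differ by at least `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


# Hypothetical triplet: the positive shares the anchor's code, the negative is far away.
a = np.array([[0.0, 0.0]])
print(triplet_loss(a, a, np.array([[3.0, 0.0]])))  # 0.0
```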

Interpolation in latent space

The input images in the following figure form an image sequence with a natural interpolation of the articulation parameter. The corresponding reconstructions of the VAE model are displayed as the output.

[Figure: input image sequence and VAE reconstructions]

We use the latent representations of the first and last images to interpolate between them in the latent space, creating the following images with a Euclidean (linear) interpolation and with a spherical interpolation.

[Figures: images decoded from the Euclidean and the spherical latent interpolation]
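The two interpolation schemes can be sketched in NumPy as follows: linear (Euclidean) interpolation moves along the straight line between the two latent codes, while spherical linear interpolation (slerp) follows the arc defined by the angle between them. The function names and the fallback to linear interpolation for nearly parallel vectors are choices of this sketch, not necessarily those of this work.

```python
import numpy as np


def lerp(z0, z1, t):
    """Euclidean (linear) interpolation between two latent vectors."""
    return (1.0 - t) * z0 + t * z1


def slerp(z0, z1, t, eps=1e-8):
    """Spherical linear interpolation along the angle spanned by z0 and z1."""
    cos_omega = np.dot(z0, z1) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.sin(omega) < eps:  # nearly (anti-)parallel vectors: fall back to lerp
        return lerp(z0, z1, t)
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)


def interpolate(z_first, z_last, n_steps, method="euclidean"):
    """Stack n_steps latent vectors from z_first to z_last (both endpoints included)."""
    f = lerp if method == "euclidean" else slerp
    return np.stack([f(z_first, z_last, t) for t in np.linspace(0.0, 1.0, n_steps)])
```

Applied to the encodings of the first and last images, both variants return a sequence of latent vectors that the decoder can turn into images.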

We again use UMAP dimension reduction to compare the two interpolations.

[Figure: UMAP projection of the Euclidean and spherical interpolations]

Adding Metric Loss

[Figure: interpolation results with the added metric loss]

Conclusion

Updating the Discriminator
     Accuracy threshold as criterion for updating the discriminator
     Updating the inferior network with additional randomness

PCA sampling in the latent space
     a linear PC cannot represent an articulation parameter
     metric loss does not improve correlation

Interpolation in latent space
     linear interpolation performs better than spherical interpolation
     metric loss decreases the correlation
