This repository has been archived by the owner on Feb 28, 2018. It is now read-only.

CS598ps_project

In this paper we present an approach to reconstructing 3D audio for multiple sources from a single-channel input by detecting and tracking visual cues using supervised learning methods. We also discuss a similar approach for improving speaker classification from a video stream by combining facial and speech likelihoods, that is, multimodal speaker recognition on a video stream.
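
As a rough illustration of combining facial and speech likelihoods, the sketch below fuses per-speaker log-likelihoods from two classifiers with a weighted sum and picks the highest-scoring speaker. This is only one common late-fusion scheme and is not taken from the project code; the function name `fuse_log_likelihoods`, the `face_weight` parameter, and the example speakers and scores are all hypothetical.

```python
import math


def fuse_log_likelihoods(face_ll, speech_ll, face_weight=0.5):
    """Weighted late fusion of two per-speaker log-likelihood dicts.

    Hypothetical helper for illustration only; the weighting scheme
    is an assumption, not the method used in the project.
    """
    if face_ll.keys() != speech_ll.keys():
        raise ValueError("both modalities must score the same speakers")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")

    # Weighted sum in the log domain (a weighted product of likelihoods).
    fused = {
        speaker: face_weight * face_ll[speaker]
        + (1.0 - face_weight) * speech_ll[speaker]
        for speaker in face_ll
    }
    # The predicted speaker is the one with the highest fused score.
    best = max(fused, key=fused.get)
    return best, fused


# Made-up scores: the face model prefers "alice", the speech model "bob".
face_ll = {"alice": math.log(0.6), "bob": math.log(0.4)}
speech_ll = {"alice": math.log(0.2), "bob": math.log(0.8)}

best, fused = fuse_log_likelihoods(face_ll, speech_ll)
face_only_best, _ = fuse_log_likelihoods(face_ll, speech_ll, face_weight=1.0)
```

With equal weights the more confident speech model decides the outcome in this toy example, while `face_weight=1.0` reduces the decision to the face model alone. In practice the weight would have to be tuned on held-out data.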

Video assets are here: