This repository contains the official codebase for Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation. [Project Page]
To set up the environment, simply run
conda env create -f environment.yml
conda activate SLfM
We use speech samples from the LibriSpeech dataset to render binaural audio. The data can be downloaded from here. Please see Dataset/LibriSpeech for more processing details.
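As a rough illustration (not the repository's exact preprocessing; the file path and the 16 kHz target rate are assumptions), a LibriSpeech utterance can be loaded and resampled with torchaudio before being used as a dry source signal:

```python
import torchaudio

# Assumed path and 16 kHz target rate; adjust to your local LibriSpeech copy.
waveform, sr = torchaudio.load("LibriSpeech/test-clean/1089/134686/1089-134686-0000.flac")
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16000)
print(waveform.shape)  # (channels, num_samples); LibriSpeech recordings are mono
```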
We use music samples from the Free Music Archive (FMA) dataset to render binaural audio. The data can be downloaded from the official FMA GitHub repo. Please see Dataset/Free-Music-Archive for more processing details.
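A similar hedged sketch for FMA: turning a clip into a fixed-length mono source signal. The clip path, 10-second length, and peak normalization are assumptions for illustration; the actual steps live in Dataset/Free-Music-Archive.

```python
import torchaudio

# Assumed clip path and segment length; requires an mp3-capable torchaudio backend.
waveform, sr = torchaudio.load("fma_small/000/000002.mp3")
mono = waveform.mean(dim=0, keepdim=True)                 # collapse stereo to a mono source
segment = mono[:, : 10 * sr]                              # keep the first 10 seconds
segment = segment / segment.abs().max().clamp(min=1e-8)   # peak-normalize
```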
We use the SoundSpaces 2.0 platform and the Habitat-Matterport 3D dataset to create our audio-visual dataset, HM3D-SS. Please follow the installation guide from here. We provide the dataset-generation code under Dataset/AI-Habitat. To create the HM3D-SS dataset, simply run:
cd Dataset/AI-Habitat
# please check the bash files before running; they require users to specify the output directory
sh ./multi-preprocess.sh
sh ./multi-postprocess.sh
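The core of the binaural rendering is convolving a dry source signal with a two-channel impulse response produced by SoundSpaces 2.0. The sketch below shows only the idea; the file names and array shapes are assumptions, and the repository's scripts handle this at scale with their own conventions.

```python
import numpy as np
from scipy.signal import fftconvolve

# Assumed file names and shapes: a mono source signal and a 2-channel (L/R) impulse response.
source = np.load("source_mono.npy")    # shape: (num_samples,)
rir = np.load("binaural_rir.npy")      # shape: (2, rir_samples)

binaural = np.stack([
    fftconvolve(source, rir[ch])[: len(source)]  # trim back to the source length
    for ch in range(2)
])                                     # shape: (2, num_samples)
```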
We also provide self-recorded real-world videos under Dataset/DemoVideos/RawVideos. The videos were recorded with an iPhone 14 Pro, and the binaural audio was recorded with a Sennheiser AMBEO Smart Headset. The demo videos are for research purposes only.
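If you want to pull the two-channel audio out of a raw demo video for offline analysis, one simple option is to call ffmpeg (the input/output names and the 48 kHz rate below are assumptions):

```python
import subprocess

# Assumed input/output names; requires ffmpeg on your PATH.
subprocess.run([
    "ffmpeg", "-i", "Dataset/DemoVideos/RawVideos/demo.MOV",
    "-vn",           # drop the video stream
    "-ac", "2",      # keep both (binaural) channels
    "-ar", "48000",  # output sample rate
    "demo_binaural.wav",
], check=True)
```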
We release several models pre-trained with our proposed methods, and we hope they benefit the research community. To download all the checkpoints, simply run
cd slfm
sh ./scripts/download_models.sh
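Once downloaded, a checkpoint can be inspected with plain PyTorch before plugging it into the scripts (the file name and key layout below are assumptions):

```python
import torch

# Assumed checkpoint name; the key layout depends on how the model was saved.
ckpt = torch.load("checkpoints/slfm_pretext.pth.tar", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # e.g. model weights, optimizer state, epoch
```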
We provide training and evaluation scripts under scripts. Please check each bash file before running.
- To train and evaluate our SLfM cross-view binauralization pretext task and perform linear probing experiments, simply run:
cd slfm
sh ./scripts/training/slfm-pretext.sh
- To train and evaluate our SLfM model with frozen embeddings from the pretext task (see the sketch after these commands), simply run:
cd slfm
sh ./scripts/training/slfm-geometric.sh
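For intuition, the sketch below shows the generic frozen-embedding / linear-probing recipe referenced above: the pretext encoder's weights are frozen and only a small head is optimized. The encoder architecture, feature size, and loss here are placeholders, not the repository's actual models; see the bash scripts and the slfm source for those.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the pretext audio encoder (not the repo's architecture).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(2 * 16000, 512), nn.ReLU())
for p in encoder.parameters():
    p.requires_grad = False            # freeze the pretext features
encoder.eval()

probe = nn.Linear(512, 1)              # linear head, e.g. regressing a relative angle
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

audio = torch.randn(8, 2, 16000)       # dummy batch of 1-second binaural clips
target = torch.randn(8, 1)             # dummy regression targets

with torch.no_grad():
    feats = encoder(audio)             # frozen embedding
loss = nn.functional.mse_loss(probe(feats), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```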
If you find this code useful, please consider citing:
@inproceedings{chen2023sound,
  title={Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation},
  author={Chen, Ziyang and Qian, Shengyi and Owens, Andrew},
  booktitle={ICCV},
  year={2023}
}
This work was funded in part by DARPA Semafor and Sony. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.