
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

Ziyang Chen, Shengyi Qian, Andrew Owens
University of Michigan, Ann Arbor
ICCV 2023


This repository contains the official codebase for Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation. [Project Page]

SLfM Illustration

Environment

To set up the environment, simply run:

conda env create -f environment.yml
conda activate SLfM

Datasets

LibriSpeech

We use speech samples from this dataset to render binaural audio. Data can be downloaded from here. Please see Dataset/LibriSpeech for more processing details.

Free Music Archive (FMA)

We use audio samples from this dataset to render binaural audio. Data can be downloaded from the official FMA GitHub repo. Please see Dataset/Free-Music-Archive for more processing details.
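For reference, a mono clip from either corpus can be loaded and resampled with torchaudio before spatial rendering. This is a minimal sketch, not the repo's actual preprocessing; the file path and the 16 kHz target rate are placeholder assumptions.

import torchaudio

# Hypothetical clip path; LibriSpeech ships FLAC files, FMA ships MP3s.
waveform, sr = torchaudio.load("path/to/clip.flac")

# Mix down to mono and resample (16 kHz here is an assumed target rate).
waveform = waveform.mean(dim=0, keepdim=True)
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16000)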

HM3D-SS

We use the SoundSpaces 2.0 platform and the Habitat-Matterport 3D dataset to create our audio-visual dataset, HM3D-SS. Please follow the installation guide from here.

We provide the code for generating the dataset under Dataset/AI-Habitat. To create the HM3D-SS dataset, simply run:

cd Dataset/AI-Habitat
# please check the bash files before running; they require you to specify the output directory
sh ./multi-preprocess.sh
sh ./multi-postprocess.sh
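At a high level, binaural rendering amounts to convolving a mono source signal with a two-channel room impulse response (RIR) computed for a listener pose. The sketch below illustrates only that step; the render_binaural helper is hypothetical (SoundSpaces 2.0 handles rendering internally), and the shapes are assumptions.

import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono: np.ndarray, rir: np.ndarray) -> np.ndarray:
    # mono: (T,) source waveform; rir: (2, L) left/right impulse responses.
    left = fftconvolve(mono, rir[0])
    right = fftconvolve(mono, rir[1])
    return np.stack([left, right])  # (2, T + L - 1) binaural waveform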

Demo Videos

We also provide self-recorded real-world videos under Dataset/DemoVideos/RawVideos. The videos are recorded with an iPhone 14 Pro, and the binaural audio is recorded with a Sennheiser AMBEO Smart Headset. The demo videos are for research purposes only.

Pretrained Models

We release several models pre-trained with our proposed method, which we hope will benefit the research community. To download all the checkpoints, simply run:

cd slfm
sh ./scripts/download_models.sh
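After downloading, a checkpoint can typically be restored with torch.load. The file name and the "state_dict" key below are assumptions about the saved format, not documented names; check download_models.sh for the actual files.

import torch

# Hypothetical checkpoint file name.
ckpt = torch.load("checkpoints/slfm_pretext.pth", map_location="cpu")

# Checkpoints often wrap weights in a "state_dict" entry (assumed here).
state_dict = ckpt.get("state_dict", ckpt)
print(f"loaded {len(state_dict)} entries")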

Train & Evaluation

We provide training and evaluation scripts under scripts; please check each bash file before running.

  • To train and evaluate our SLfM cross-view binauralization pretext task and perform linear probing experiments (a generic linear-probe sketch follows this list), simply run:
cd slfm
sh ./scripts/training/slfm-pretext.sh
  • To train and evaluate our SLfM model with frozen embeddings from the pretext task, simply run:
cd slfm
sh ./scripts/training/slfm-geometric.sh
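For context, linear probing generically means freezing the pretrained encoder and fitting only a linear head on its features. A minimal PyTorch sketch under assumed shapes (the 512-d feature size and binary output are placeholders, not the paper's settings):

import torch.nn as nn

def build_linear_probe(encoder: nn.Module, feat_dim: int = 512, num_classes: int = 2) -> nn.Module:
    # Freeze the pretrained encoder; only the linear head will be trained.
    for p in encoder.parameters():
        p.requires_grad = False
    encoder.eval()
    return nn.Sequential(encoder, nn.Linear(feat_dim, num_classes))

Only the head's parameters are then passed to the optimizer, e.g. torch.optim.Adam(probe[-1].parameters(), lr=1e-3).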

Citation

If you find this code useful, please consider citing:

@inproceedings{
    chen2023sound,
    title={Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation},
    author={Chen, Ziyang and Qian, Shengyi and Owens, Andrew},
    booktitle = {ICCV},
    year={2023}
}

Acknowledgment

This work was funded in part by DARPA Semafor and Sony. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
