
Sound bubbles on hearables

Source code for the Nature Electronics paper "Sound bubbles on hearables": paper_link

Video demo on YouTube:


Abstract

The human auditory system has a limited ability to perceive distance and distinguish speakers in crowded settings. A headset technology that can create a sound bubble in which all speakers within the bubble are audible, but speakers and noise outside the bubble are suppressed, could augment human hearing. However, developing such technology is challenging. Here we report an intelligent headset system capable of creating sound bubbles. The system is based on real-time neural networks that use acoustic data from up to six microphones integrated into noise-cancelling headsets and are run on-device, processing 8 ms audio chunks in 6.36 ms on an embedded central processing unit. Our neural networks can generate sound bubbles with programmable radii between 1 and 2 meters, and with output signals that reduce the intensity of sounds outside the bubble by 49 decibels. With previously unseen environments and wearers, our system can focus on up to two speakers within the bubble with one to two interfering speakers and noise outside the bubble.

💡 Highlighted features:

1. Achieves a $\color{red}{\textsf{streaming}}$, $\color{red}{\textsf{real-time}}$ and $\color{red}{\textsf{low algorithmic latency (12 ms)}}$ distance-based speech separation model ($\color{red}{\textsf{0.3-0.5M params}}$) on low-resource devices (mobile CPUs such as a Raspberry Pi); see the timing sketch after this list.

2. $\color{red}{\textsf{Generalizes}}$ to unseen real-world environments and wearers in the presence of reverberation, motion and noise. Check the real-world end-to-end demo on our webpage.

3. The $\color{red}{\textsf{real-world}}$ dataset and checkpoints are available upon request. Please fill out this form.
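The timing sketch mentioned in point 1: real time here means that each 8 ms audio chunk has to be processed in under 8 ms of wall-clock time on the target CPU (the paper reports 6.36 ms per chunk). The snippet below only illustrates how to measure that budget; the tiny stand-in network, the assumed 24 kHz sample rate and the chunk shape are placeholders, not the repository's benchmark code.

```python
# Minimal timing sketch (not the repo's benchmark): feed 8 ms multi-channel chunks
# through a model and compare per-chunk compute time against the real-time budget.
import time
import torch
import torch.nn as nn

SAMPLE_RATE = 24_000                 # assumed sample rate; check the experiment configs
CHUNK_LEN = SAMPLE_RATE * 8 // 1000  # samples in one 8 ms chunk
N_MICS = 6                           # up to six microphones per the paper

class TinyStandIn(nn.Module):        # placeholder network, NOT the paper's model
    def __init__(self):
        super().__init__()
        self.net = nn.Conv1d(N_MICS, 1, kernel_size=3, padding=1)
    def forward(self, x):
        return self.net(x)

model = TinyStandIn().eval()         # swap in the real checkpoint to benchmark it
times = []
with torch.no_grad():
    for _ in range(500):
        chunk = torch.randn(1, N_MICS, CHUNK_LEN)
        t0 = time.perf_counter()
        model(chunk)
        times.append(time.perf_counter() - t0)

print(f"avg per-chunk time: {1000 * sum(times) / len(times):.3f} ms (budget: 8 ms)")
```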

Running instructions

Setting up

Please note that most scripts require a GPU with CUDA capabilities. We cannot guarantee the scripts will work on a CPU out of the box.

If this is your first time running the code, create the environment and install the required modules.

conda create --name sound_bubble python=3.8
conda activate sound_bubble
pip install -r requirements2.txt
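Most scripts expect a CUDA-capable GPU (see the note above). A quick way to confirm that the freshly installed PyTorch can actually see it (this check is not part of the repository's scripts):

```python
# Sanity check: confirm PyTorch was installed with working CUDA support.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```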

After the initial setup, sourcing setup.sh should activate the environment and add the working directory to your Python path. (This may not work on Windows/macOS; in that case, copy the commands inside the script and run them in your terminal.)

source setup.sh

Play with some examples

First, download the checkpoint for the synthetic dataset. Then run the following commands to test some synthetic samples for different bubble sizes:

python src/test_samples.py ./test_samples/syn_1m/ ./TFG_S_big_newdis_v3_pt_fix_MutiLoss/  --distance_threshold 1 --use_cuda
python src/test_samples.py ./test_samples/syn_1_5m/ ./TFG_S_big_newdis_v3_pt_fix_MutiLoss/  --distance_threshold 1.5 --use_cuda
python src/test_samples.py ./test_samples/syn_2m/ ./TFG_S_big_newdis_v3_pt_fix_MutiLoss/  --distance_threshold 2 --use_cuda
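Here --distance_threshold is the bubble radius in meters: speakers inside the radius should stay audible in the output, while speakers and noise outside it are suppressed. The toy sketch below only illustrates that partition; the distances and the helper function are made up for illustration and are not part of this repository.

```python
# Illustration only: split sources into in-bubble (kept) and out-of-bubble
# (suppressed) given a bubble radius, mirroring what --distance_threshold controls.
def split_by_bubble(source_distances_m, radius_m):
    inside = [i for i, d in enumerate(source_distances_m) if d <= radius_m]
    outside = [i for i, d in enumerate(source_distances_m) if d > radius_m]
    return inside, outside

# Hypothetical scene: three speakers at 0.8 m, 1.4 m and 2.6 m from the wearer.
inside, outside = split_by_bubble([0.8, 1.4, 2.6], radius_m=1.5)
print("kept:", inside, "suppressed:", outside)   # kept: [0, 1] suppressed: [2]
```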

Synthetic dataset preparation

You can use our generated synthetic dataset on Dryad or generate the synthetic dataset yourself based on your own requirements.

Download the VCTK, LibriTTS, and WHAM! datasets. The splitting files (./datasets/WHAM_split.json and ./datasets/vctk_split.json) split VCTK and WHAM! into non-overlapping training, validation and testing sets; LibriTTS is already split. Then, to generate our synthetic dataset, run

python generate_adaptive_dataset.py VCTK_DATASET_DIRECTORY FOLDER_TO_SAVE --dis_threshold bubble_size --n_outputs_test 2000 --n_outputs_train 10000 --n_outputs_val 2000 --seed 12 --bg_voice_dir WHAM_DATASET_DIRECTORY --tts_dir LIBRITTS_DATASET_DIRECTORY
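If you generate the data yourself, it can be worth confirming that the speaker splits really are disjoint before training. A small check along these lines should do it, although the top-level keys ("train", "val", "test") are an assumption about the JSON layout and may need adjusting to the actual files:

```python
# Sketch: verify that the speaker splits do not overlap.
# The key names ("train", "val", "test") are assumed, not confirmed from the repo.
import json

with open("./datasets/vctk_split.json") as f:
    split = json.load(f)

train, val, test = set(split["train"]), set(split["val"]), set(split["test"])
assert train.isdisjoint(val) and train.isdisjoint(test) and val.isdisjoint(test)
print(f"train={len(train)}, val={len(val)}, test={len(test)}: no overlap")
```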

Real-world dataset preparation

The real-world dataset and checkpoints are available upon request. Please fill out this form.

Model training

To train our model, we need to (1) unzip the provided dataset and specify the dataset path in the experiment JSON file, (2) prepare the experiment JSON files for the training configuration (examples in ./syn_experiments and ./real_experiments), and (3) train with the following script:

python src/train_pt.py --run_dir DIRECTORY_TO_SAVE_MODEL --config PATH_TO_CONFIG_FILE 
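The exact fields expected in an experiment JSON depend on the model, so before editing the dataset path it can help to dump the keys of one of the provided configs. The file name below is a placeholder; point it at any file in ./syn_experiments or ./real_experiments:

```python
# Sketch: list the top-level keys of an experiment config to find the dataset path field.
# The file name is a placeholder; use any JSON file shipped in ./syn_experiments.
import json

with open("./syn_experiments/example_config.json") as f:
    config = json.load(f)

for key, value in config.items():
    print(f"{key}: {type(value).__name__}")
```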

Model testing

First unzip the test dataset, then run the testing script. For synthetic data testing,

python src/eval_syn.py TEST_SET_PATH DIRECTORY_TO_SAVE_MODEL DIRECTORY_TO_SAVE_RESULT --use_cuda 

For real-world data testing,

python src/eval.py TEST_SET_PATH DIRECTORY_TO_SAVE_MODEL DIRECTORY_TO_SAVE_RESULT --use_cuda 
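Separation quality is commonly reported as scale-invariant SDR (SI-SDR). The evaluation scripts above save their own results, but if you want to score an output waveform against a reference yourself, a textbook SI-SDR implementation looks like this (not necessarily the exact metric code used by eval.py or eval_syn.py):

```python
# Textbook SI-SDR in NumPy; a common separation metric, offered as a reference
# implementation rather than the repository's own evaluation code.
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference          # projection of the estimate onto the reference
    noise = estimate - target
    return 10 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))

# Example with random signals (real use: model output vs. ground-truth waveform).
rng = np.random.default_rng(0)
ref = rng.standard_normal(48_000)
print(f"SI-SDR of a noisy copy: {si_sdr(ref + 0.1 * rng.standard_normal(48_000), ref):.1f} dB")
```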

Code structure

datasets

The files for dataset splitting.

src

It includes the training and testing scripts for our model.

edge

Scripts to convert the PyTorch model to an ONNX model (see the export sketch at the end of this section).

helpers

Some utility functions

real_experiments

Experiment/model configuration files for the real-world experiments

syn_experiments

Experiment/model configuration files for the synthetic experiments

generate_adaptive_dataset.py

The script to generate the synthetic dataset
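For reference, a generic PyTorch-to-ONNX export (the job of the edge folder above) looks roughly like the sketch below. The tiny stand-in network, input shape and opset version are placeholders; the scripts in edge handle the repository-specific conversion.

```python
# Generic PyTorch -> ONNX export sketch with a placeholder network and input shape.
import torch
import torch.nn as nn

class TinyStandIn(nn.Module):                # stand-in, NOT the repository's model
    def __init__(self):
        super().__init__()
        self.net = nn.Conv1d(6, 1, kernel_size=3, padding=1)
    def forward(self, x):
        return self.net(x)

model = TinyStandIn().eval()
dummy = torch.randn(1, 6, 192)               # e.g. 6 mics x one 8 ms chunk at 24 kHz
torch.onnx.export(model, dummy, "tiny_model.onnx", opset_version=17)
print("exported tiny_model.onnx")
```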
