Neural audio inpainting using a lightweight CNN for real-time inference

carlmoore256/NextBlock

NextBlock - Neural Audio Distortion Repair for Realtime Network Streaming

Live audio dropout correction using a variety of CNN-based approaches

This project aims to determine the viability of using convolutional neural networks to reduce the total harmonic distortion (THD) that rectangular windowing artifacts introduce in low-latency P2P audio communication protocols. In JackTrip, these artifacts occur when the internal ring buffer underruns because of poor connectivity.
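To make the artifact concrete, here is a minimal NumPy sketch. It assumes an underrun is played back as a block of zeros, which amounts to multiplying the signal by a rectangular window. The helper names `drop_blocks` and `off_tone_energy`, the 128-sample block size and the 48 kHz sample rate are illustrative choices, not values taken from this repo. The energy ratio is only a rough proxy for the distortion, not a THD measurement.

```python
import numpy as np

def drop_blocks(signal, block_size, dropped_blocks):
    """Zero whole blocks, mimicking an underrun that is played back as silence."""
    damaged = signal.copy()
    for index in dropped_blocks:
        start = index * block_size
        damaged[start:start + block_size] = 0.0
    return damaged

def off_tone_energy(signal, sample_rate, tone_hz):
    """Fraction of spectral energy that lies outside the bin of the test tone."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    tone_bin = int(round(tone_hz * len(signal) / sample_rate))
    return 1.0 - power[tone_bin] / power.sum()

sample_rate = 48000   # assumed sample rate
block_size = 128      # assumed block size
tone_hz = 440.0
# 4800 samples gives 10 Hz bins, so 440 Hz falls exactly on bin 44.
t = np.arange(4800) / sample_rate
clean = np.sin(2 * np.pi * tone_hz * t)
damaged = drop_blocks(clean, block_size, dropped_blocks=[10, 20])

clean_ratio = off_tone_energy(clean, sample_rate, tone_hz)
damaged_ratio = off_tone_energy(damaged, sample_rate, tone_hz)
print(f"clean: {clean_ratio:.6f}, damaged: {damaged_ratio:.6f}")
```

The clean tone keeps essentially all of its energy in one bin, while the two dropped blocks spread a noticeable share of it across the spectrum. That spread is what is heard as a click.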

The approach is to create an on-line, client-side TensorFlow prediction pipeline integrated with the JACK Audio Connection Kit. NextBlock uses a lightweight 1D convolutional network to continuously predict the next block of incoming audio, and repairs the stream with that prediction whenever packets are reported as dropped.
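The control flow of that repair step might look like the sketch below. It is an illustration under assumptions, not the code in this repo: `repair_stream`, the `context` length and the `dropped` flags are hypothetical names, and `repeat_last_block` is a trivial stand-in predictor used where the network's forward pass would go.

```python
import numpy as np

def repair_stream(blocks, dropped, predict_next, context=4):
    """Predict the next block after every block; use the prediction on a drop."""
    history = []
    output = []
    prediction = None
    for block, is_dropped in zip(blocks, dropped):
        if is_dropped and prediction is not None:
            block = prediction  # substitute the block predicted one step earlier
        output.append(block)
        # Keep only the most recent blocks as context for the next prediction.
        history = (history + [block])[-context:]
        prediction = predict_next(np.concatenate(history))
    return output

def repeat_last_block(context_samples, block_size=128):
    """Stand-in predictor (NOT the CNN): repeat the most recent block."""
    return context_samples[-block_size:].copy()

block_size = 128
blocks = [np.full(block_size, float(i)) for i in range(1, 5)]
received = [b.copy() for b in blocks]
received[2][:] = 0.0                      # the third block never arrived
dropped = [False, False, True, False]

repaired = repair_stream(received, dropped, repeat_last_block)
```

Because the prediction is computed before the next block is due, the substitution on a drop costs no extra latency. Swapping `repeat_last_block` for a model call would leave the loop unchanged.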

It also presents a solution for on-line, server-side learning for realtime audio applications. Both the client and the server pipelines use frequency-domain (FFT) representations for training and inference. Each therefore needs an intermediate step that transforms a block with the FFT before the forward pass through the network, and an inverse step afterwards that turns the network output back into the waveform prediction of the next block.
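A minimal sketch of such a pre- and post-processing pair is shown below. The feature layout is an assumption: here the real and imaginary parts of the one-sided FFT are stacked as two channels, while the actual project may use another representation such as magnitude and phase. The function names are hypothetical.

```python
import numpy as np

def block_to_features(block):
    """Forward step: one-sided FFT, stored as (bins, 2) real/imaginary channels."""
    spectrum = np.fft.rfft(block)
    return np.stack([spectrum.real, spectrum.imag], axis=-1).astype(np.float32)

def features_to_block(features, block_size):
    """Inverse step: rebuild the complex spectrum and return to the time domain."""
    spectrum = features[..., 0] + 1j * features[..., 1]
    return np.fft.irfft(spectrum, n=block_size)

block_size = 128  # assumed block size
rng = np.random.default_rng(0)
block = rng.uniform(-1.0, 1.0, block_size)

features = block_to_features(block)        # what the network would consume
restored = features_to_block(features, block_size)
```

With no network in between, the round trip should return the input block up to float32 rounding, which is a useful sanity check before a model is placed between the two steps.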

Dataset

The Cambridge Multitrack download library provides hundreds of free multitrack recording sessions as waveform stems. I've created this accompanying set of tools to extract labels and features from the stems, and to optimize the large collection of studio-recorded audio tracks for machine learning purposes.

Timbral verification of training data

Around 18 hours of audio labeled "vocals" and/or "vox" are verified for their content with YAMNet. YAMNet is trained on AudioSet and provides an easy way to filter out unwanted sounds such as talking, microphone bleed and silence.
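One way such a filter could work is sketched below. It assumes the per-frame class scores have already been computed by YAMNet, which returns one score per AudioSet class for each frame. The three-class list, the `keep_clip` helper and the 0.3 threshold are illustrative assumptions, not the settings used for this dataset.

```python
import numpy as np

def keep_clip(scores, class_names, wanted=("Singing",), min_score=0.3):
    """Keep a clip if its top average class is wanted and scores high enough."""
    mean_scores = scores.mean(axis=0)                 # average over frames
    top_class = class_names[int(np.argmax(mean_scores))]
    wanted_score = max(mean_scores[class_names.index(name)] for name in wanted)
    return bool(top_class in wanted and wanted_score >= min_score)

# Toy stand-in for the full AudioSet class list, for illustration only.
class_names = ["Speech", "Singing", "Silence"]

# Hypothetical per-frame scores with shape (frames, classes).
sung = np.array([[0.1, 0.8, 0.0], [0.2, 0.7, 0.0]])
spoken = np.array([[0.9, 0.1, 0.0], [0.8, 0.1, 0.0]])
silent = np.array([[0.0, 0.0, 0.9], [0.0, 0.1, 0.9]])

kept = [keep_clip(s, class_names) for s in (sung, spoken, silent)]
```

Averaging over frames keeps a stem that is mostly singing with a short spoken aside. A stricter filter could instead reject a clip if any frame is dominated by an unwanted class.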

This repo is currently a work in progress!

Model Architecture
