Dataset

Tapan Sharma edited this page Jul 19, 2019 · 1 revision


For the original proof-of-concept experiments in this project, only two sets of recordings were considered: read speech (TIMIT recordings) and a set of noises (MUSAN recordings), which range from beeps emitted by technical equipment to ambient sounds such as rain, road, and factory noise. All files are available in “.wav” format, are single-channel, and are 16-bit PCM encoded. All recordings are downsampled to an 8 kHz sampling rate to speed up training.
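As a quick sanity check on the format described above, the following is a minimal sketch that inspects a file with Python's standard `wave` module. The function name `check_wav` and the `expected_rate` parameter are hypothetical and not part of this project, and the sketch assumes the files are standard RIFF WAV files that `wave` can open.

```python
import wave


def check_wav(path, expected_rate=8000):
    """Report basic properties of a WAV file and whether it matches
    the format described above (mono, 16-bit PCM, expected sample rate).

    `check_wav` and `expected_rate` are illustrative names only.
    """
    with wave.open(path, "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "bits_per_sample": 8 * wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
    info["ok"] = (
        info["channels"] == 1
        and info["bits_per_sample"] == 16
        and info["sample_rate"] == expected_rate
    )
    return info
```

Running a check like this over both datasets after downsampling can catch files that were skipped or that still carry their original sampling rate.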

Clean Speech dataset: TIMIT

Noise dataset: MUSAN

However, you are encouraged to also try entirely different audio datasets for the clean speech and noise recordings, alongside the datasets mentioned above.

Generation of Noisy data:

There are many ways to generate the noisy audio data, including the following:

  • An individual clean speech sample from the speech dataset can be mixed with a randomly chosen noise sample from the noise dataset. The two samples are first brought to the same duration and normalized to the same amplitude level, and are then added together.
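The mixing method above can be sketched with NumPy as follows. This is a minimal illustration, not the project's actual implementation: the function names are hypothetical, "same amplitude" is assumed to mean peak normalization, the noise is assumed to be looped or cropped to the speech length, and the sum is halved so the result stays within [-1, 1].

```python
import numpy as np


def match_length(noise, n_samples):
    """Loop or crop `noise` so it has exactly `n_samples` samples."""
    if len(noise) == 0:
        raise ValueError("noise must contain at least one sample")
    repeats = -(-n_samples // len(noise))  # ceiling division
    return np.tile(noise, repeats)[:n_samples]


def peak_normalize(x):
    """Scale `x` so its largest absolute sample is 1.0.

    Peak normalization is an assumption; RMS normalization is another option.
    All-zero or empty input is returned unchanged (as float64).
    """
    x = np.asarray(x, dtype=np.float64)
    peak = np.max(np.abs(x)) if x.size else 0.0
    return x / peak if peak > 0 else x


def mix(speech, noise):
    """Mix one clean speech clip with one noise clip at equal peak level."""
    speech = peak_normalize(speech)
    noise = peak_normalize(match_length(np.asarray(noise), len(speech)))
    # The sum of two peak-normalized signals lies in [-2, 2];
    # halving it keeps the mixture within [-1, 1].
    return 0.5 * (speech + noise)


def make_noisy(speech, noise_pool, rng):
    """Mix `speech` with a randomly chosen clip from `noise_pool`."""
    noise = noise_pool[rng.integers(len(noise_pool))]
    return mix(speech, noise)
```

For example, `make_noisy(speech, noise_pool, np.random.default_rng(0))` returns a float array with the same length as `speech`. It can be scaled by 32767 and cast to `int16` before being written back as 16-bit PCM.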

Generation of noisy audio
