Dataset
Tapan Sharma edited this page Jul 19, 2019
For the original proof-of-concept (PoC) experiments in this project, only read speech (TIMIT recordings) and a set of noises (MUSAN recordings) were considered. The noises range from beeps emitted by technical equipment to ambient sounds such as rain, road, and factory noise. All files are in “.wav” format, single channel, and 16-bit PCM encoded. All recordings are downsampled to an 8 kHz sampling rate to speed up training.
Clean Speech dataset: TIMIT
Noise dataset: MUSAN
However, you are encouraged to also try entirely different audio datasets for both clean speech and noise, alongside the datasets mentioned above.
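When substituting another dataset, it may help to check that each file matches the format described on this page (single channel, 16-bit PCM, 8 kHz after downsampling). The following is a minimal sketch using only Python's standard `wave` module. The function names `wav_format` and `matches_expected_format` are hypothetical helpers for illustration and are not part of this project's code.

```python
import wave


def wav_format(path):
    """Return (channels, bits_per_sample, sample_rate_hz) for a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def matches_expected_format(path, sample_rate_hz=8000):
    """True if the file is mono, 16-bit PCM at the given sample rate.

    The 8000 Hz default reflects the downsampled rate mentioned on this page;
    pass the original rate to check files before downsampling.
    """
    return wav_format(path) == (1, 16, sample_rate_hz)
```

Files that fail the check would need to be converted (for example, mixed down to mono or resampled) with an audio tool of your choice before being used for training.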
There are many ways to generate the noisy audio data, including the following:
- An individual clean speech sample from the speech dataset can be mixed with a randomly chosen noise sample from the noise dataset after both samples are brought to the same duration and the same amplitude level, i.e. both are first normalized to the same amplitude and then added together.
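
The mixing approach described above can be sketched as follows. This is an illustrative, dependency-free example and not the project's actual implementation. It assumes that samples are already decoded to lists of floats, that "same amplitude" means equal RMS level (peak normalization would be another reasonable reading), and that a noise clip shorter than the speech clip is looped before cropping. The helper names `rms`, `match_length`, and `mix` are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_length(noise, length, rng=random):
    """Loop the noise if it is too short, then take a random crop of `length` samples."""
    if not noise:
        raise ValueError("noise clip is empty")
    repeats = -(-length // len(noise))  # ceiling division
    looped = list(noise) * repeats
    start = rng.randrange(len(looped) - length + 1)
    return looped[start:start + length]


def mix(speech, noise, rng=random):
    """Add noise to speech after matching duration and RMS level (assumed meaning of 'amplitude')."""
    noise = match_length(noise, len(speech), rng)
    noise_rms = rms(noise)
    if noise_rms == 0:
        return list(speech)  # silent noise clip: nothing to add
    gain = rms(speech) / noise_rms  # scale noise to the speech RMS level
    return [s + gain * n for s, n in zip(speech, noise)]


# Usage: pick a random noise clip and mix it into one speech clip.
speech_clip = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise_clips = [[random.uniform(-0.1, 0.1) for _ in range(3000)] for _ in range(3)]
noisy_clip = mix(speech_clip, random.choice(noise_clips))
```

Because the two signals are summed at equal level, the result can exceed the original sample range, so rescaling or clipping may be needed before writing it back to a 16-bit file.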