Praditor

A DBSCAN-Based Automation for Speech Onset Detection

Download Praditor | English · 中文

Features

Praditor is a speech onset detector that automatically finds the boundaries between silence and sound.

Praditor works for both single-onset and multi-onset audio files without any language limitation. It generates output as PointTiers in .TextGrid format.

Onset/Offset Detection
Silence Detection

Praditor also lets users adjust parameters in the Dashboard to get better performance.

We have prepared test_audio.wav for you to give it a try.

Authors

Praditor is written and maintained by Tony, Liu Zhengyuan from the Centre for Cognitive and Brain Sciences, University of Macau.

If you have any questions about how to use Praditor or the details of its algorithm, or if you want me to help you write additional scripts (for example, to export audio files or Excel tables), feel free to contact me at zhengyuan.liu@connect.um.edu.mo or paradeluxe3726@gmail.com.

How to use Praditor?

1. Import your audio

File -> Read files... -> Select your target audio file

2. Play with Praditor

For onset/offset...

Run: Apply the Praditor algorithm to the current audio
Prev/Next: Go to the previous/next audio
Read: Read time points from the current audio's .TextGrid results
Clear: Clear the time points being displayed (the .TextGrid is not changed)
Onset/Offset: Show/hide onsets/offsets

For parameters...

Current/Default: Display the default parameters or the parameters for the current file
Save: Save the displayed parameters as Current/Default
Reset: Reset the displayed parameters to the last saved values

On the menu...

File > Read files... > Select an audio file
Help > Parameters > Show quick instructions on how the parameters work

In case you want to zoom in/out

Wheel ↑/Wheel ↓ to zoom in/out on the timeline
Ctrl+Wheel ↑/Wheel ↓ to zoom in/out (for Windows users)
Command+Wheel ↑/Wheel ↓ to zoom in/out (for Mac users)

How does Praditor work?

The audio signal is first band-pass filtered to remove some high- and low-frequency noise. It is then downsampled with a max-pooling strategy (i.e., each piece is represented by its maximum value).
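As a rough illustration of the max-pooling step, here is a minimal sketch in plain Python. The function name `max_pool_downsample` and the parameter `piece_len` are hypothetical, and the sketch pools the values exactly as given; whether Praditor rectifies the signal first is not stated here.

```python
def max_pool_downsample(samples, piece_len):
    """Split `samples` into consecutive pieces and keep the max of each piece.

    `piece_len` (samples per piece) is a hypothetical parameter for this sketch.
    The last piece may be shorter than `piece_len`.
    """
    if piece_len < 1:
        raise ValueError("piece_len must be at least 1")
    return [max(samples[i:i + piece_len])
            for i in range(0, len(samples), piece_len)]


signal = [0.01, 0.02, 0.03, 0.5, 0.7, 0.6, 0.02, 0.01]
pooled = max_pool_downsample(signal, piece_len=3)
print(pooled)  # [0.03, 0.7, 0.02]
```

Eight samples become three values: one per piece, each the largest value in its piece.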

Praditor runs DBSCAN on 2-D points, so the 1-D audio signal must first be turned into a 2-D array. Every two consecutive pieces are grouped into one point whose two coordinates are the previous and the next frame. Praditor then applies DBSCAN clustering to this point array. Noise points usually gather around (0, 0) because of their relatively small amplitudes.

At this point, the noise areas have been found, which means we have roughly pinpointed the probable locations of onsets (i.e., the target areas).
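The pairing step can be sketched as follows. This assumes overlapping pairs (each piece is paired with the one right after it), which is one reading of "every two consecutive pieces"; the helper name `to_points` is hypothetical.

```python
def to_points(pieces):
    """Pair each piece with the piece that follows it: (previous, next).

    Assumption for this sketch: pairs overlap, so n pieces give n - 1 points.
    """
    return list(zip(pieces[:-1], pieces[1:]))


pieces = [0.03, 0.02, 0.7, 0.6]
points = to_points(pieces)
print(points)  # [(0.03, 0.02), (0.02, 0.7), (0.7, 0.6)]
```

The first point sits close to (0, 0), where low-amplitude noise would be expected to cluster; the last one lies far from the origin.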

From here on, we no longer use the original amplitudes but their first derivatives. First-derivative thresholding is a common technique in other areas of signal processing (e.g., ECG). It keeps the trend but removes the noisy ("spiky") part, which helps to improve performance.
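A minimal sketch of the absolute first derivative, taken here as the absolute difference between neighbouring samples. The function name is hypothetical, and any scaling by the sample rate is left out.

```python
def abs_first_derivative(samples):
    """Absolute difference between each sample and the one before it."""
    return [abs(b - a) for a, b in zip(samples, samples[1:])]


samples = [0.0, 0.5, 0.25, 1.0]
print(abs_first_derivative(samples))  # [0.5, 0.25, 0.75]
```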

For every target area, we apply the same procedure:

Set up a noise reference. Its mean absolute first derivative serves as the baseline.
Set up a starting frame as the onset candidate (starting from the frame right after the noise reference).
Scan from the starting frame. We use kernel smoothing to decide whether the current frame (or rather, the current kernel/window) is valid or invalid.
When we gather enough valid frames, the current starting frame is the time point we want. Otherwise, we move on to the next starting frame.

Parameters

HighPass/LowPass

Before downsampling and clustering, a band-pass filter is applied to the original signal. The idea is that we do not need all the frequencies: bands that are too high or too low can be contaminated by noise.

What we need is the middle part, which has a high contrast between silence and sound.

Note that LowPass should not exceed the highest valid frequency (half of the sample rate; see the Nyquist theorem).
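The Nyquist constraint can be checked before filtering. The sketch below is only a sanity check on the two cutoffs, not Praditor's own validation; the function name and the example cutoff values are hypothetical.

```python
def band_is_valid(high_pass, low_pass, sample_rate):
    """Return True if 0 < high_pass < low_pass <= sample_rate / 2.

    high_pass is the lower cutoff and low_pass the upper cutoff, both in Hz.
    """
    nyquist = sample_rate / 2
    return 0 < high_pass < low_pass <= nyquist


print(band_is_valid(60, 10800, 44100))  # True: 10800 Hz is below 22050 Hz
print(band_is_valid(60, 30000, 44100))  # False: 30000 Hz exceeds 22050 Hz
```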

EPS%

DBSCAN clustering requires two parameters: EPS and MinPt. DBSCAN scans every point, takes it as the center of a circle with radius EPS, and counts how many points fall within that circle. The point is treated as valid if the count reaches MinPt.

Praditor lets users adjust EPS%. Since every audio file can have a different amplitude level or silence-sound contrast, Praditor determines EPS = Current Audio's Largest Amplitude * EPS%.
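To make the relation between EPS% and EPS concrete, here is a small sketch. It assumes EPS% is passed as a fraction (0.1 for 10%) and that the neighbour count includes the point itself; both are assumptions of this example, and the function names are hypothetical.

```python
import math


def eps_from_percent(samples, eps_percent):
    """EPS = largest absolute amplitude * EPS% (EPS% given as a fraction)."""
    return max(abs(x) for x in samples) * eps_percent


def reaches_min_pt(points, index, eps, min_pt):
    """Count points within distance eps of points[index], itself included."""
    cx, cy = points[index]
    neighbours = sum(1 for x, y in points
                     if math.hypot(x - cx, y - cy) <= eps)
    return neighbours >= min_pt


samples = [0.01, -0.02, 0.8, -1.0]
eps = eps_from_percent(samples, 0.1)  # largest amplitude is 1.0

points = [(0.01, 0.02), (0.02, 0.01), (0.03, 0.02), (0.9, 1.0)]
print(reaches_min_pt(points, 0, eps, min_pt=3))  # True: three points near (0, 0)
print(reaches_min_pt(points, 3, eps, min_pt=3))  # False: (0.9, 1.0) is isolated
```

Because EPS scales with the file's largest amplitude, the same EPS% gives a comparable neighbourhood size for quiet and loud recordings.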

RefLen

After Praditor has confirmed the target areas, the original amplitudes are transformed into absolute first derivatives. For each target area, Praditor sets up a Reference Area, whose mean value serves as the baseline for later thresholding.

The length of this reference area is determined by RefLen. If you want to capture very short silences, it is better to turn RefLen down a little as well.
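A minimal sketch of the baseline computation, assuming RefLen is expressed as a number of frames and that the reference area starts at a known index. The names `reference_baseline` and `ref_start` are hypothetical.

```python
def reference_baseline(abs_deriv, ref_start, ref_len):
    """Mean of the absolute first derivatives inside the reference area."""
    ref = abs_deriv[ref_start:ref_start + ref_len]
    if not ref:
        raise ValueError("reference area is empty")
    return sum(ref) / len(ref)


# Four quiet frames followed by two loud ones.
abs_deriv = [0.02, 0.04, 0.03, 0.03, 0.9, 1.2]
baseline = reference_baseline(abs_deriv, ref_start=0, ref_len=4)
print(round(baseline, 3))  # 0.03
```

A shorter `ref_len` keeps the reference inside a short silence, at the cost of a noisier mean.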

Threshold

This is the most frequently used parameter. The core idea of the thresholding method is "hitting the cliff": whenever a talker speaks, the (absolute) amplitude rises and creates a "cliff" (in amplitude, or in other features).

Threshold has a minimum of 1.00, which corresponds to the mean value of the background-noise reference. However, background noise is not smooth but "spiky". That is why Threshold is usually set slightly larger than 1.00.

In addition, pay close attention to aspirated sounds, which have a very slow "slope". A Threshold that is too large can place the onset in the middle of that slope, which you do not want. In that case the sound starts abruptly, like a burst, rather than fading in smoothly.

KernelSize, KernelFrm%

After the reference area and threshold are set, Praditor (1) sets up a starting frame and (2) scans frame by frame, beginning from the frame right next to the reference area. It repeats this process until a valid starting frame (i.e., the onset) is found.

The usual approach is to compare each value (absolute first derivative) with the threshold: if the value surpasses it, the frame is valid; if not, it is invalid. Praditor does this a little differently, using kernel smoothing. It borrows information from later frames by setting up a window (kernel) of length KernelSize.

To guard against extreme values, Praditor ignores the few largest values in the window (kernel). In other words, it retains only KernelFrm% of the frames (e.g., 80%). If there are extreme values, they are avoided; if not, nothing is lost, since the discarded values are at a similar level to the rest.
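The trimming idea can be sketched like this. The sketch assumes the kernel is summarized by the mean of the retained values and that KernelFrm% is given as a fraction; neither detail is confirmed by the text above, and the function name is hypothetical.

```python
def kernel_value(abs_deriv, start, kernel_size, kernel_frm):
    """Mean of the smallest kernel_frm share of values in the window.

    The window covers abs_deriv[start:start + kernel_size]; the largest
    values are dropped so that a single spike cannot dominate the result.
    """
    window = sorted(abs_deriv[start:start + kernel_size])
    if not window:
        raise ValueError("window is empty")
    keep = max(1, round(len(window) * kernel_frm))
    kept = window[:keep]
    return sum(kept) / len(kept)


abs_deriv = [0.1, 0.2, 5.0, 0.3, 0.2]
# With 80% of 5 frames kept, the spike at 5.0 is discarded.
print(round(kernel_value(abs_deriv, 0, kernel_size=5, kernel_frm=0.8), 3))  # 0.2
# Keeping every frame lets the spike pull the mean up.
print(round(kernel_value(abs_deriv, 0, kernel_size=5, kernel_frm=1.0), 3))  # 1.16
```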

CountValid, Penalty

How do we decide that an onset is really an onset? After a true onset, many consecutive frames stay above the threshold.

As mentioned above, Praditor scans frame by frame (window by window, or kernel by kernel), and each frame is either above or below the threshold. If the current frame (kernel) surpasses the threshold, it is valid and counted as +1; if it fails to surpass the threshold, it is invalid and counted as -1 * Penalty.

Praditor then adds these counts up into a running sum. Whenever the sum hits zero or drops below it, the scan aborts and we move on to the next starting frame. In other words, we only want a starting frame whose scanning sum stays positive.

Penalty here is like a "knob" for tuning noise sensitivity. Higher Penalty means higher sensitivity to below-threshold frames.

In summary, each scan has a starting frame (i.e., an onset candidate), and we check whether this starting frame is valid. Valid means that the scanning sum stays positive and eventually reaches CountValid.

Then, we can say, this is the exact time point (onset/offset) we want.
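The scoring rule can be sketched as below. It is a simplified illustration, not Praditor's code: `values` stands for the per-frame (kernel) values, `threshold` is treated as an absolute level, and the function names are hypothetical.

```python
def is_valid_start(values, start, threshold, count_valid, penalty):
    """Scan from `start`: +1 for a value above threshold, -penalty otherwise.

    The start is valid if the running sum reaches count_valid before it
    drops to zero or below.
    """
    total = 0.0
    for value in values[start:]:
        total += 1.0 if value > threshold else -penalty
        if total <= 0:
            return False  # abort and let the caller try the next start
        if total >= count_valid:
            return True
    return False  # ran out of frames before reaching count_valid


def find_onset(values, threshold, count_valid, penalty):
    """Return the index of the first valid starting frame, or None."""
    for start in range(len(values)):
        if is_valid_start(values, start, threshold, count_valid, penalty):
            return start
    return None


values = [0.1, 2.0, 0.1, 2.0, 2.0, 2.0, 2.0]
print(find_onset(values, threshold=1.0, count_valid=3, penalty=2.0))  # 3
print(find_onset(values, threshold=1.0, count_valid=3, penalty=0.5))  # 1
```

With a Penalty of 2.0, the single quiet frame at index 2 cancels the candidate at index 1, so the onset lands at index 3. With a Penalty of 0.5 the same dip is tolerated and index 1 is accepted.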

Data and Materials

If you would like to download the datasets that were used in developing Praditor, please refer to our OSF storage.
