The package contains a Matlab (R2012b) implementation of the instantaneous pitch estimation algorithm 'Halcyon'.
The algorithm decomposes the signal into subband components and uses their instantaneous representations to evaluate a period candidate generating function. The possible range of pitch variation is assumed to be proportional to the pitch value. To obtain accurate estimates that are robust to rapid pitch variations, the signal is analyzed at a different time scale for each candidate. The algorithm shows good frequency resolution for pitch-modulated sounds and performs well in both clean and noisy conditions.
A short description of the algorithm is given in:
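The exact subband filter bank used in Halcyon is not described here. To give a rough idea of what an "instantaneous representation" of a subband component means, the following is a minimal NumPy sketch, assuming the component is already narrowband. It builds the analytic signal with the usual FFT construction (the same idea behind `scipy.signal.hilbert`) and derives instantaneous amplitude and frequency from it. The function names `analytic_signal` and `instantaneous_params` are hypothetical and are not part of the package.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real sequence via the FFT (one-sided spectrum)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Weighting: keep DC (and Nyquist for even n), double positive bins, zero negative bins
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

def instantaneous_params(x, fs):
    """Instantaneous amplitude and frequency (Hz) of a narrowband component.

    The frequency is the first difference of the unwrapped phase, so it has
    one sample fewer than the input.
    """
    z = analytic_signal(x)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    frequency = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, frequency

# Usage: a 200 Hz component with exactly 20 periods in the buffer
fs = 8000
t = np.arange(800) / fs
amp, freq = instantaneous_params(0.5 * np.cos(2 * np.pi * 200 * t), fs)
```

For a real subband signal the instantaneous frequency fluctuates around the component's centre frequency; a filter bank narrow enough to isolate single harmonics is what makes these estimates meaningful.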
Azarov, E., Vashkevich, M. and Petrovsky, A., "Instantaneous Pitch Estimation Algorithm Based on Multirate Sampling", In Proc. ICASSP 2016, pp. 4970-4974.
@inproceedings{Azarov-16,
author={E. {Azarov} and M. {Vashkevich} and A. {Petrovsky}},
title={Instantaneous pitch estimation algorithm based on multirate sampling},
year={2016},
booktitle={2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={4970-4974},
doi={10.1109/ICASSP.2016.7472623}}
For period candidate generation, we use an autocorrelation-based measure. The following figure compares the period candidate generating functions (PCGFs) used in the RAPT, IRAPT and Halcyon algorithms.
Figure 1. Period candidates generation. (a) – source signal, (b) – NCCF, (c) – instantaneous model-based NCCF (IRAPT), (d) – proposed period candidate generation function
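As a point of reference for panel (b), the sketch below computes the classic normalized cross-correlation function (NCCF) in the style of RAPT: for lag k, the correlation of a frame with its k-shifted copy, divided by the square root of the product of their energies. The optional `periods` parameter, which makes the window length proportional to the candidate lag, is a hypothetical illustration of a candidate-dependent time scale; it is not the Halcyon PCGF, whose exact form is given in the paper.

```python
import numpy as np

def nccf(x, start, lags, window=None, periods=None):
    """Normalized cross-correlation of x at each lag, starting at sample `start`.

    Use either a fixed `window` length for all lags, or `periods`, in which
    case the window length for lag k is periods * k (hypothetical variant).
    """
    if (window is None) == (periods is None):
        raise ValueError("specify exactly one of `window` or `periods`")
    x = np.asarray(x, dtype=float)
    values = []
    for k in lags:
        n = int(periods * k) if periods is not None else int(window)
        a = x[start:start + n]
        b = x[start + k:start + k + n]
        if b.size < n:
            raise ValueError("signal too short for lag %d" % k)
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        values.append(np.dot(a, b) / denom if denom > 0 else 0.0)
    return np.array(values)

# Usage: 125 Hz harmonic signal at 8 kHz has a period of 64 samples
fs = 8000
t = np.arange(1000) / fs
x = np.cos(2 * np.pi * 125 * t) + 0.5 * np.cos(2 * np.pi * 250 * t)
lags = np.arange(20, 101)
values = nccf(x, 0, lags, periods=2)
best = int(lags[np.argmax(values)])
```

The lag with the highest NCCF value is the strongest period candidate; a real tracker keeps several local maxima per frame and resolves them with dynamic programming.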
The proposed technique is compared with other pitch estimation algorithms in terms of gross pitch error (GPE, %) and mean fine pitch error (MFPE, %).
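The two error measures can be sketched as follows, assuming frame-aligned estimated and reference pitch tracks in Hz with 0 marking unvoiced reference frames. The 20 % relative-deviation threshold for a gross error is a common convention and an assumption here, as is computing MFPE as the mean relative deviation (in percent) over the frames without gross errors; the evaluation in the paper may differ in details. The name `pitch_errors` is hypothetical.

```python
import numpy as np

def pitch_errors(f0_est, f0_ref, threshold=0.2):
    """Return (GPE %, MFPE %) over frames that are voiced in the reference.

    threshold: relative deviation above which an estimate is a gross error
    (20 % is an assumed, commonly used value).
    """
    f0_est = np.asarray(f0_est, dtype=float)
    f0_ref = np.asarray(f0_ref, dtype=float)
    voiced = f0_ref > 0
    if not np.any(voiced):
        raise ValueError("no voiced reference frames")
    rel = np.abs(f0_est[voiced] - f0_ref[voiced]) / f0_ref[voiced]
    gross = rel > threshold
    gpe = 100.0 * np.mean(gross)
    fine = rel[~gross]
    # Mean fine error only over frames that are not gross errors
    mfpe = 100.0 * np.mean(fine) if fine.size else float("nan")
    return gpe, mfpe

# Usage: 4 voiced frames, one of them a gross (octave-like) error
gpe, mfpe = pitch_errors([101, 150, 198, 200, 0], [100, 100, 200, 200, 0])
```

Here one of the four voiced frames deviates by 50 %, so GPE is 25 %, and the remaining deviations (1 %, 1 %, 0 %) give an MFPE of about 0.67 %.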
To examine the time resolution of the algorithms and their robustness to pitch variations, we synthesized artificial signals with pitch changing within the range of 100 to 350 Hz. All measurements were divided into six groups by variation rate: 0–0.3, 0.3–0.6, 0.6–0.9, 0.9–1.2, 1.2–1.5 and >1.5 percent of pitch change per millisecond. The averaged errors are shown in Figure 2.
Figure 2. Performance for artificial signals
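A test setup of this kind can be sketched as below: a harmonic signal is synthesized by integrating a per-sample pitch contour, and each sample is assigned to one of the six variation-rate groups. The 1/h harmonic amplitudes, the number of harmonics and the function names `synth_harmonic` and `variation_rate_groups` are assumptions for illustration; the paper does not specify how its artificial signals were generated.

```python
import numpy as np

def synth_harmonic(f0, fs, n_harmonics=10):
    """Harmonic signal following a per-sample pitch contour f0 (Hz).

    Harmonic h has amplitude 1/h (assumed) and is muted wherever it
    would exceed the Nyquist frequency.
    """
    f0 = np.asarray(f0, dtype=float)
    phase = 2.0 * np.pi * np.cumsum(f0) / fs  # integrate instantaneous frequency
    x = np.zeros_like(f0)
    for h in range(1, n_harmonics + 1):
        x += np.where(h * f0 < fs / 2, np.cos(h * phase), 0.0) / h
    return x

def variation_rate_groups(f0, fs):
    """Pitch variation rate in % per ms and its group index 0..5.

    Group edges: 0.3, 0.6, 0.9, 1.2, 1.5 % per ms (last group is > 1.5).
    """
    f0 = np.asarray(f0, dtype=float)
    hz_per_ms = np.abs(np.diff(f0)) * fs / 1000.0
    rate = 100.0 * hz_per_ms / f0[:-1]
    groups = np.digitize(rate, [0.3, 0.6, 0.9, 1.2, 1.5])
    return rate, groups

# Usage: pitch glides linearly from 100 to 350 Hz in 0.5 s at 8 kHz
fs = 8000
f0 = np.linspace(100.0, 350.0, 4000)
x = synth_harmonic(f0, fs)
rate, groups = variation_rate_groups(f0, fs)
```

A linear glide has a constant absolute slope (about 0.5 Hz/ms here), so its relative rate falls from about 0.5 % per ms at 100 Hz to about 0.14 % per ms at 350 Hz; reaching the faster groups requires steeper or exponential contours.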
For experiments with natural speech, the PTDB-TUG speech database was used. The averaged results for clean speech are given in Table 1.
Table 1. Performance for natural speech (GPE and MFPE, %)

| Algorithm | Male GPE | Male MFPE | Female GPE | Female MFPE |
|---|---|---|---|---|
| RAPT | 3.69 | 1.74 | 6.07 | 1.18 |
| YIN | 3.18 | 1.39 | 3.96 | 0.84 |
| SWIPE' | 0.756 | 1.51 | 4.27 | 0.80 |
| PEFAC | 20.521 | 1.383 | 31.192 | 0.972 |
| IRAPT | 1.63 | 1.61 | 3.78 | 0.98 |
| Halcyon | 0.743 | 1.268 | 3.600 | 1.039 |