(A partial re-implementation of Perry Cook's SPASM singing voice synthesizer.)
This project aims to implement the core of Perry Cook’s singing voice synthesizer [1] in a Web environment to provide synthesis functionalities for web applications.
Cook’s synthesizer models the vocal tract as a set of equal-length cylindrical tubes with different radii. The length of each section is given by c/Fs
, where c
is the speed of sound and Fs
the sampling frequency [2]. The propagation of the sound is implemented by digital waveguide model with scattering junctions, and each section is modelled as a one-sample delay line.
Conversely, the glottal source is not physically modelled. A pre-calculated wavetable is used to excite the artificial vocal tract. Rosenberg's [3] cubic function modelled glottal source is adopted here.
The project is planned to be completed in a Pure Data patch, which is loaded within a web interface using WebPd [4], as Pure Data is a general programming environment and is able to resemble the schematic diagram in Cook’s paper. But the experimentation of delay line implementation reveals some difficulties:
- Pure Data does not provide sample-level control of the delay line.
- Using one-zero filter
y[n] = x[n-1]
causes a "signal loop" problem as Pure Data cannot figure out which object to compute at first. One-zero filter, unlike delay line, does not provide information about causality. - Replacing the one-sample delay line with a pair of
send~
andreceive~
solves the problem, asreceive~
receives the signal with one-block delay (typically 64 samples, but adjustable withblock~
). Correspondingly, WebPd can be configured in the same way that it computes the patch in a sample-wise manner, and concatenates them to send to JavaScript processing block (256~16384 samples). This approach works in Pure Data, however, it suffers from a performance problem in WebPd environment, as JavaScript is slow in numeric computation and there lacks proper optimization for number-based blocks: For example, to compute1+k
needs a+
, which requires extra memory allocation and storage in WebPd, and slows down the process. The limit for WebPd on a 2.6GHz Intel Core i7 computer is 4 sections, which cannot satisfy the requirement of 9 sections.
Therefore, the final implementation is written in pure JavaScript. The one-sample delay lines are implemented by temporary variables. The controls are exposed in the web interface, and updated every block (1024 samples). A visualization of the cylindrical sections is displayed at the right.
The model parameters for vowels /a/, /e/, /i/, /o/, /u/ are experimented through a trial-and-error process. The glottal pulse is pre-calculated as a wave table for lookup. Vibrato is implemented also in a wave table lookup manner. /a/, /o/ and /u/ are fairly recognizable.
[1] Cook, P. 1992. SPASM: A Real-Time Vocal Tract Physical Model Editor/Controller and Singer: the Companion Software Synthesis System. Computer Music Journal 17(1): 30-44.
[2] Cook, P. 1991. Identification of Control Parameters in an Articulatory Vocal Tract Model, with Applications to the Synthesis of Singing.
[3] Rosenberg, A. 1971. Effect of glottal pulse shape on the quality of naturalvowels. Journal of the Acoustical Society of America 49(2B): 583-90.
[4] WebPd, a 100% JavaScript Pure Data runtime. https://github.com/sebpiq/WebPd Accessed Mar. 17th, 2015.