Abstract
In this paper, we describe a complete computer-based musical instrument that allows real-time synthesis of expressive singing voice. The expression results from the continuous action of a performer through a gestural control interface. In this context, expressive features of the voice are discussed. New real-time implementations of a spectral model of glottal flow (CALM) are described. These interactive modules are then used to identify and quantify voice quality dimensions. Experiments are conducted to develop a first framework for voice quality control. The representation of the vocal tract and the control of several vocal tract movements are explained, and a solution is proposed and integrated. Finally, some typical controllers are connected to the system and expressivity is evaluated.
Cite this article
D’Alessandro, N., Woodruff, P., Fabre, Y. et al. Realtime and accurate musical control of expression in singing synthesis. J Multimodal User Interfaces 1, 31–39 (2007). https://doi.org/10.1007/BF02884430
DOI: https://doi.org/10.1007/BF02884430