Abstract
Voice quality has been defined as the characteristic auditory colouring of an individual’s voice, derived from a variety of laryngeal and supralaryngeal features and running continuously through the individual’s speech. The distinctive tone of speech sounds produced by a particular person yields a particular voice. Voice quality is at the centre of several speech processing issues. In speech recognition, voice differences, particularly extreme divergences from the norm, are responsible for known performance degradations. In speech synthesis, on the other hand, voice quality is a desirable modelling parameter, with millions of voice types that can theoretically be distinguished. This article reviews the experimental derivation of voice quality markers. Specifically, it examines the use of perceptual judgements, the long-term averaged spectrum (LTAS) and prosodic markers, as well as inverse filtering for the extraction of the glottal source waveform. The review suggests that voice quality is best investigated as a multi-dimensional parameter space combining individual prosody, temporally structured speech characteristics, spectral divergence and voice source features, and that it could profitably complement the processing of simple linguistic prosodic models in speech synthesis.
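As a rough illustration of one marker named in the abstract, the long-term averaged spectrum (LTAS) can be approximated by averaging the power spectra of many short windowed frames. The sketch below is not the procedure used in the article: the function names `ltas` and `to_db`, the frame length, the hop size, the Hann window and the synthetic test tone are all assumptions chosen for illustration, and the naive DFT stands in for the FFT that a practical analysis would use.

```python
import cmath
import math


def ltas(signal, frame_len=64, hop=32):
    """Long-term averaged spectrum: mean power spectrum over Hann-windowed frames.

    Returns one mean power value per DFT bin from 0 to frame_len // 2.
    (Illustrative helper; name and defaults are assumptions.)
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Periodic Hann window to reduce spectral leakage in each frame.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    total = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        for k in range(n_bins):
            # Naive DFT, O(N^2) per frame; acceptable for a short sketch.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            total[k] += abs(coeff) ** 2
        n_frames += 1
    # Average the accumulated power over all frames.
    return [p / n_frames for p in total]


def to_db(power, floor=1e-12):
    """Convert power values to decibels, with a floor to avoid log(0)."""
    return [10 * math.log10(max(p, floor)) for p in power]


# Synthetic stand-in for speech: a 1000 Hz sine sampled at 8000 Hz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]
spectrum = ltas(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / 64  # bin index times bin spacing
```

On real recordings the averaging would run over much longer stretches of speech, so that segment-specific spectral detail averages out and the speaker's long-term spectral tendencies remain.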
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Keller, E. (2005). The Analysis of Voice Quality in Speech Processing. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_4
DOI: https://doi.org/10.1007/11520153_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer Science, Computer Science (R0)