The Analysis of Voice Quality in Speech Processing

Keller, Eric

doi:10.1007/11520153_4


Eric Keller

Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3445)

Abstract

Voice quality has been defined as the characteristic auditory colouring of an individual’s voice, derived from a variety of laryngeal and supralaryngeal features and running continuously through the individual’s speech. The distinctive tone of the speech sounds produced by a particular person yields a particular voice. Voice quality is central to several speech processing issues. In speech recognition, voice differences, particularly extreme divergences from the norm, are responsible for known performance degradations. In speech synthesis, on the other hand, voice quality is a desirable modelling parameter, since millions of voice types can theoretically be distinguished. This article reviews the experimental derivation of voice quality markers. Specifically, it examines the use of perceptual judgements, the long-term averaged spectrum (LTAS) and prosodic markers, as well as inverse filtering for extracting the glottal source waveform. The review suggests that voice quality is best investigated as a multi-dimensional parameter space combining individual prosody, temporally structured speech characteristics, spectral divergence and voice source features, and that it could profitably complement the simple linguistic prosodic models currently used in speech synthesis.
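Of the markers listed above, the LTAS is the simplest to compute: the signal is cut into overlapping windowed frames, the power spectrum of each frame is taken, and the spectra are averaged over the whole recording. The following is a minimal illustrative sketch, not the procedure used in the paper. The Hann window, the frame and hop sizes, the pure-Python FFT (a stand-in for a library routine such as NumPy's `rfft`) and the hypothetical helper `band_ratio_db`, which summarises spectral balance as the energy above a cut-off relative to the energy below it, are all assumptions, and the band edges in the example are illustrative only.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ltas_power(signal: Sequence[float], frame_len: int = 512, hop: int = 256) -> List[float]:
    """Long-term average power spectrum: mean |X_k|^2 over Hann-windowed frames.

    Returns frame_len // 2 + 1 values, from 0 Hz up to the Nyquist frequency.
    """
    if frame_len < 2 or frame_len & (frame_len - 1):
        raise ValueError("frame_len must be a power of two >= 2")
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    # Periodic Hann window (an assumed choice; any smooth taper would do).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    n_bins = frame_len // 2 + 1
    acc = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        spectrum = fft([signal[start + i] * window[i] for i in range(frame_len)])
        for k in range(n_bins):
            acc[k] += abs(spectrum[k]) ** 2
        n_frames += 1
    return [a / n_frames for a in acc]


def to_db(power: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Convert linear power values to decibels; eps avoids log10(0)."""
    return [10 * math.log10(p + eps) for p in power]


def band_ratio_db(power: Sequence[float], fs: float,
                  low_band: Tuple[float, float], high_band: Tuple[float, float],
                  eps: float = 1e-12) -> float:
    """Hypothetical summary measure: energy in high_band relative to low_band, in dB.

    Bands are half-open intervals [lo, hi) in Hz.
    """
    frame_len = 2 * (len(power) - 1)

    def band_energy(band: Tuple[float, float]) -> float:
        lo, hi = band
        return sum(p for k, p in enumerate(power) if lo <= k * fs / frame_len < hi)

    return 10 * math.log10((band_energy(high_band) + eps) / (band_energy(low_band) + eps))


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]  # 1 s, 200 Hz
    ltas = ltas_power(tone, frame_len=256, hop=128)
    peak_bin = max(range(len(ltas)), key=lambda k: ltas[k])
    print(peak_bin * fs / 256)  # 187.5, the 31.25 Hz bin nearest 200 Hz
    print(band_ratio_db(ltas, fs, (50, 1000), (1000, 4000)))  # strongly negative
```

For real speech, a flatter LTAS (a less negative high-to-low ratio) would indicate relatively more high-frequency energy. How such a number relates to perceived voice quality has to be established experimentally, as the abstract stresses, and is not something this sketch decides.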




Author information

Authors and Affiliations

Informatique et méthodes mathématiques (IMM), Faculté des Lettres, Université de Lausanne, 1015, Lausanne, Switzerland
Eric Keller


Editor information

Editors and Affiliations

CNRS LTCI/TSI Paris, 46 rue Barrault, 75634, Paris Cedex 13, France
Gérard Chollet
Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare, SA, Italy
Anna Esposito
Escola Universitària Politècnica de Mataró, Universitat Politècnica de Catalunya, Barcelona, Spain
Marcos Faundez-Zanuy
Dipartimento di Fisica “E.R. Caianiello”, Università degli Studi di Salerno, Via S. Allende, 84081, Baronissi, SA, Italy
Maria Marinaro


Cite this paper

Keller, E. (2005). The Analysis of Voice Quality in Speech Processing. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol. 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_4


DOI: https://doi.org/10.1007/11520153_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer Science, Computer Science (R0)
