The Analysis of Voice Quality in Speech Processing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3445)

Abstract

Voice quality has been defined as the characteristic auditory colouring of an individual’s voice, derived from a variety of laryngeal and supralaryngeal features and running continuously through the individual’s speech. The distinctive tone of the speech sounds produced by a particular person yields a particular voice. Voice quality is at the centre of several speech processing issues. In speech recognition, voice differences, particularly extreme divergences from the norm, are responsible for known performance degradations. In speech synthesis, on the other hand, voice quality is a desirable modelling parameter, with millions of voice types that can in theory be distinguished. This article reviews the experimental derivation of voice quality markers. Specifically, it examines the use of perceptual judgements, the long-term averaged spectrum (LTAS) and prosodic markers, as well as inverse filtering for the extraction of the glottal source waveform. The review suggests that voice quality is best investigated as a multi-dimensional parameter space that combines individual prosody, temporally structured speech characteristics, spectral divergence and voice source features, and that it could profitably complement simple linguistic prosodic model processing in speech synthesis.
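As an illustration of one of the markers named above, the long-term averaged spectrum can be sketched as the mean power spectrum over all windowed frames of a recording. The following is a minimal sketch, not the procedure used in the paper: the function name `ltas`, the frame length, the hop size, the Hann window and the synthetic test signal are all assumptions chosen for illustration.

```python
import numpy as np

def ltas(signal, sample_rate, frame_len=1024, hop=512):
    """Average the power spectrum over all frames and return it in dB.

    Hypothetical helper: frame_len, hop and the Hann window are
    illustrative choices, not values taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    power = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    power /= n_frames  # long-term average across frames
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    # Small offset avoids log10(0) in silent bands.
    return freqs, 10.0 * np.log10(power + 1e-12)

# Synthetic stand-in for a speech recording: a 200 Hz tone plus weak noise.
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 200 * t) + 0.01 * rng.standard_normal(sr)

freqs, level_db = ltas(x, sr)
peak_hz = freqs[np.argmax(level_db)]
print(f"strongest LTAS band near {peak_hz:.1f} Hz")
```

With real speech, the input would be a recording long enough (typically many seconds) for the average to smooth out individual phonetic segments, so that the remaining spectral shape reflects the speaker rather than the text.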

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Keller, E. (2005). The Analysis of Voice Quality in Speech Processing. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_4

  • DOI: https://doi.org/10.1007/11520153_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27441-4

  • Online ISBN: 978-3-540-31886-6

  • eBook Packages: Computer Science (R0)
