Speaker Characteristics

  • Chapter
Speaker Classification I

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4343)

Abstract

In this chapter, we give a brief introduction to speech-driven applications in order to motivate why it is desirable to automatically recognize particular speaker characteristics from speech. Starting from these applications, we derive what kind of characteristics might be useful. After categorizing relevant speaker characteristics, we describe in more detail language, accent, dialect, idiolect, and sociolect. Next, we briefly summarize classification approaches to illustrate how these characteristics can be recognized automatically, and conclude with a practical example of a system implementation that performs well on the classification of various speaker characteristics.

Editor information

Christian Müller

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Schultz, T. (2007). Speaker Characteristics. In: Müller, C. (ed.) Speaker Classification I. Lecture Notes in Computer Science, vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_3

  • DOI: https://doi.org/10.1007/978-3-540-74200-5_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74186-2

  • Online ISBN: 978-3-540-74200-5

  • eBook Packages: Computer Science, Computer Science (R0)
