
Accent and Channel Adaptation for Use in a Telephone-Based Spoken Dialog System

  • Conference paper
Text, Speech and Dialogue (TSD 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)


Abstract

An utterance conveys not only the intended message but also information about the speaker’s gender, accent, age group, and so on. In a spoken dialog system, this information can be used to improve speech recognition for a target group of users who share common vocal characteristics. In this paper, we describe various approaches to adapting acoustic models trained on native English data to the vocal characteristics of German-accented English speakers. We show that a significant performance boost can be achieved by applying speaker adaptation techniques to accent adaptation. These techniques include Maximum Likelihood Linear Regression (MLLR), Maximum a Posteriori (MAP) adaptation, and a combination of the two. We also show that promising performance gains can be obtained through cross-language accent adaptation, in which native German speech from a different application domain is used as enrollment data. Finally, we demonstrate the use of MLLR for telephone channel adaptation.
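The abstract names MLLR and MAP adaptation but gives no formulas. The following minimal Python sketch shows the standard Gaussian mean-update rules these techniques are usually described with. It is not the authors' implementation, and it makes several assumptions:

- The MLLR transform (matrix `A`, bias `b`) has already been estimated elsewhere.
- Per-frame occupation posteriors `gamma_t` for the Gaussian are given.
- The prior weight `tau` is an illustrative value.

The function names `mllr_mean` and `map_mean` are hypothetical. One common way to combine the two methods is to use the MLLR-adapted mean as the MAP prior. The abstract does not say whether the paper combines them in exactly this way.

```python
from typing import List, Sequence

Vector = List[float]


def mllr_mean(A: Sequence[Sequence[float]], b: Sequence[float],
              mu: Sequence[float]) -> Vector:
    """Apply an MLLR-style affine mean transform: mu_hat = A @ mu + b.

    A and b are assumed to be already estimated (hypothetical inputs here).
    """
    assert len(A) == len(b), "A must have one row per output dimension"
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)]


def map_mean(prior_mu: Sequence[float], frames: Sequence[Sequence[float]],
             posteriors: Sequence[float], tau: float = 10.0) -> Vector:
    """MAP mean update: (tau * prior + sum_t gamma_t x_t) / (tau + sum_t gamma_t).

    With little adaptation data (small occupancy) the result stays close to
    the prior; with more data it moves toward the data mean.
    """
    assert len(frames) == len(posteriors), "one posterior per frame"
    occupancy = sum(posteriors)
    dim = len(prior_mu)
    weighted_sum = [sum(g * x[d] for g, x in zip(posteriors, frames))
                    for d in range(dim)]
    return [(tau * prior_mu[d] + weighted_sum[d]) / (tau + occupancy)
            for d in range(dim)]


if __name__ == "__main__":
    # Toy 2-D Gaussian: first shift the native-speaker mean with MLLR...
    native_mu = [0.0, 1.0]
    mllr_mu = mllr_mean([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5], native_mu)
    # ...then refine it with MAP on two accented frames, using the MLLR mean as prior.
    adapted = map_mean(mllr_mu, [[1.0, 1.0], [3.0, -1.0]], [1.0, 1.0], tau=2.0)
    print(mllr_mu)   # [0.5, 0.5]
    print(adapted)   # [1.25, 0.25]
```

The example applies MLLR before MAP for a reason. MLLR shares one transform across many Gaussians, so it can move them all with little data. MAP then updates individual means only where enough adaptation frames were observed.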




Editor information

Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mengistu, K.T., Wendemuth, A. (2008). Accent and Channel Adaptation for Use in a Telephone-Based Spoken Dialog System. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_52


  • DOI: https://doi.org/10.1007/978-3-540-87391-4_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87390-7

  • Online ISBN: 978-3-540-87391-4

  • eBook Packages: Computer Science, Computer Science (R0)
