Accent and Channel Adaptation for Use in a Telephone-Based Spoken Dialog System

Mengistu, Kinfe Tadesse; Wendemuth, Andreas

doi:10.1007/978-3-540-87391-4_52

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

An utterance conveys not only the intended message but also information about the speaker’s gender, accent, age group, etc. In a spoken dialog system, this information can be used to improve speech recognition for a target group of users who share common vocal characteristics. In this paper, we describe various approaches to adapting acoustic models trained on native English data to the vocal characteristics of German-accented English speakers. We show that a significant performance boost can be achieved by using speaker adaptation techniques such as Maximum Likelihood Linear Regression (MLLR), Maximum a Posteriori (MAP) adaptation, and a combination of the two for the purpose of accent adaptation. We also show that a promising performance gain can be obtained through cross-language accent adaptation, where native German speech from a different application domain is used as enrollment data. Moreover, we demonstrate the use of MLLR for telephone channel adaptation.
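As a rough illustration of the two adaptation techniques named in the abstract, the sketch below shows the textbook mean-update rules for a single Gaussian: MLLR applies a shared affine transform to the mean, while MAP interpolates between a prior mean and the adaptation data. It is a minimal sketch and not the authors' implementation; the function names, the prior weight `tau`, and the toy values are assumptions made for illustration only.

```python
def mllr_transform_mean(A, b, mu):
    """Apply an affine MLLR-style transform mu' = A * mu + b to one Gaussian mean.

    A is a list of rows (d x d), b and mu are length-d lists.
    In practice A and b are estimated from adaptation data and shared
    across many Gaussians; here they are simply given (assumption).
    """
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)]


def map_update_mean(mu_prior, frames, gammas, tau=10.0):
    """MAP re-estimate of one Gaussian mean.

    mu_prior : prior mean (for example, from the native-English model)
    frames   : adaptation feature vectors
    gammas   : occupation probability of this Gaussian for each frame
    tau      : prior weight (hypothetical default); larger values trust the prior more
    """
    occupancy = sum(gammas)
    dim = len(mu_prior)
    # Occupancy-weighted sum of the adaptation frames, per dimension.
    weighted = [sum(g * x[d] for g, x in zip(gammas, frames)) for d in range(dim)]
    return [(tau * mu_prior[d] + weighted[d]) / (tau + occupancy)
            for d in range(dim)]


# Toy usage: transform a mean with MLLR, then use the result as the MAP prior.
# Chaining the two this way is one common combination, assumed here.
native_mean = [1.0, 2.0]
mllr_mean = mllr_transform_mean([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5], native_mean)
adapted_mean = map_update_mean(mllr_mean, [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], tau=2.0)
```

With little adaptation data the occupancy is small and the MAP estimate stays close to the prior, which is why a global transform such as MLLR is often applied first.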

Author information

Authors and Affiliations

Cognitive Systems Group, FEIT-IESK, Otto-von-Guericke University, Universitätsplatz 2, 39106, Magdeburg, Germany
Kinfe Tadesse Mengistu & Andreas Wendemuth

Editor information

Petr Sojka Aleš Horák Ivan Kopeček Karel Pala

Cite this paper

Mengistu, K.T., Wendemuth, A. (2008). Accent and Channel Adaptation for Use in a Telephone-Based Spoken Dialog System. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_52

DOI: https://doi.org/10.1007/978-3-540-87391-4_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)
