Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling

Bouselmi, G.; Fohr, D.; Illina, I.

doi:10.1007/s10772-012-9134-8

Original Paper
Published: 08 March 2012

Volume 15, pages 203–213 (2012)

International Journal of Speech Technology

Abstract

This article presents an approach to the automatic recognition of non-native speech. Some non-native speakers tend to pronounce phonemes as they would in their native language. Model adaptation can improve the recognition rate for non-native speakers, but it has difficulty dealing with pronunciation errors such as phoneme insertions or substitutions. For these pronunciation mismatches, pronunciation modeling can make the recognition system more robust. Our approach combines acoustic model transformation and pronunciation modeling for multiple non-native accents. For acoustic model transformation, two approaches are evaluated: maximum a posteriori (MAP) adaptation and model re-estimation. For pronunciation modeling, confusion rules (alternate pronunciations) are automatically extracted from a small non-native speech corpus. We present a novel way of introducing these automatically learned confusion rules into the recognition system: the modified HMM of a phoneme of the spoken (foreign) language includes its canonical pronunciation along with all the alternate non-native pronunciations, so that phonemes pronounced correctly by a non-native speaker can still be recognized. We evaluate our approaches on the non-native corpus of the European project HIWIRE, which contains English sentences pronounced by French, Italian, Greek and Spanish speakers. Two cases are studied: the native language of the test speaker is either known or unknown. When the speaker's native language is known, our approach gives better recognition results than classical acoustic adaptation of the HMMs. We obtain a 22% reduction in word error rate (WER) compared to the reference system.

Furthermore, we take into account the written form of the spoken words, since non-native speakers may rely on the spelling of a word to decide how to pronounce it. Adding this information does not provide any further improvement.
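
The extraction of confusion rules mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each canonical phoneme of the spoken language has already been aligned with the sequence of native-language phones actually observed, and it keeps an alternate pronunciation as a rule when its relative frequency reaches a threshold. The function name `extract_confusion_rules`, the parameter `min_prob`, and the toy phoneme labels are all hypothetical.

```python
from collections import Counter, defaultdict


def extract_confusion_rules(aligned_pairs, min_prob=0.2):
    """Map each canonical phoneme to its frequent alternate pronunciations.

    aligned_pairs: iterable of (canonical_phoneme, observed_phone_sequence).
    The alignment itself is assumed to be given; it is not computed here.
    Returns {canonical: [(alternate_sequence, relative_frequency), ...]}.
    """
    counts = defaultdict(Counter)
    for canonical, observed in aligned_pairs:
        counts[canonical][tuple(observed)] += 1

    rules = {}
    for canonical, alternates in counts.items():
        total = sum(alternates.values())
        # Keep only alternates that are frequent enough for this phoneme.
        kept = [
            (alt, n / total)
            for alt, n in alternates.most_common()
            if n / total >= min_prob
        ]
        if kept:
            rules[canonical] = kept
    return rules


# Toy alignment invented for illustration only (not data from the paper).
pairs = [
    ("th", ("s",)),
    ("th", ("s",)),
    ("th", ("z",)),
    ("th", ("t",)),
    ("ih", ("i",)),
    ("ih", ("i",)),
]
rules = extract_confusion_rules(pairs, min_prob=0.5)
print(rules)
```

With the threshold set to 0.5, only the alternates that account for at least half of the observations of a phoneme survive. In a system like the one described, each retained alternate would become an additional path next to the canonical model in the modified HMM of that phoneme.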

Notes

For native speakers it would be more efficient to use a native ASR.

Acknowledgements

This work was partially funded by the European project HIWIRE (Human Input that Works In Real Environments), contract number 507943, sixth framework program, information society technologies.

Author information

Authors and Affiliations

Speech Group, LORIA-CNRS & INRIA, BP 239, 54600, Vandoeuvre-les-Nancy, France
G. Bouselmi, D. Fohr & I. Illina

Corresponding author

Correspondence to I. Illina.

About this article

Cite this article

Bouselmi, G., Fohr, D. & Illina, I. Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling. Int J Speech Technol 15, 203–213 (2012). https://doi.org/10.1007/s10772-012-9134-8

Received: 11 May 2011
Accepted: 07 February 2012
Published: 08 March 2012
Issue Date: June 2012
DOI: https://doi.org/10.1007/s10772-012-9134-8
