Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling

  • Original Paper
International Journal of Speech Technology

Abstract

This article presents an approach for the automatic recognition of non-native speech. Some non-native speakers tend to pronounce phonemes as they would in their native language. Model adaptation can improve the recognition rate for non-native speakers, but it struggles with pronunciation errors such as phoneme insertions or substitutions. For these pronunciation mismatches, pronunciation modeling can make the recognition system more robust. Our approach is based on acoustic model transformation and pronunciation modeling for multiple non-native accents. For acoustic model transformation, two approaches are evaluated: MAP (maximum a posteriori) adaptation and model re-estimation. For pronunciation modeling, confusion rules (alternate pronunciations) are automatically extracted from a small non-native speech corpus. This paper presents a novel approach for integrating these automatically learned confusion rules into the recognition system. The modified HMM of a phoneme of the foreign spoken language includes its canonical pronunciation along with all the alternate non-native pronunciations, so that phonemes pronounced correctly by a non-native speaker can still be recognized. We evaluate our approaches on the non-native corpus of the European project HIWIRE, which contains English sentences pronounced by French, Italian, Greek and Spanish speakers. Two cases are studied: the native language of the test speaker is either known or unknown. Our approach gives better recognition results than the classical acoustic adaptation of HMMs when the foreign origin of the speaker is known. We obtain a 22% reduction in word error rate (WER) compared to the reference system.
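To make the confusion-rule idea concrete, here is a minimal sketch, not the authors' implementation. It assumes that each utterance of the small non-native corpus yields a canonical English phone sequence and a phone sequence decoded by a native-language phone recognizer, and that both use a shared label inventory so that plain Levenshtein alignment is meaningful (the paper's actual alignment procedure may differ). The function names, the `min_prob` pruning threshold, the `canonical_weight` split and the `en:`/`fr:` label prefixes are all hypothetical. The last function lists the parallel paths of the "modified HMM": the canonical English phone plus every learned native-phone sequence.

```python
from collections import Counter, defaultdict


def align(canonical, recognized):
    """Levenshtein alignment of two phone label sequences.

    Returns (canonical_phone, recognized_phone) pairs; None on the right marks
    a deleted phone, None on the left marks an inserted phone.
    """
    n, m = len(canonical), len(recognized)
    # cost[i][j]: edit distance between canonical[:i] and recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if canonical[i - 1] == recognized[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    # Backtrace from the bottom-right corner to recover one optimal path
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (
                0 if canonical[i - 1] == recognized[j - 1] else 1):
            pairs.append((canonical[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))  # phone deleted
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))  # phone inserted
            j -= 1
    pairs.reverse()
    return pairs


def group_by_canonical(pairs):
    """Turn aligned pairs into (canonical_phone, tuple_of_recognized_phones)."""
    rules, pending = [], []  # pending: insertions before any canonical phone
    for c, r in pairs:
        if c is None:
            if rules:  # attach an insertion to the preceding canonical phone
                phone, seq = rules[-1]
                rules[-1] = (phone, seq + (r,))
            else:
                pending.append(r)
        else:
            seq = tuple(pending) + ((r,) if r is not None else ())
            pending = []
            rules.append((c, seq))
    return rules


def learn_confusion_rules(corpus, min_prob=0.05):
    """Estimate P(recognized sequence | canonical phone), pruning rare rules."""
    counts = defaultdict(Counter)
    for canonical, recognized in corpus:
        for phone, seq in group_by_canonical(align(canonical, recognized)):
            counts[phone][seq] += 1
    rules = {}
    for phone, ctr in counts.items():
        total = sum(ctr.values())
        kept = {s: c / total for s, c in ctr.items() if c / total >= min_prob}
        z = sum(kept.values())
        rules[phone] = {s: p / z for s, p in kept.items()}  # renormalize
    return rules


def phone_alternatives(phone, rules, native, canonical_weight=0.5):
    """Weighted parallel paths of the modified HMM for one English phone."""
    learned = rules.get(phone, {})
    if not learned:
        return [(("en:" + phone,), 1.0)]
    # Canonical English model, so correct pronunciations stay recognizable
    alts = [(("en:" + phone,), canonical_weight)]
    # Learned native-phone sequences; an empty tuple is a skip (deletion) path
    for seq, p in sorted(learned.items(), key=lambda kv: -kv[1]):
        alts.append((tuple(native + ":" + s for s in seq),
                     (1 - canonical_weight) * p))
    return alts
```

For example, if a French speaker drops the English /h/ in "hat" in one of two recordings, `learn_confusion_rules` assigns probability 0.5 to the empty (deletion) sequence, and `phone_alternatives("h", rules, native="fr")` yields three parallel paths: the English /h/ model, a skip path and the French /h/ label. Keeping the canonical path is what lets correctly pronounced phonemes still be recognized.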

Furthermore, we take into account the written form of the spoken words, since non-native speakers may rely on the spelling of a word to pronounce it. This approach does not yield any further improvement.
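One way to picture "taking the written form into account" is to condition each confusion rule on the grapheme that spells the canonical phone, backing off to the grapheme-independent rule when a (phone, grapheme) pair is too rare. The sketch below is only an illustration of that idea under stated assumptions: the phone-to-grapheme alignment is taken as given, and the function name, the `min_count` back-off threshold and the toy observations (English /ah/ spelled "u", "o" or "ou", realized as hypothetical French vowel labels) are invented for the example, not taken from the paper or its data.

```python
from collections import Counter, defaultdict


def learn_graphemic_rules(triples, min_count=2):
    """Estimate P(native sequence | canonical phone, grapheme).

    triples: (canonical_phone, grapheme, native_phone_seq) observations,
    assuming a phone-to-grapheme alignment is already available.
    Pairs seen fewer than min_count times back off to P(seq | phone).
    """
    by_pair = defaultdict(Counter)
    by_phone = defaultdict(Counter)
    for phone, grapheme, seq in triples:
        by_pair[(phone, grapheme)][seq] += 1
        by_phone[phone][seq] += 1

    def normalize(ctr):
        total = sum(ctr.values())
        return {seq: n / total for seq, n in ctr.items()}

    rules = {}
    for (phone, grapheme), ctr in by_pair.items():
        # Too few observations of this spelling: use the phone-level rule
        source = ctr if sum(ctr.values()) >= min_count else by_phone[phone]
        rules[(phone, grapheme)] = normalize(source)
    return rules


# Toy observations (hypothetical): the same English phone, spelled differently
observations = [
    ("ah", "u", ("y",)), ("ah", "u", ("y",)),
    ("ah", "o", ("o",)), ("ah", "o", ("o",)),
    ("ah", "ou", ("u",)),
]
graphemic_rules = learn_graphemic_rules(observations)
```

Here the spellings "u" and "o" each get their own deterministic rule, while the single "ou" observation falls back to the phone-level distribution (0.4, 0.4, 0.2). As the abstract reports, adding such graphemic information did not improve recognition further in the authors' experiments.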

Fig. 1
Fig. 2

Notes

  1. For native speakers it would be more efficient to use a native ASR system.

Acknowledgements

This work was partially funded by the European project HIWIRE (Human Input that Works In Real Environments), contract number 507943, Sixth Framework Programme, Information Society Technologies.

Author information

Corresponding author

Correspondence to I. Illina.

About this article

Cite this article

Bouselmi, G., Fohr, D. & Illina, I. Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling. Int J Speech Technol 15, 203–213 (2012). https://doi.org/10.1007/s10772-012-9134-8

  • DOI: https://doi.org/10.1007/s10772-012-9134-8
