Accent Issues in Large Vocabulary Continuous Speech Recognition

International Journal of Speech Technology

Abstract

This paper addresses accent issues in large vocabulary continuous speech recognition. Cross-accent experiments show that the accent problem is very dominant in speech recognition. Analysis based on multivariate statistical tools (principal component analysis and independent component analysis) confirms that accent is one of the key factors in speaker variability. Considering different applications, we proposed two methods for accent adaptation. When a certain amount of adaptation data was available, pronunciation dictionary modeling was adopted to reduce recognition errors caused by pronunciation mistakes. When a large corpus was collected for each accent type, accent-dependent models were trained and a Gaussian mixture model-based accent identification system was developed for model selection. We report experimental results for the two schemes and verify their efficiency in each situation.
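
The second scheme, Gaussian mixture model-based accent identification used to select an accent-dependent model, can be illustrated with a minimal sketch. It is not the authors' implementation: the class `DiagGMM`, the helper names, the labels `accent_A` and `accent_B`, and the 2-D synthetic "features" are all assumptions made for illustration. One small diagonal-covariance GMM is trained per accent with EM, and an utterance is assigned to the accent whose GMM gives the highest average per-frame log-likelihood.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Small diagonal-covariance GMM trained with EM (illustrative only)."""

    def __init__(self, n_components=2, n_iter=15, var_floor=1e-3, seed=0):
        self.k, self.n_iter = n_components, n_iter
        self.var_floor, self.seed = var_floor, seed

    def _frame_logps(self, x):
        # Log of (mixture weight * component density) for each component.
        return [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, frames):
        n, dim = len(frames), len(frames[0])
        rng = random.Random(self.seed)
        # Initialise means from random frames, variances from the global variance.
        self.means = [list(f) for f in rng.sample(frames, self.k)]
        gmean = [sum(f[d] for f in frames) / n for d in range(dim)]
        gvar = [max(sum((f[d] - gmean[d]) ** 2 for f in frames) / n, self.var_floor)
                for d in range(dim)]
        self.vars = [list(gvar) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each frame.
            resp = []
            for x in frames:
                logps = self._frame_logps(x)
                norm = logsumexp(logps)
                resp.append([math.exp(lp - norm) for lp in logps])
            # M-step: re-estimate weights, means, and floored variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                                 for d in range(dim)]
                self.vars[j] = [max(sum(r[j] * (x[d] - self.means[j][d]) ** 2
                                        for r, x in zip(resp, frames)) / nj,
                                    self.var_floor)
                                for d in range(dim)]
        return self

    def avg_log_likelihood(self, frames):
        """Average per-frame log-likelihood of an utterance under this model."""
        return sum(logsumexp(self._frame_logps(x)) for x in frames) / len(frames)


def identify_accent(models, frames):
    """Return the accent label whose GMM scores the utterance highest."""
    return max(models, key=lambda label: models[label].avg_log_likelihood(frames))


def synthetic_frames(center, n, rng):
    """Hypothetical stand-in for acoustic features: 2-D Gaussian noise around center."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


rng = random.Random(42)
# Hypothetical accent labels and toy data; a real system would use acoustic
# feature vectors extracted from accent-labelled speech instead.
train = {"accent_A": synthetic_frames((0.0, 0.0), 200, rng),
         "accent_B": synthetic_frames((3.0, 3.0), 200, rng)}
models = {label: DiagGMM().fit(frames) for label, frames in train.items()}

utterance = synthetic_frames((3.0, 3.0), 50, rng)
selected = identify_accent(models, utterance)
print("selected accent-dependent model:", selected)
```

In a recognizer built along these lines, the returned label would pick which accent-dependent acoustic model decodes the utterance. The number of mixture components, the feature type, and the amount of speech needed for a reliable decision are design choices this sketch does not settle.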



Cite this article

Huang, C., Chen, T. & Chang, E. Accent Issues in Large Vocabulary Continuous Speech Recognition. International Journal of Speech Technology 7, 141–153 (2004). https://doi.org/10.1023/B:IJST.0000017014.52972.1d

  • DOI: https://doi.org/10.1023/B:IJST.0000017014.52972.1d
