Abstract
This paper addresses accent issues in large vocabulary continuous speech recognition. Cross-accent experiments show that accent is a dominant problem in speech recognition. Analysis based on multivariate statistical tools (principal component analysis and independent component analysis) confirms that accent is one of the key factors in speaker variability. With different applications in mind, we propose two methods for accent adaptation. When a certain amount of adaptation data is available, pronunciation dictionary modeling is adopted to reduce recognition errors caused by pronunciation mistakes. When a large corpus has been collected for each accent type, accent-dependent models are trained and a Gaussian mixture model-based accent identification system is developed for model selection. We report experimental results for the two schemes and verify their effectiveness in each situation.
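As a rough illustration of the second scheme, the sketch below scores an utterance against one Gaussian mixture model per accent and picks the accent-dependent recognizer whose GMM gives the highest average frame log-likelihood. It is a minimal sketch under stated assumptions, not the system described in the paper: the two-dimensional features, the GMM parameters, the accent labels, and the names `ACCENT_GMMS`, `ACOUSTIC_MODELS`, and `select_acoustic_model` are all hypothetical, and in practice the GMMs would be trained on accent-labelled speech (for example with the EM algorithm).

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def utterance_loglik(frames, gmm):
    """Average per-frame log-likelihood of an utterance (a list of frames)."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)

def identify_accent(frames, accent_gmms):
    """Return the best-scoring accent label and the score of every accent."""
    scores = {name: utterance_loglik(frames, g) for name, g in accent_gmms.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy GMMs (made-up parameters, 2-D features, 2 components each).
ACCENT_GMMS = {
    "accent_A": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "accent_B": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}

# Hypothetical registry mapping each accent label to an accent-dependent model.
ACOUSTIC_MODELS = {"accent_A": "am_accent_A", "accent_B": "am_accent_B"}

def select_acoustic_model(frames):
    """Pick the accent-dependent acoustic model for an utterance."""
    label, _ = identify_accent(frames, ACCENT_GMMS)
    return ACOUSTIC_MODELS[label]
```

Calling `select_acoustic_model` on the feature frames of an utterance returns the name of the model to decode with. Averaging over frames keeps scores comparable across utterances of different lengths; it does not change which accent wins for a given utterance.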
Cite this article
Huang, C., Chen, T. & Chang, E. Accent Issues in Large Vocabulary Continuous Speech Recognition. International Journal of Speech Technology 7, 141–153 (2004). https://doi.org/10.1023/B:IJST.0000017014.52972.1d
DOI: https://doi.org/10.1023/B:IJST.0000017014.52972.1d