Tone Modeling for Continuous Mandarin Speech Recognition

International Journal of Speech Technology

Abstract

Tone modeling is important for Mandarin speech recognition. This paper proposes a Mixture Stochastic Polynomial Tone Model (MSPTM) for tone modeling in continuous Mandarin speech. In this model, the pitch contour, the main carrier of the tone pattern, is described as a mixed stochastic trajectory. The mean trajectory is represented by a polynomial function of normalized time, while the variance is time-varying. Effective training and tone recognition algorithms were developed. In experiments, the proposed MSPTM reduced the tone recognition error rate by 40.7% relative to the traditional Hidden Markov Model (HMM) tone model. We also present a decision-tree-based approach to learning tone pattern variation in continuous speech. The phonetic and linguistic factors that may affect tone patterns were taken into account when constructing the tree. The resulting tree yielded 28 different tone patterns. We found that, in addition to the tone of the neighboring syllable, the consonant/vowel type of the syllable and the position of the syllable in the utterance also contributed substantially to tone pattern variation in continuous speech. Finally, a new approach to integrating tone information into the search process at the word level is discussed. Experiments on continuous Mandarin speech recognition showed that the new tone model and the tone information integration method were effective, achieving a 16.2% relative reduction in character error rate.
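To make the idea of a polynomial mean trajectory with time-varying variance concrete, the following is a minimal sketch, not the authors' MSPTM. It assumes a single Gaussian component per tone (no mixture and no EM training), a fixed normalized-time grid, and a plain least-squares polynomial fit. The function names `fit_tone_model`, `log_likelihood`, and `classify_tone`, the default polynomial degree, and the synthetic contours are all hypothetical and chosen only for illustration.

```python
import numpy as np


def _resample(contour, n_points):
    """Map a pitch contour of any length onto a normalized time grid in [0, 1]."""
    contour = np.asarray(contour, dtype=float)
    t = np.linspace(0.0, 1.0, n_points)
    return np.interp(t, np.linspace(0.0, 1.0, len(contour)), contour)


def fit_tone_model(contours, degree=3, n_points=20):
    """Fit one tone: polynomial mean trajectory plus one variance per time point.

    Simplifying assumption: a single component per tone, not a mixture.
    """
    t = np.linspace(0.0, 1.0, n_points)
    resampled = np.stack([_resample(c, n_points) for c in contours])
    # Least-squares polynomial fit over all pooled samples gives the mean trajectory.
    coeffs = np.polyfit(np.tile(t, len(contours)), resampled.ravel(), degree)
    mean = np.polyval(coeffs, t)
    # Time-varying variance: estimated separately at each normalized time point.
    var = np.mean((resampled - mean) ** 2, axis=0) + 1e-6  # small floor for stability
    return coeffs, var


def log_likelihood(contour, model):
    """Gaussian log-likelihood of a contour under one tone model."""
    coeffs, var = model
    n_points = len(var)
    x = _resample(contour, n_points)
    mean = np.polyval(coeffs, np.linspace(0.0, 1.0, n_points))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var))


def classify_tone(contour, models):
    """Pick the tone label whose model scores the contour highest."""
    return max(models, key=lambda label: log_likelihood(contour, models[label]))


if __name__ == "__main__":
    # Synthetic, made-up contours (in Hz) standing in for real pitch tracks.
    rng = np.random.default_rng(0)
    rising = [100 + 50 * np.linspace(0, 1, n) + rng.normal(0, 5, n)
              for n in range(10, 30, 2)]
    falling = [150 - 50 * np.linspace(0, 1, n) + rng.normal(0, 5, n)
               for n in range(10, 30, 2)]
    models = {
        "rising": fit_tone_model(rising, degree=2),
        "falling": fit_tone_model(falling, degree=2),
    }
    print(classify_tone(100 + 50 * np.linspace(0, 1, 17), models))
```

Resampling onto normalized time lets syllables of different durations share one trajectory model. A fuller treatment along the lines the abstract describes would replace the single component with a mixture and estimate its parameters iteratively.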




Cite this article

Cao, Y., Zhang, S., Huang, T. et al. Tone Modeling for Continuous Mandarin Speech Recognition. International Journal of Speech Technology 7, 115–128 (2004). https://doi.org/10.1023/B:IJST.0000017012.11970.6a
