Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Abstract

Tone plays an important lexical role in tonal languages such as Mandarin Chinese. In this paper we propose a two-pass search strategy for improving tonal syllable recognition performance. In the first pass, instantaneous F0 information is employed along with the corresponding cepstral information in a two-stream HMM-based decoding. The F0 stream, which incorporates both discrete voiced/unvoiced information and the continuous F0 contour, is modeled with a multi-space distribution (MSD). Using the first-pass decoding alone, we recently reported a 24% relative reduction in tonal syllable recognition errors on a Mandarin Chinese database [5]. In the second pass, F0 information over a longer, horizontal time span is used to build explicit tone models for rescoring the lattice generated in the first pass. Experimental results on the same Mandarin database show that this second-pass search, lattice rescoring with enhanced tone models, yields an additional 8% relative reduction in tonal syllable recognition errors.
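The abstract describes the two passes only at a high level. The Python sketch that follows illustrates them under stated assumptions: a two-space MSD log-likelihood for the F0 stream (a voiced space with one Gaussian plus a zero-dimensional unvoiced space, in the spirit of the MSD-HMM of [6]), a stream-weighted combination with the cepstral stream, and a Viterbi pass over a first-pass lattice after a weighted tone-model score is added to every arc. All function names, the single-Gaussian voiced space, the stream and tone weights, and the toy lattice scores are hypothetical; the paper's actual tone models, features, and score combination are not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

LOG_2PI = math.log(2.0 * math.pi)


def msd_f0_loglik(f0: Optional[float], voiced_weight: float,
                  mean: float, var: float) -> float:
    """Log-likelihood of one frame's F0 observation under a two-space MSD.

    Sketch assumption: the voiced space holds a single 1-D Gaussian over
    (e.g. log-)F0, and the unvoiced space is zero-dimensional, so its
    "density" is just its space weight. f0 is None for an unvoiced frame.
    """
    if f0 is None:
        return math.log(1.0 - voiced_weight)
    return math.log(voiced_weight) - 0.5 * (
        LOG_2PI + math.log(var) + (f0 - mean) ** 2 / var)


def two_stream_loglik(cep_loglik: float, f0_loglik: float,
                      cep_weight: float = 1.0, f0_weight: float = 1.0) -> float:
    # Multi-stream HMM state score: weighted sum of per-stream log-likelihoods
    # (equivalently, a product of stream likelihoods raised to stream weights).
    return cep_weight * cep_loglik + f0_weight * f0_loglik


@dataclass
class Arc:
    start: int         # lattice nodes are numbered in topological order
    end: int
    syllable: str      # tonal syllable label, e.g. "ma3" (hypothetical)
    am_score: float    # first-pass acoustic log score
    lm_score: float    # language-model log score
    tone_score: float  # second-pass tone-model log score (hypothetical)


def rescore_lattice(arcs: List[Arc], num_nodes: int, lm_weight: float = 1.0,
                    tone_weight: float = 1.0) -> Tuple[List[str], float]:
    """Viterbi best path from node 0 to node num_nodes - 1 after adding
    weighted tone-model scores to every arc."""
    best = [-math.inf] * num_nodes
    back: List[Optional[Arc]] = [None] * num_nodes
    best[0] = 0.0
    # Every arc goes from a lower to a higher node number, so visiting arcs
    # by start node finalizes best[start] before that node is extended.
    for arc in sorted(arcs, key=lambda a: a.start):
        if arc.start >= arc.end:
            raise ValueError("nodes must be topologically numbered")
        if best[arc.start] == -math.inf:
            continue
        score = (best[arc.start] + arc.am_score
                 + lm_weight * arc.lm_score + tone_weight * arc.tone_score)
        if score > best[arc.end]:
            best[arc.end] = score
            back[arc.end] = arc
    # Trace back from the end node to recover the best syllable sequence.
    path, node = [], num_nodes - 1
    while node != 0:
        arc = back[node]
        if arc is None:
            raise ValueError("end node is unreachable")
        path.append(arc.syllable)
        node = arc.start
    return path[::-1], best[num_nodes - 1]


# Toy lattice with made-up scores: two tone hypotheses compete on 0 -> 1.
ARCS = [
    Arc(0, 1, "ma1", -10.0, -1.0, -3.0),
    Arc(0, 1, "ma3", -10.5, -1.0, -0.5),
    Arc(1, 2, "ba4", -8.0, -1.0, -1.0),
]

if __name__ == "__main__":
    print(rescore_lattice(ARCS, 3, tone_weight=0.0))  # (['ma1', 'ba4'], -20.0)
    print(rescore_lattice(ARCS, 3))                   # (['ma3', 'ba4'], -22.0)
```

Adding the tone model as a log-domain term on each arc leaves the lattice topology unchanged and alters only arc weights, so in this sketch the second pass can reorder hypotheses but never recover one the first pass pruned away.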

References

  1. Hirst, D., Espesser, R.: Automatic Modeling of Fundamental Frequency Using a Quadratic Spline Function. Travaux de l’Institut de Phonétique d’Aix 15, 71–85 (1993)

  2. Chen, C.J., Gopinath, R.A., Monkowski, M.D., Picheny, M.A., Shen, K.: New Methods in Continuous Mandarin Speech Recognition. In: Proc. Eurospeech 1997, pp. 1543–1546 (1997)

  3. Chang, E., Zhou, J.-L., Di, S., Huang, C., Lee, K.-F.: Large Vocabulary Mandarin Speech Recognition with Different Approach in Modeling Tones. In: Proc. ICSLP 2000, pp. 983–986 (2000)

  4. Freij, G.J., Fallside, F.: Lexical Stress Recognition Using Hidden Markov Models. In: Proc. ICASSP 1988, pp. 135–138 (1988)

  5. Wang, H.L., Qian, Y., Soong, F.K., Zhou, J.-L., Han, J.Q.: A Multi-Space Distribution (MSD) Approach to Speech Recognition of Tonal Languages. In: Proc. ICSLP 2006 (2006)

  6. Tokuda, K., Masuko, T., Miyazaki, N., Kobayashi, T.: Multi-space Probability Distribution HMM. IEICE Trans. Inf. & Syst. E85-D(3), 455–464 (2002)

  7. Lin, C.H., Wu, C.H., Ting, P.Y., Wang, H.M.: Framework for Recognition of Mandarin Syllables with Tones Using Sub-syllabic Units. Speech Communication 18(2), 175–190 (1996)

  8. Qian, Y., Soong, F.K., Lee, T.: Tone-enhanced Generalized Character Posterior Probability (GCPP) for Cantonese LVCSR. In: Proc. ICASSP 2006, pp. 133–136 (2006)

  9. Tian, Y., Zhou, J.-L., Chu, M., Chang, E.: Tone Recognition with Fractionized Models and Outlined Features. In: Proc. ICASSP 2004, pp. 105–108 (2004)

  10. Qian, Y.: Use of Tone Information in Cantonese LVCSR Based on Generalized Character Posterior Probability Decoding. PhD thesis, The Chinese University of Hong Kong (2005)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, H., Qian, Y., Soong, F., Zhou, JL., Han, J. (2006). Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_47

  • DOI: https://doi.org/10.1007/11939993_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49665-6

  • Online ISBN: 978-3-540-49666-3

  • eBook Packages: Computer Science, Computer Science (R0)