Prosodic modeling for Mandarin Chinese speech embedded with English spelling

International Journal of Speech Technology

Abstract

Letter-by-letter spelling of English words and acronyms appears frequently in Mandarin Chinese speech. A prosodic modeling approach for Mandarin speech mixed with English spelling is therefore proposed, both to help understand the influence of the embedded foreign language and to make the foreign-language segments congruous with the prosody of the primary language. Two important prosodic features, duration and pitch-mean (the latter serving as a proxy for intonation), are analyzed using an English-Mandarin bilingual speech database in which all English words are in spelling style, that is, read letter by letter. The approach considers several additive affecting factors that contribute to the variation of duration and pitch-mean. The parameters of the two modeling units were estimated automatically using the expectation-maximization (EM) algorithm. Experimental results showed that the root mean squared errors (RMSEs) on the training and test sets were 8.93 and 9.00 ms for the reconstructed duration, and 0.41 and 0.83 ms for the pitch-mean, respectively. The model provides a way to separate the effects of several major factors, and all of the inferred weight values of the affecting factors were in close agreement with our prior linguistic knowledge. In addition, the model can provide useful cues for determining prosodic phrase boundaries, including those occurring at intersyllable locations, with or without punctuation marks.
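To make the idea of additive affecting factors and RMSE-based evaluation concrete, the following is a minimal sketch, not the paper's method. It assumes each observed duration is a global mean plus one additive effect per factor, and it estimates the effects by simple backfitting (alternating averages of residuals) rather than by the EM algorithm used in the paper. The factor names (`tone`, `position`), the function names, and the toy duration values are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def fit_additive(samples, n_iter=50):
    """Fit y = mu + sum_k effect_k[level_k] by backfitting (NOT EM).

    samples: list of (levels, y), where levels is a tuple holding one
    categorical level per factor and y is the observed value.
    Returns (mu, effects), with effects[k] mapping level -> additive offset.
    """
    n_factors = len(samples[0][0])
    mu = sum(y for _, y in samples) / len(samples)
    effects = [defaultdict(float) for _ in range(n_factors)]
    for _ in range(n_iter):
        for k in range(n_factors):
            sums = defaultdict(float)
            counts = defaultdict(int)
            for levels, y in samples:
                # Residual after removing the mean and all other factors.
                other = sum(effects[j][levels[j]]
                            for j in range(n_factors) if j != k)
                sums[levels[k]] += y - mu - other
                counts[levels[k]] += 1
            for level in sums:
                effects[k][level] = sums[level] / counts[level]
    return mu, effects


def predict(mu, effects, levels):
    """Reconstruct a value from the global mean and per-factor effects."""
    return mu + sum(effects[k][levels[k]] for k in range(len(levels)))


def rmse(samples, mu, effects):
    """Root mean squared error between observed and reconstructed values."""
    sq = [(y - predict(mu, effects, levels)) ** 2 for levels, y in samples]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical toy data: (tone, position) -> syllable duration in ms.
# The values are invented and exactly additive, so the fit is exact.
toy = [
    ((1, "mid"), 185.0),
    ((1, "final"), 195.0),
    ((2, "mid"), 205.0),
    ((2, "final"), 215.0),
]
mu, effects = fit_additive(toy)
error = rmse(toy, mu, effects)
```

On this balanced toy set the fitted effects separate cleanly into a tone offset and a position offset, which mirrors the abstract's point that an additive model lets the contribution of each factor be inspected on its own. Real data would be unbalanced and noisy, so the RMSE would be nonzero and a held-out test set would be needed to obtain figures comparable to those reported.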



Author information

Corresponding author

Correspondence to Wen-Hsing Lai.

About this article

Cite this article

Lai, WH. Prosodic modeling for Mandarin Chinese speech embedded with English spelling. Int J Speech Technol 13, 13–27 (2010). https://doi.org/10.1007/s10772-010-9067-z

  • DOI: https://doi.org/10.1007/s10772-010-9067-z
