Automatic conversion from lexical words to prosodic words for Mandarin text-to-speech system

International Journal of Speech Technology

Abstract

In real speech, prosodic words (PWs), unlike lexical words (LWs), are the basic rhythmic units. The naturalness of a Text-to-Speech (TTS) system is directly influenced by how PWs are segmented. Most PWs are combinations of several LWs. In this paper, three Lexical Combination Models are proposed to combine LWs into PWs: a Directed Acyclic Graph Model, a Segmentation Model, and a Markov Model (MM). To handle long LWs that should be segmented into two or more PWs, a Lexical Split Model (LSM) is applied to them. Experimental results show that the MM yields relatively stable results across different training data. The Transformation-Based Error Driven Learning (TBED) algorithm, chosen for its strong performance on individual properties, is applied in combination with the MM to improve the precision of PW segmentation. Experiments show that, among the three proposed models, the MM combined with TBED and LSM performs best, achieving a precision of 93.00% and a recall of 93.23%. The perception test indicates that speech sounds more natural and acceptable when PWs, rather than LWs, are used as the lowest prosodic units.
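To make the reported precision and recall figures concrete, the following is a minimal sketch of how such scores could be computed for a PW segmentation. It assumes that a predicted PW counts as correct only when its character span exactly matches a span in the reference segmentation; the abstract does not state the paper's exact scoring rule, so this definition, the function names `spans` and `precision_recall`, and the toy sentence are illustrative assumptions rather than the authors' evaluation procedure.

```python
from typing import List, Set, Tuple


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Return the (start, end) character span of each word in a segmentation."""
    result = set()
    pos = 0
    for word in words:
        result.add((pos, pos + len(word)))
        pos += len(word)
    return result


def precision_recall(predicted: List[str], reference: List[str]) -> Tuple[float, float]:
    """Score a predicted segmentation against a reference by exact span match.

    Assumption: a predicted unit is correct only if both of its boundaries
    agree with the reference. The paper may use a different criterion.
    """
    if "".join(predicted) != "".join(reference):
        raise ValueError("segmentations must cover the same text")
    pred, ref = spans(predicted), spans(reference)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


# Toy, made-up example: the reference merges short LWs into three PWs,
# while the prediction fails to merge the last two LWs.
reference_pws = ["我的", "老师", "很好"]
predicted_pws = ["我的", "老师", "很", "好"]
precision, recall = precision_recall(predicted_pws, reference_pws)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the four predicted units match the reference, so precision is lower than recall; under-merging LWs tends to hurt precision in this way, while over-merging hurts recall.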

Author information

Correspondence to Yanqiu Shao.

Additional information

This work was supported by the NSFC Project (60503071), the 973 National Basic Research Program of China (2004CB318102), and the Postdoctoral Science Foundation of P. R. China (20070420275).

About this article

Cite this article

Shao, Y., Han, J., Liu, T. et al. Automatic conversion from lexical words to prosodic words for mandarin text-to-speech system. Int J Speech Technol 10, 45–55 (2007). https://doi.org/10.1007/s10772-008-9013-5

  • DOI: https://doi.org/10.1007/s10772-008-9013-5
