Abstract
In real speech, prosodic words (PWs), rather than lexical words (LWs), are the basic rhythmic units. The naturalness of a Text-to-Speech (TTS) system is directly influenced by how PWs are segmented. Most PWs are combinations of several LWs. In this paper, three Lexical Combination Models are proposed to combine LWs into PWs: a Directed Acyclic Graph Model, a Segmentation Model and a Markov Model (MM). To handle long LWs that should be split into two or more PWs, a Lexical Split Model (LSM) is applied to them. Experimental results show that the MM yields relatively stable results across different training data. The Transformation-Based Error Driven Learning (TBED) algorithm, which performs well on individual properties, is applied in combination with the MM to improve the precision of PW segmentation. Experiments show that, among the three proposed models, the MM combined with TBED and LSM performs best, achieving a precision of 93.00% and a recall of 93.23%. The perception test indicates that speech using PWs as the lowest prosodic units sounds more natural and acceptable than speech using LWs.
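To make the reported precision and recall figures concrete, the following minimal sketch scores a predicted PW segmentation against a reference segmentation. The function names (`spans`, `precision_recall`) and the matching criterion (a predicted unit is correct only if both of its boundaries coincide with a reference unit) are assumptions made for illustration; the abstract does not state the exact scoring procedure used in the paper.

```python
from typing import List, Set, Tuple


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmentation into a set of (start, end) character spans."""
    result = set()
    start = 0
    for w in words:
        end = start + len(w)
        result.add((start, end))
        start = end
    return result


def precision_recall(predicted: List[str], reference: List[str]) -> Tuple[float, float]:
    """Score a predicted segmentation against a reference segmentation.

    Assumption: a predicted unit counts as correct only if both of its
    boundaries match a reference unit exactly.
    """
    if "".join(predicted) != "".join(reference):
        raise ValueError("segmentations must cover the same text")
    pred, ref = spans(predicted), spans(reference)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall
```

With the toy inputs `["ab", "c", "de"]` (predicted) and `["abc", "de"]` (reference), only the unit `"de"` matches, so one of three predicted units is correct and one of two reference units is recovered, giving a precision of 1/3 and a recall of 1/2.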
Additional information
This work was supported by the NSFC Project (60503071), the 973 National Basic Research Program of China (2004CB318102), and the Postdoctoral Science Foundation of P. R. China (20070420275).
Cite this article
Shao, Y., Han, J., Liu, T. et al. Automatic conversion from lexical words to prosodic words for mandarin text-to-speech system. Int J Speech Technol 10, 45–55 (2007). https://doi.org/10.1007/s10772-008-9013-5
DOI: https://doi.org/10.1007/s10772-008-9013-5