Automatic conversion from lexical words to prosodic words for mandarin text-to-speech system

Shao, Yanqiu; Han, Jiqing; Liu, Ting; Zhao, Yongzhen

doi:10.1007/s10772-008-9013-5


Published: 18 December 2008

Volume 10, pages 45–55 (2007)

International Journal of Speech Technology



Abstract

In real speech, prosodic words (PWs), rather than lexical words (LWs), are the basic rhythmic units. The naturalness of a Text-to-Speech (TTS) system is therefore directly influenced by how PWs are segmented. Most PWs are combinations of several LWs. In this paper, three Lexical Combination Models are proposed to combine LWs into PWs: a Directed Acyclic Graph Model, a Segmentation Model, and a Markov Model (MM). To handle long LWs that should be segmented into two or more PWs, a Lexical Split Model (LSM) is applied to them. Experimental results show that the MM gives relatively stable results across different training data. The Transformation-Based Error-Driven Learning (TBED) algorithm, chosen for its high performance on individual properties, is applied in combination with the MM to improve the precision of PW segmentation. Experiments show that, among the three proposed models, the MM combined with TBED and the LSM performs best, achieving a precision of 93.00% and a recall of 93.23%. A perception test indicates that speech synthesized with PWs as the lowest prosodic units sounds more natural and acceptable than speech synthesized with LWs.
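
As an illustration of how precision and recall figures of this kind can be computed for a segmentation task, the following minimal Python sketch scores a predicted PW segmentation against a reference one. It assumes that a predicted PW counts as correct only when both of its boundaries match a reference PW; the abstract does not state the exact scoring rule used in the paper, so this matching criterion, the function names `spans` and `precision_recall`, and the toy data are all hypothetical.

```python
from typing import List, Set, Tuple


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmentation into the set of (start, end) character spans."""
    result = set()
    start = 0
    for word in words:
        end = start + len(word)
        result.add((start, end))
        start = end
    return result


def precision_recall(predicted: List[str], reference: List[str]) -> Tuple[float, float]:
    """Span-level precision and recall (assumed criterion: exact span match)."""
    if "".join(predicted) != "".join(reference):
        raise ValueError("both segmentations must cover the same text")
    pred_spans = spans(predicted)
    ref_spans = spans(reference)
    correct = len(pred_spans & ref_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(ref_spans) if ref_spans else 0.0
    return precision, recall


# Toy data with placeholder symbols (not taken from the paper).
reference = ["ab", "cde", "fg"]      # hypothetical gold PW segmentation
predicted = ["ab", "c", "de", "fg"]  # hypothetical system output

p, r = precision_recall(predicted, reference)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the four predicted units match the three reference units, so over-splitting lowers precision more than recall; under-splitting would have the opposite effect.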



Author information

Authors and Affiliations

Institute of Computational Linguistics, School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Yanqiu Shao
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
Jiqing Han, Ting Liu & Yongzhen Zhao


Corresponding author

Correspondence to Yanqiu Shao.

Additional information

This work was supported by the NSFC Project (60503071), the 973 National Basic Research Program of China (2004CB318102), and the Postdoctoral Science Foundation of P. R. China (20070420275).


About this article

Cite this article

Shao, Y., Han, J., Liu, T. et al. Automatic conversion from lexical words to prosodic words for mandarin text-to-speech system. Int J Speech Technol 10, 45–55 (2007). https://doi.org/10.1007/s10772-008-9013-5


Received: 07 September 2006
Accepted: 17 November 2008
Published: 18 December 2008
Issue Date: March 2007
DOI: https://doi.org/10.1007/s10772-008-9013-5
