
A rule-based phrase parser for real-time text-to-speech synthesis

Published online by Cambridge University Press: 12 September 2008

Joan Bachenko
Affiliation:
AT&T Bell Laboratories, Murray Hill, NJ 07974, USA e-mail: (joan-b@research.att.com)
Eileen Fitzpatrick
Affiliation:
AT&T Bell Laboratories, Murray Hill, NJ 07974, USA e-mail: (emf@ulysses.att.com)
Jeffrey Daugherty
Affiliation:
AT&T Bell Laboratories, Naperville, IL 60566, USA e-mail: (dragon@iexist.att.com)

Abstract

Text-to-speech systems are currently designed to work on complete sentences and paragraphs, which gives front-end processors access to large amounts of linguistic context. Problems with this design arise when applications require text to be synthesized in near real time, as it is being typed. How does the system decide which incoming words should be collected and synthesized as a group when prior and subsequent word groups are unknown? We describe a rule-based parser that uses a three-cell buffer and phrasing rules to identify break points for incoming text. Words up to the break point are synthesized as new text is moved into the buffer; no hierarchical structure is built beyond the lexical level. The parser was developed for use in a system that synthesizes written telecommunications by Deaf and hard of hearing people. These texts are written entirely in upper case, with little or no punctuation, and in a nonstandard variety of English (e.g. WHEN DO I WILL CALL BACK YOU). The parser performed well in a three-month field trial involving tens of thousands of texts. Laboratory tests indicate that the parser exhibited a low error rate when compared with a human reader.
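
The buffering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' parser: the abstract does not list the phrasing rules, so the word lists (CLAUSE_OPENERS, SUBJECT_PRONOUNS, AUXILIARIES), the rule in break_after_first, and the function phrase_groups are all hypothetical stand-ins. The sketch only shows how a three-cell window over incoming words could release a word group as soon as a break point is detected, without building any structure above the word level.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence

BUFFER_SIZE = 3  # the abstract specifies a three-cell buffer

# Hypothetical word lists: placeholders, NOT the rules used in the paper.
CLAUSE_OPENERS = frozenset({"AND", "BUT", "BECAUSE", "WHEN", "IF", "SO"})
SUBJECT_PRONOUNS = frozenset({"I", "YOU", "WE", "THEY", "HE", "SHE"})
AUXILIARIES = frozenset({"WILL", "AM", "CAN", "HAVE", "DO"})


def break_after_first(cells: Sequence[str]) -> bool:
    """Decide whether a break point falls between cell 1 and cell 2.

    Assumed rule set (illustrative only):
      - never break right after a clause opener;
      - break before a clause opener;
      - break before a subject pronoun that is followed by an auxiliary.
    """
    if len(cells) < 2:
        return False
    first, second = cells[0], cells[1]
    third = cells[2] if len(cells) > 2 else None
    if first in CLAUSE_OPENERS:
        return False
    if second in CLAUSE_OPENERS:
        return True
    return second in SUBJECT_PRONOUNS and third in AUXILIARIES


def phrase_groups(words: Iterable[str]) -> Iterator[List[str]]:
    """Yield word groups as soon as a break point is found."""
    buffer: Deque[str] = deque()
    group: List[str] = []
    for word in words:
        buffer.append(word)
        if len(buffer) < BUFFER_SIZE:
            continue  # wait until all three cells are filled
        at_break = break_after_first(list(buffer))
        group.append(buffer.popleft())  # oldest word leaves the buffer
        if at_break:
            yield group  # this group could be sent to the synthesizer now
            group = []
    # End of input: drain the remaining cells with the same rules.
    while buffer:
        at_break = break_after_first(list(buffer))
        group.append(buffer.popleft())
        if at_break:
            yield group
            group = []
    if group:
        yield group


if __name__ == "__main__":
    text = "HELLO THIS IS JOHN I WILL CALL YOU TOMORROW BECAUSE I AM BUSY NOW"
    for g in phrase_groups(text.split()):
        print(" ".join(g))
```

Because phrase_groups is a generator, each group becomes available before later words arrive, which mirrors the near-real-time constraint described above. A realistic rule set would need part-of-speech information and far broader coverage than these toy word lists.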

Type
Articles
Copyright
Copyright © Cambridge University Press 1995

