A rule-based phrase parser for real-time text-to-speech synthesis

Joan Bachenko; Eileen Fitzpatrick; Jeffrey Daugherty

doi:10.1017/S1351324900000140

Published online by Cambridge University Press: 12 September 2008

Joan Bachenko: AT&T Bell Laboratories, Murray Hill, NJ 07974, USA (e-mail: joan-b@research.att.com)
Eileen Fitzpatrick: AT&T Bell Laboratories, Murray Hill, NJ 07974, USA (e-mail: emf@ulysses.att.com)
Jeffrey Daugherty: AT&T Bell Laboratories, Naperville, IL 60566, USA (e-mail: dragon@iexist.att.com)

Abstract

Text-to-speech systems are currently designed to work on complete sentences and paragraphs, thereby allowing front-end processors access to large amounts of linguistic context. Problems with this design arise when applications require text to be synthesized in near real time, as it is being typed. How does the system decide which incoming words should be collected and synthesized as a group when prior and subsequent word groups are unknown? We describe a rule-based parser that uses a three-cell buffer and phrasing rules to identify break points for incoming text. Words up to the break point are synthesized as new text is moved into the buffer; no hierarchical structure is built beyond the lexical level. The parser was developed for use in a system that synthesizes written telecommunications by Deaf and hard of hearing people. These are texts written entirely in upper case, with little or no punctuation, and using a nonstandard variety of English (e.g. WHEN DO I WILL CALL BACK YOU). The parser performed well in a three-month field trial utilizing tens of thousands of texts. Laboratory tests indicate that the parser exhibited a low error rate when compared with a human reader.
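
The buffering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' parser: the abstract does not give the phrasing rules, so the word list `CLAUSE_STARTERS`, the rule `break_after_first`, and the function `phrase_groups` are all hypothetical names invented here for illustration. The sketch assumes only what the abstract states: incoming words pass through a three-cell buffer, a rule inspects the buffer to find a break point, and the words up to that break point are released as a group while new words keep arriving.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Optional

# Hypothetical word class: words assumed to open a new phrase.
# The actual lexicon and rules of the published parser are not given in the abstract.
CLAUSE_STARTERS = {"WHEN", "IF", "BUT", "AND", "BECAUSE"}


def break_after_first(first: str, second: Optional[str], third: Optional[str]) -> bool:
    """Toy phrasing rule over the three buffer cells (an assumption, not the paper's rule).

    Place a break after `first` when `second` looks like the start of a new
    phrase and `third` is not itself another phrase-opening word.
    """
    if second is None:
        return False
    return second in CLAUSE_STARTERS and third not in CLAUSE_STARTERS


def phrase_groups(words: Iterable[str]) -> Iterator[List[str]]:
    """Yield word groups as soon as a break point is found in the buffer."""
    buffer: Deque[str] = deque()   # holds at most three words
    pending: List[str] = []        # words that left the buffer, awaiting a break

    def shift() -> bool:
        # Move the oldest word out of the buffer and test for a break after it.
        first = buffer.popleft()
        second = buffer[0] if buffer else None
        third = buffer[1] if len(buffer) > 1 else None
        pending.append(first)
        return break_after_first(first, second, third)

    for word in words:
        buffer.append(word.upper())
        if len(buffer) == 3 and shift():
            yield list(pending)    # this group could now be sent to the synthesizer
            pending.clear()

    # End of input: drain the buffer with whatever context remains.
    while buffer:
        if shift():
            yield list(pending)
            pending.clear()
    if pending:
        yield list(pending)


if __name__ == "__main__":
    text = "HELLO THIS IS JOAN WHEN DO I WILL CALL BACK YOU"
    for group in phrase_groups(text.split()):
        print(" ".join(group))
```

Because `phrase_groups` is a generator, each group becomes available before the rest of the input has been read, which mirrors the near-real-time constraint described above. No tree or other hierarchical structure is built; the only state is the buffer and the list of words waiting for a break.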

Type: Articles
Information: Natural Language Engineering, Volume 1, Issue 2, June 1995, pp. 191–212

DOI: https://doi.org/10.1017/S1351324900000140
Copyright: Copyright © Cambridge University Press 1995
