Abstract
This paper describes the port to software of a mature text-to-speech (TTS) synthesis technology that has been sold as a series of hardware products for over ten years. Originally developed as an alternative to a character cell terminal and for telephony applications, it is today also used to give people with visual disabilities access to information. The technology uses a digital formant synthesizer to simulate the human vocal tract, and the resulting speech is extremely high in both intelligibility and naturalness. Before very high speed processors became available, the computational demands of this synthesizer placed an extreme load on a workstation. This study used a Digital Equipment AlphaModel 600 workstation to simultaneously convert many text streams to speech. The power of modern RISC processors allows applications to use speech freely for output, and this capability has prompted the need for a text-to-speech application programming interface (API). The API that we have developed for the TTS software is supported on multiple platforms and multiple operating systems. This paper describes the TTS software architecture, specifies the API, and reports our experience in porting the TTS code base from the previous hardware platforms.
Hallahan, W.I., Vitale, A.J. Software text-to-speech. Int J Speech Technol 1, 121–134 (1997). https://doi.org/10.1007/BF02277193
DOI: https://doi.org/10.1007/BF02277193