Abstract

This paper describes the port to software of a mature text-to-speech (TTS) synthesis technology that has been sold as a series of hardware products for over ten years. Originally developed as an alternative to a character cell terminal and for telephony applications, today it is also used to give people with visual disabilities access to information. The synthesized speech is extremely high in both intelligibility and naturalness, and it is produced by a digital formant synthesizer that simulates the human vocal tract. Before very high-speed processors became available, the computational demands of this synthesizer placed an extreme load on a workstation. This study used a Digital Equipment AlphaModel 600 workstation to simultaneously convert many text streams to speech. The power of modern RISC processors allows applications to use speech freely for output, and this capability has created the need for a text-to-speech application programming interface (API). The API that we have developed for TTS software is supported on multiple platforms and multiple operating systems. This paper describes the TTS software architecture, specifies the API, and finally describes our experience in porting the TTS code base from the previous hardware platforms.

Cite this article

Hallahan, W.I., Vitale, A.J. Software text-to-speech. Int J Speech Technol 1, 121–134 (1997). https://doi.org/10.1007/BF02277193
