
Audio formatting—Making spoken text and math comprehensible

International Journal of Speech Technology

Abstract

ASTER is an interactive computing system for audio formatting electronic documents (presently, documents written in (La)TeX) to produce audio documents. ASTER can speak both literary texts and highly technical documents that contain complex mathematics. In fact, the effective speaking of mathematics is a key goal of ASTER. To this end, a listener can request that segments of text or mathematics be spoken using several different rendering styles, in an interactive fashion. Listeners can themselves construct rendering rules and styles, if they feel it necessary.

In this paper, we describe the rendering component of ASTER—the system for writing rules for speaking various parts of text and mathematics—and discuss some of the principles that were used in developing rules for making spoken text, mathematics, and tables comprehensible.
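To make the idea of rendering rules and styles concrete, the following is a minimal sketch in Python. It is not ASTER's implementation or notation; the `Fraction` and `Power` node types, the `rule` decorator, the `speak` function, and the style names `descriptive` and `terse` are all hypothetical names chosen for illustration. The sketch only shows how one document structure might be spoken in different ways depending on which style's rules are active.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


# Hypothetical node types for a tiny mathematical expression tree.
@dataclass(frozen=True)
class Fraction:
    numerator: "Expr"
    denominator: "Expr"


@dataclass(frozen=True)
class Power:
    base: "Expr"
    exponent: "Expr"


Expr = Union[str, Fraction, Power]

# Rule table: (style name, node type) -> function that returns spoken text.
RULES: Dict[Tuple[str, type], Callable[[object, str], str]] = {}


def rule(style: str, node_type: type):
    """Register a rendering rule for one node type under one style."""
    def register(fn):
        RULES[(style, node_type)] = fn
        return fn
    return register


def speak(expr: Expr, style: str = "descriptive") -> str:
    """Render an expression as text to be spoken, using the given style."""
    if isinstance(expr, str):  # leaves (symbols, numbers) are spoken as-is
        return expr
    return RULES[(style, type(expr))](expr, style)


@rule("descriptive", Fraction)
def fraction_descriptive(node, style):
    return (f"the fraction with numerator {speak(node.numerator, style)} "
            f"and denominator {speak(node.denominator, style)}")


@rule("terse", Fraction)
def fraction_terse(node, style):
    return f"{speak(node.numerator, style)} over {speak(node.denominator, style)}"


@rule("descriptive", Power)
def power_descriptive(node, style):
    return f"{speak(node.base, style)} raised to the power {speak(node.exponent, style)}"


@rule("terse", Power)
def power_terse(node, style):
    return f"{speak(node.base, style)} to the {speak(node.exponent, style)}"


expr = Fraction("x", Power("y", "2"))
print(speak(expr, "descriptive"))
print(speak(expr, "terse"))
```

Because rules are looked up by style name at the time of speaking, a listener-defined style in this sketch amounts to registering further functions with `rule` under a new style name and passing that name to `speak`.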

Visual communication is characterized by the eye's ability to actively access parts of a two-dimensional display. The reader is active, while the display is passive. This active-passive role is reversed by the temporal nature of oral communication: information flows actively past a passive listener. This prohibits multiple views—it is impossible to first obtain a high-level view and then “look” at details. These shortcomings become severe when presenting complex mathematics orally.

Audio formatting, which renders information structure in a manner attuned to an auditory display, overcomes these problems. Audio layout, composed of fleeting and persistent cues, conveys complex structure without detracting from the content. ASTER is interactive, and the ability to browse information structure and obtain multiple views enables active listening.




Cite this article

Raman, T.V., Gries, D. Audio formatting—Making spoken text and math comprehensible. Int J Speech Technol 2, 21–31 (1997). https://doi.org/10.1007/BF02215801


  • DOI: https://doi.org/10.1007/BF02215801
