Audemes at work: Investigating features of non-speech sounds to maximize content recognition

https://doi.org/10.1016/j.ijhcs.2012.09.003

Abstract

To access interactive systems, blind users can leverage their auditory senses by using non-speech sounds. The structure of existing non-speech sounds, however, is geared toward conveying atomic operations at the user interface (e.g., opening a file) rather than evoking broader, theme-based content typical of educational material (e.g., a historical event). To address this problem, we investigate audemes, a new category of non-speech sounds whose semiotic structure and flexibility open new horizons for aural interaction with content-rich applications. Three experiments with blind participants examined the attributes of an audeme that most facilitate accurate recognition of its meaning. A sequential concatenation of different sound types (music, sound effect) yielded the highest meaning recognition, whereas an overlapping arrangement of sounds of the same type (music, music) yielded the lowest meaning recognition. We discuss seven guidelines to design well-formed audemes.

Highlights

• We introduce audemes as new non-speech sounds.
• Audemes are designed to aurally communicate articulated content.
• Audemes made of serial sounds facilitate meaning recognition.
• Females outperform males in recognizing audeme content.
• We propose guidelines to design well-formed audemes.

Introduction

Users interact with computer interfaces through different sensory modalities. Traditionally, the main modality has been visual, through the graphical user interface (GUI). Although touchscreens and haptic interactions have gained wider acceptance and have begun to significantly influence interface design, they primarily supplement the visual modality, which remains dominant. Walker et al. (2006), however, argue that with the growing number of mobile devices, on which screen real estate is significantly reduced, the auditory channel has begun to assume much greater importance. Brewster et al. (1993) point out the potential of non-speech sounds to reduce the strain on users of focusing attention on smaller or non-stationary visual targets. Additionally, non-speech sounds can complement visual output to increase the quantity of information conveyed to the user. Many other researchers have acknowledged the relevance of sound in user interfaces and investigated ways to utilize this modality (Brewster, 1994, Brewster et al., 1995a, Conversy, 1998, Edwards, 1989, Kantowitz and Sorkin, 1983, Sanders and McCormick, 1987).

Non-speech sounds have been used to complement visual interfaces by adding another information channel, as the main channel for visually impaired users, or in "eyes-free" contexts such as driving. In these settings, sounds have proved a valuable asset for communicating information where the visual channel is limited. Studies also suggest that sound can be an influential cue for memory enhancement (Sánchez and Flores, 2003), and researchers have used sound to teach blind children math by communicating the formal traits of formulas, such as their length and complexity (Sánchez and Flores, 2005, Stevens et al., 1994). The use of non-speech sounds in these different circumstances underscores their increasing importance and motivates efforts to perfect their communicative abilities in user interfaces.

Despite the recognized importance of non-speech sounds, their design has mainly been conducted in an ad hoc manner rather than following established guidelines. The lack of guidelines for designing non-speech sounds for interfaces has been noted by Back and Des (1996), who maintain that sounds should be designed as narratives or stories. Mustonen (2008) and Pirhonen et al. (2007) have noted this problem and suggested devising a theoretical framework for non-speech sound design that can be used in auditory displays. Brewster et al. (1995a) have empirically derived design guidelines for a specific type of non-speech sound, but these do not extend to the full variety of non-speech sounds that exist.

Non-speech sounds have been the primary interest of our research for five years. In a previous study (Mannheimer et al., 2009), we developed audemes, short non-speech sound symbols comprising various combinations of sound effects and music. Audemes are a type of non-speech sound whose meaning is derived from the combination of its constituent sounds. For example, combining the sound of a "key jangle," used to represent keys, with a "car engine starting," used to represent a car, generates the meaning of "driving." Further, if we add a sound of "seagulls and surfing," the final audeme could mean "vacation and trip to the beach."
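To make this compositional idea concrete, below is a minimal sketch of how such an audeme could be assembled by concatenating audio clips. It assumes the pydub Python library and placeholder file names; it illustrates the concept only and is not the production process used in our studies.

# Minimal sketch: composing an audeme by concatenating sound clips
# (pydub assumed; file names are placeholders).
from pydub import AudioSegment

key_jangle = AudioSegment.from_file("key_jangle.wav")        # represents "keys"
car_engine = AudioSegment.from_file("car_engine_start.wav")  # represents "a car"
seagulls = AudioSegment.from_file("seagulls_and_surf.wav")   # represents "the beach"

# Serial concatenation: "keys" followed by "car" suggests "driving".
driving = key_jangle + car_engine

# Adding a third clip shifts the combined meaning toward "vacation and trip to the beach".
beach_trip = driving + seagulls

beach_trip.export("audeme_beach_trip.wav", format="wav")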

Like other designers of non-speech sounds, we have enjoyed wide freedom to explore various constructive and semantic strategies in generating audemes. However, we maintain that the time is ripe to formulate guidelines that help researchers and application designers achieve a more standardized approach to designing non-speech sounds, one that can be more easily learned and applied in everyday practice. To achieve this goal, we conducted three experiments with blind and visually impaired participants in which we examined which combinations of audeme attributes best facilitate accurate recognition of audeme meanings. This paper reports the creation of seven basic guidelines for designing well-formed audemes.

The following research questions are central to this study:

Q1: How well do audemes aid in recognizing information?

The goal of this question is to establish how effective different types of audemes are at supporting recognition of the textual content associated with them. Once this is established, it is important to determine how long an audeme remains effective, so that reinforcement can be applied to maintain the link between the audeme and the content it represents.

Q2: Which combination of audeme attributes is best for accurate information recognition?

Audemes are created from different attributes (music or sound effects, serial or parallel arrangement, and so on), so it is important to understand which combinations are most effective for recognizing the content associated with them; the sketch below illustrates the serial and parallel arrangements.
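As an illustration, the following sketch contrasts the two syntactic arrangements, again assuming pydub and placeholder clips: a serial audeme plays its component sounds one after another, whereas a parallel audeme overlaps them.

# Sketch of serial vs. parallel audeme arrangements (pydub assumed; placeholder clips).
from pydub import AudioSegment

music = AudioSegment.from_file("melody_snippet.wav")  # music-type sound
effect = AudioSegment.from_file("ocean_waves.wav")    # sound-effect-type sound

# Serial arrangement: the sounds are concatenated and heard in sequence.
serial_audeme = music + effect

# Parallel arrangement: the sounds are overlaid and heard simultaneously.
parallel_audeme = music.overlay(effect)

serial_audeme.export("audeme_serial.wav", format="wav")
parallel_audeme.export("audeme_parallel.wav", format="wav")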

The answers to these research questions will help us better understand the nature of audemes, along with their strengths and weaknesses. Moreover, they will help derive a set of initial guidelines for designing effective audemes for use by acoustic interface designers.

The remainder of the paper is organized as follows. Section 2 reviews related work on models of learning with sound, the meaning sounds generate according to modes-of-listening theory, and the different types of sounds in Human–Computer Interaction. Section 3 introduces the three experiments conducted to generate guidelines for creating effective audemes. Section 4 provides a synopsis of the findings from the three experiments. Section 5 discusses the results and limitations of the three experiments and outlines future work. Finally, Section 6 offers concluding remarks.

Section snippets

The role of sounds in multimedia learning

The process of learning with multimedia material (Mayer and Moreno, 2003) has been conceptualized by contrasting verbal and non-verbal representations as elements which are encoded and stored in different substructures of short- and long-term memory (Clark and Paivio, 1991, Kosslyn, 1994, Baddeley, 1999). Verbal representations, such as words, are encoded in the form of propositional representations, while non-verbal representations, such as pictures, are encoded and stored in analogical

General approach

We investigate the structure of audemes in terms of source, semiotic and syntactic attributes.

Source attributes are the types of sounds used to create audemes. Based on the typical components of non-speech sounds found in the literature, we distinguish two broad groups of sound types:

  • a. Music – snippets of songs, including instrumental songs, of varying genres: classic, rock, etc.

  • b. Sound effects – pre-recorded or artificially created sounds. This group of sounds mainly consists of (i) abstract

Synopsis of the findings

The following is a synopsis of the findings derived from the three experiments.

Learnability of the audemes: Audemes can be successfully learned and used to help participants remember the information associated with them. Early in the learning process, variations in audeme attributes (source, semiotic and syntactic) are essential, but this effect fades once the audemes are learned. In answering the first research question (Section 1.1), the audemes were shown

Discussion

This study confirmed the following hypotheses:

  • audemes created from serial combinations of sounds yield higher meaning recognition than audemes created from parallel combinations of sounds (H1);

  • audemes created from sounds in the causal and/or referential modes of listening yield higher meaning recognition than audemes created from sounds in the reduced mode (H3); and

  • frequent exposure to audemes has a high impact on their learnability (H4).

However, the hypothesis that the meaning of

Conclusion

In this paper we have presented a study consisting of three experiments. In the first experiment we identified the characteristics of an audeme that affect the correct recognition of its meaning. In this longitudinal experiment, we found that audemes created from the serial concatenation of sounds yield higher recognition scores than audemes created from parallel sounds. Also, mixing different types of sounds (music and sound effect) yields higher recognition scores compared to

Acknowledgments

This work was supported by a grant from the Nina Mason Pulliam Charitable Trust and NSF Award #1018054 "Navigating the Aural Web." The researchers thank the students and the staff of ISBVI. The study was approved by the IUPUI (Indiana University–Purdue University Indianapolis) IRB #IRB-01-0704-74B.

References (66)

  • S.A. Brewster. The design of sonically-enhanced widgets. Interacting with Computers (1998)
  • S. Brewster et al. Parallel earcons: reducing the length of audio messages. International Journal of Human Computer Studies (1995)
  • A.M. Glenberg et al. Comprehension of illustrated text: pictures help to build mental models. Journal of Memory and Language (1992)
  • Absar, R., Guastavino, C., 2008. Usability of non-speech sounds in user interfaces. In: Proceedings of the...
  • M. Back et al. Micro-narratives in sound design: context, character, and caricature in waveform manipulation. In: ICAD Proceedings. International Community for Auditory Display (1996)
  • A.D. Baddeley. Essentials of Human Memory (1999)
  • L. Barrett et al. Sex differences in emotional awareness. Personality and Social Psychology Bulletin (2000)
  • R.M. Bernard. Using extended captions to improve learning from instructional illustrations. British Journal of Educational Technology (1990)
  • M. Blattner et al. Earcons and icons: their structure and common design principles. Human–Computer Interaction (1989)
  • G. Bower. Mood and memory. American Psychologist (1981)
  • Brewster, S., 1994. Providing a Structured Method for Integrating Non-Speech Audio into Human–Computer Interfaces....
  • Brewster, S., Wright, P., Edwards, A., 1993. An evaluation of earcons for use in auditory human–computer...
  • S. Brewster et al. Experimentally derived guidelines for the creation of earcons. Adjunct Proceedings of HCI (1995)
  • S.A. Brewster et al. Earcons as a method of providing navigational cues in a menu hierarchy. In: Proceedings of HCI'96 (1996)
  • B. Challis et al. Weasel: a computer based system for providing non-visual access to music notation. ACM SIGCAPH Computers and the Physically Handicapped (2000)
  • M. Chion et al. Audio-Vision: Sound on Screen (1994)
  • J.M. Clark et al. Dual coding theory and education. Educational Psychology Review (1991)
  • Conversy, S., 1998. Ad-hoc synthesis of auditory icons. In: ICAD '98, pp....
  • B. Deatherage. Auditory and other sensory forms of information presentation. Human Engineering Guide to Equipment Design (1972)
  • F. Delogu et al. Non-visual exploration of geographic maps: does sonification help? Disability & Rehabilitation: Assistive Technology (2010)
  • Dingler, T., Lindsay, J., Walker, B.N., 2008. Learnability of sound cues for environmental features: auditory icons,...
  • B. Dorner et al. Towards an American Sign Language interface. Artificial Intelligence Review (1994)
  • M. Doucet et al. Blind subjects process auditory spectral cues more efficiently than sighted individuals. Experimental Brain Research (2005)
  • A. Edwards. Soundtrack: an auditory interface for blind users. Human–Computer Interaction (1989)
  • M. Ehrman et al. Effects of sex differences, career choice, and psychological type on adult language learning strategies. Modern Language Journal (1988)
  • Fabiani, M., Dubus, G., Bresin, R., 2010. Interactive sonification of emotionally expressive gestures by means of music...
  • Ferati, M., Mannheimer, S., Bolchini, D., 2009. Acoustic interaction design through audemes: experiences with the...
  • Ferati, M., Mannheimer, S., Bolchini, D., 2011. Usability evaluation of acoustic interfaces for the blind. In:...
  • Fernstrom, M., Brazil, E., 2004. Human–computer interaction design based on interactive sonification hearing actions or...
  • W. Gaver. Auditory icons: using sound in computer interfaces. Human–Computer Interaction (1986)
  • W. Gaver. The SonicFinder: an interface that uses auditory icons. Human–Computer Interaction (1989)
  • Gaver, W., Smith, R., O'Shea, T., 1991. Effective sounds in complex systems: the ARKola simulation. In: Proceedings of...
  • Gerth, J., 1992. Performance Based Refinement of a Synthetic Auditory Ambience: Identifying and Discriminating Auditory...