Abstract
This article investigates the effect of various design parameters of auditory information display on user performance in two basic information retrieval tasks. We conducted a user test with 22 participants in which sets of sound samples were presented. In the first task, participants were asked to detect a given sample among a set of samples. In the second task, they were asked to estimate the relative number of instances of a given sample in two sets of samples. We found that the stimulus onset asynchrony (SOA) of the sound samples had a significant effect on user performance in both tasks. For the sample detection task, the average error rate was about 10% with an SOA of 100 ms. For the numerosity estimation task, an SOA of at least 200 ms was necessary to yield average error rates below 30%. Other parameters, including the samples' sound type (synthesized speech or earcons) and spatial quality (multichannel loudspeaker or diotic headphone playback), had no substantial effect on user performance. These results suggest that diotic, or indeed monophonic, playback with an appropriately chosen SOA may be sufficient in practical applications for users to perform the given information retrieval tasks, provided that information about the sample location is not relevant. When location information was provided through spatial playback of the samples, participants were able to simultaneously detect and localize a sample with reasonable accuracy.
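To make the role of SOA concrete: presenting a set of samples with a fixed SOA means starting each sample a fixed interval after the previous one, so samples overlap whenever the SOA is shorter than the sample duration. The following minimal Python sketch illustrates this onset scheduling; the function name `schedule_onsets` and its parameters are illustrative, not taken from the article.

```python
# Hedged sketch: onset scheduling for a set of sound samples presented
# with a fixed stimulus onset asynchrony (SOA). Names are illustrative.

def schedule_onsets(n_samples: int, soa_ms: float) -> list[float]:
    """Return the onset time (in ms) of each sample in a set.

    If soa_ms is shorter than the samples' durations, successive
    samples overlap in time. The article reports that an SOA of
    100 ms sufficed for detection (about 10% error), while an SOA of
    at least 200 ms was needed for numerosity estimation (below 30%
    error).
    """
    return [i * soa_ms for i in range(n_samples)]

# Example: eight samples at the 200 ms SOA the article found adequate
# for numerosity estimation.
print(schedule_onsets(8, 200.0))
# → [0.0, 200.0, 400.0, 600.0, 800.0, 1000.0, 1200.0, 1400.0]
```

Mixing all scheduled samples into a single channel corresponds to the monophonic or diotic playback condition; assigning each onset a loudspeaker position corresponds to the spatial condition.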
Index Terms
- Sound sample detection and numerosity estimation using auditory display