Research Article

Sound sample detection and numerosity estimation using auditory display

Published: 04 March 2013

Abstract

This article investigates the effect of various design parameters of auditory information display on user performance in two basic information retrieval tasks. We conducted a user test with 22 participants in which sets of sound samples were presented. In the first task, the test participants were asked to detect a given sample among a set of samples. In the second task, the test participants were asked to estimate the relative number of instances of a given sample in two sets of samples. We found that the stimulus onset asynchrony (SOA) of the sound samples had a significant effect on user performance in both tasks. For the sample detection task, the average error rate was about 10% with an SOA of 100 ms. For the numerosity estimation task, an SOA of at least 200 ms was necessary to yield average error rates lower than 30%. Other parameters, including the samples' sound type (synthesized speech or earcons) and spatial quality (multichannel loudspeaker or diotic headphone playback), had no substantial effect on user performance. These results suggest that diotic, or indeed monophonic, playback with an appropriately chosen SOA may be sufficient in practical applications for users to perform the given information retrieval tasks, provided that information about the sample location is not relevant. When location information was provided through spatial playback of the samples, test participants were able to simultaneously detect and localize a sample with reasonable accuracy.
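To illustrate the role of the SOA parameter discussed above, the following Python sketch (not taken from the article) shows how a set of sound samples might be scheduled so that successive onsets are spaced by a fixed SOA; when the SOA is shorter than a sample's duration, samples overlap in time. The sample duration of 300 ms used in the example is a hypothetical value chosen for illustration only.

```python
# Minimal sketch (assumption, not the article's method): schedule sample onsets
# with a fixed stimulus onset asynchrony (SOA) and count how many samples can
# sound at once. The 300 ms sample duration below is a hypothetical value.

def schedule_onsets(num_samples: int, soa_ms: float) -> list[float]:
    """Return onset times (ms) for num_samples samples with a fixed SOA."""
    return [i * soa_ms for i in range(num_samples)]

def max_concurrent(onsets: list[float], duration_ms: float) -> int:
    """Largest number of samples sounding simultaneously, assuming equal durations."""
    peak = 0
    for t in onsets:
        # A sample starting at t overlaps every sample whose onset lies in [t, t + duration).
        overlapping = sum(1 for u in onsets if t <= u < t + duration_ms)
        peak = max(peak, overlapping)
    return peak

if __name__ == "__main__":
    # Example: ten 300 ms samples at the two SOAs discussed in the abstract.
    for soa in (100, 200):
        onsets = schedule_onsets(10, soa)
        print(f"SOA {soa} ms -> up to {max_concurrent(onsets, 300)} samples overlap")
```

Under these assumptions, an SOA of 100 ms lets up to three 300 ms samples overlap, whereas an SOA of 200 ms reduces the overlap to two, which is one intuitive way to read the finding that the numerosity estimation task required the longer SOA.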


• Published in

  ACM Transactions on Applied Perception, Volume 10, Issue 1
  February 2013
  120 pages
  ISSN: 1544-3558
  EISSN: 1544-3965
  DOI: 10.1145/2422105

                  Copyright © 2013 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 4 March 2013
                  • Revised: 1 September 2012
                  • Accepted: 1 September 2012
                  • Received: 1 November 2011
Published in TAP Volume 10, Issue 1

                  Qualifiers

                  • research-article
                  • Research
                  • Refereed
