DOI: 10.1145/3613904.3642062
Research article

A Human-AI Collaborative Approach for Designing Sound Awareness Systems

Published: 11 May 2024

Abstract

Current sound recognition systems for deaf and hard of hearing (DHH) people identify sound sources or discrete events. However, these systems do not distinguish similar-sounding events (e.g., a patient monitor beep vs. a microwave beep). In this paper, we introduce HACS, a novel, forward-looking approach to designing human-AI sound awareness systems. HACS assigns AI models to identify sounds based on their characteristics (e.g., a beep) and prompts DHH users to combine this information with their contextual knowledge (e.g., "I am in a kitchen") to recognize sound events (e.g., a microwave). As a first step toward implementing HACS, we articulated a sound taxonomy that classifies sounds based on sound characteristics, using insights from a multi-phased research process with people of mixed hearing abilities. We then performed a qualitative evaluation (with 9 DHH people) and a quantitative evaluation (with a sound recognition model). Findings demonstrate the initial promise of HACS for designing accurate and reliable human-AI systems.
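To make the division of labor in the abstract concrete, the sketch below shows one hypothetical way a system could pair a model-predicted sound characteristic with a user-stated context to surface candidate events. It is not the paper's implementation: the `Prediction` type, the characteristic labels, the contexts, the `CANDIDATES` table, and the 0.5 confidence threshold are all invented for illustration. In HACS the DHH user makes the final recognition; here the function only narrows the options the user would be shown.

```python
from dataclasses import dataclass

# Hypothetical lookup table (NOT the paper's taxonomy):
# (sound characteristic, user-stated context) -> candidate sound events.
CANDIDATES: dict[tuple[str, str], list[str]] = {
    ("beep", "kitchen"): ["microwave", "oven timer"],
    ("beep", "hospital room"): ["patient monitor"],
    ("knock", "living room"): ["door knock"],
}


@dataclass
class Prediction:
    characteristic: str  # e.g. "beep", as output by a hypothetical classifier
    confidence: float    # model confidence in [0, 1]


def suggest_events(pred: Prediction, user_context: str, threshold: float = 0.5) -> list[str]:
    """Combine a predicted sound characteristic with the user's context.

    The model only reports what the sound is like; the user's context
    narrows it to plausible events. Returns an empty list when the model
    is below the (assumed) confidence threshold or no mapping exists,
    leaving the judgement entirely to the user.
    """
    if pred.confidence < threshold:
        return []
    # Return a copy so callers cannot mutate the shared table.
    return list(CANDIDATES.get((pred.characteristic, user_context), []))
```

For example, the same "beep" prediction yields `["microwave", "oven timer"]` with the context `"kitchen"` but `["patient monitor"]` with `"hospital room"`, mirroring the abstract's point that sound characteristics alone cannot separate these events while the user's context can.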

Supplemental Material

  • MP4 File: Video Presentation (with transcript)
  • ZIP File: Study Materials, a compressed folder containing:
      - Formative-Codebook.pdf: The final codebook for the formative study
      - Interview Protocol - Formative.pdf: The interview protocol we used for the formative study
      - PE1-Codebook.pdf: The final codebook for Preliminary Evaluation 1
      - Interview Protocol - Formative.pdf: The interview protocol we used for Preliminary Evaluation 1



Published In

CHI '24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems
May 2024
18961 pages
ISBN: 9798400703300
DOI: 10.1145/3613904
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CHI '24

Acceptance Rates

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%


Article Metrics

Total citations: 0
Total downloads: 798 (798 in the last 12 months; 73 in the last 6 weeks)
Figures reflect downloads up to 17 Feb 2025.
