ABSTRACT
Sound recognition tools have wide-ranging impacts for deaf and hard of hearing (DHH) people, from conveying safety-critical information (e.g., fire alarms, sirens) to more mundane but still useful cues (e.g., door knocks, microwave beeps). However, prior sound recognition systems use models that are pre-trained on generic sound datasets and do not adapt well to the diverse variations of real-world sounds. We introduce AdaptiveSound, a real-time system for portable devices (e.g., smartphones) that allows DHH users to provide corrective feedback to the sound recognition model, adapting it to diverse acoustic environments. AdaptiveSound is informed by prior surveys of sound recognition systems, in which DHH users strongly desired the ability to give feedback to a pre-trained sound recognition model to fine-tune it to their environments. Through quantitative experiments and field evaluations with 12 DHH users, we show that AdaptiveSound achieves significantly higher accuracy (+14.6%) than prior state-of-the-art systems in diverse real-world locations (e.g., homes, parks, streets, and malls) with little end-user effort (about 10 minutes of feedback).
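The corrective-feedback loop the abstract describes can be sketched in miniature: a common way to adapt a pre-trained classifier on-device is to freeze the feature extractor and take small gradient steps on a lightweight classification head whenever the user corrects a label. The sketch below is illustrative only and is not the paper's actual architecture; the class name `AdaptableHead`, the feature dimensionality, and the learning rate are all assumptions.

```python
import math

def softmax(z):
    # Numerically stable softmax over raw class scores.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class AdaptableHead:
    """Hypothetical linear head fine-tuned online from user corrections.

    The pre-trained model is treated as a frozen feature extractor;
    only these weights change in response to feedback.
    """

    def __init__(self, n_features, n_classes, lr=0.1):
        self.w = [[0.0] * n_features for _ in range(n_classes)]
        self.b = [0.0] * n_classes
        self.lr = lr

    def predict_proba(self, x):
        z = [sum(wi * xi for wi, xi in zip(row, x)) + b
             for row, b in zip(self.w, self.b)]
        return softmax(z)

    def feedback(self, x, correct_class):
        # One cross-entropy gradient step on a single corrected example:
        # the probability assigned to the user's label is pushed up,
        # the others are pushed down.
        p = self.predict_proba(x)
        for c in range(len(self.w)):
            err = p[c] - (1.0 if c == correct_class else 0.0)
            self.b[c] -= self.lr * err
            for j in range(len(x)):
                self.w[c][j] -= self.lr * err * x[j]

# Example: a few corrections on a misrecognized sound shift the head
# toward the user's label (features here are made up).
head = AdaptableHead(n_features=2, n_classes=3, lr=0.5)
for _ in range(10):
    head.feedback([1.0, 0.0], correct_class=0)
```

In a real mobile system the per-example update would typically be batched or regularized to avoid catastrophic forgetting of the pre-trained classes; this sketch omits that for brevity.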
AdaptiveSound: An Interactive Feedback-Loop System to Improve Sound Recognition for Deaf and Hard of Hearing Users