Abstract
As sensor-based inference models move out of laboratories into the real world, it is crucial that they retain their performance under the changing hardware and environmental conditions expected in the wild. This chapter motivates this challenging research problem in the context of audio sensing models by presenting three empirical studies that evaluate the impact of hardware and environment variability on cloud-scale as well as embedded-scale audio models. Our results show that even state-of-the-art deep learning models suffer significant performance degradation in the presence of ambient acoustic noise and, more surprisingly, under microphone variability, with accuracy losses as high as 15% in some scenarios. Further, we provide intuition on how this problem of model robustness relates to the broader topic of dataset shift in the machine learning literature, and highlight future research directions for the mobile sensing community, including the investigation of domain adaptation and domain generalization solutions in the context of sensing systems.
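As a rough illustration of the kind of robustness evaluation the abstract describes, the following minimal Python sketch mixes clean audio with ambient noise at a controlled signal-to-noise ratio and measures how a classifier's accuracy changes on the corrupted inputs. The function names, the `predict` callable, and the assumption of mono NumPy waveforms are hypothetical placeholders for exposition; they are not the chapter's actual experimental pipeline.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a clean waveform with an ambient-noise waveform at a target SNR (dB)."""
    # Tile or trim the noise so it matches the length of the clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    # Scale the noise so the resulting signal-to-noise ratio equals snr_db.
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def accuracy_under_noise(predict, waveforms, labels, noise, snr_db) -> float:
    """Fraction of correct predictions when every input is corrupted at snr_db."""
    # `predict` is any callable mapping a waveform to a predicted label
    # (a hypothetical stand-in for an audio model's inference call).
    hits = [predict(mix_at_snr(x, noise, snr_db)) == y
            for x, y in zip(waveforms, labels)]
    return float(np.mean(hits))
```

Sweeping `snr_db` over a range such as 20, 10, and 0 dB and comparing the resulting accuracies against the clean-audio baseline yields the kind of degradation curve the studies in this chapter report; microphone variability would instead be probed by replaying or re-recording the same test set through different capture hardware.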
Notes
1. We chose this speaker due to its flat frequency response in the human speech frequency range.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Mathur, A., Isopoussu, A., Kawsar, F., Smith, R., Berthouze, N., Lane, N.D. (2019). Towards the Design and Evaluation of Robust Audio-Sensing Systems. In: Kawaguchi, N., Nishio, N., Roggen, D., Inoue, S., Pirttikangas, S., Van Laerhoven, K. (eds) Human Activity Sensing. Springer Series in Adaptive Environments. Springer, Cham. https://doi.org/10.1007/978-3-030-13001-5_4
DOI: https://doi.org/10.1007/978-3-030-13001-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13000-8
Online ISBN: 978-3-030-13001-5
eBook Packages: Computer Science (R0)