Abstract
As sensor-based inference models move out of laboratories into the real world, it is crucial that they retain their performance under the changing hardware and environmental conditions expected in the wild. This chapter motivates this challenging research problem in the context of audio sensing models by presenting three empirical studies that evaluate the impact of hardware and environment variability on cloud-scale as well as embedded-scale audio models. Our results show that even state-of-the-art deep learning models suffer significant performance degradation in the presence of ambient acoustic noise and, more surprisingly, under microphone variability, with accuracy losses as high as 15% in some scenarios. Further, we provide intuition on how this problem of model robustness relates to the broader topic of dataset shift in the machine learning literature, and highlight future research directions for the mobile sensing community, including the investigation of domain adaptation and domain generalization solutions in the context of sensing systems.
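As a rough illustration of the kind of robustness evaluation the abstract describes, the following minimal Python sketch mixes clean audio with ambient noise at a controlled signal-to-noise ratio and measures how a classifier's accuracy changes on the corrupted inputs. The function names, the `predict` callable, and the assumption of mono NumPy waveforms are hypothetical placeholders for exposition; they are not the chapter's actual experimental pipeline.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a clean waveform with an ambient-noise waveform at a target SNR (dB)."""
    # Tile or trim the noise so it matches the length of the clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    # Scale the noise so the resulting signal-to-noise ratio equals snr_db.
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def accuracy_under_noise(predict, waveforms, labels, noise, snr_db) -> float:
    """Fraction of correct predictions when every input is corrupted at snr_db."""
    # `predict` is any callable mapping a waveform to a predicted label
    # (a hypothetical stand-in for an audio model's inference call).
    hits = [predict(mix_at_snr(x, noise, snr_db)) == y
            for x, y in zip(waveforms, labels)]
    return float(np.mean(hits))
```

Sweeping `snr_db` over a range such as 20, 10, and 0 dB and comparing the resulting accuracies against the clean-audio baseline yields the kind of degradation curve the studies in this chapter report; microphone variability would instead be probed by replaying or re-recording the same test set through different capture hardware.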
Notes
1. We chose this speaker due to its flat frequency response in the human speech frequency range.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Mathur, A., Isopoussu, A., Kawsar, F., Smith, R., Berthouze, N., Lane, N.D. (2019). Towards the Design and Evaluation of Robust Audio-Sensing Systems. In: Kawaguchi, N., Nishio, N., Roggen, D., Inoue, S., Pirttikangas, S., Van Laerhoven, K. (eds) Human Activity Sensing. Springer Series in Adaptive Environments. Springer, Cham. https://doi.org/10.1007/978-3-030-13001-5_4
DOI: https://doi.org/10.1007/978-3-030-13001-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13000-8
Online ISBN: 978-3-030-13001-5
eBook Packages: Computer Science (R0)