Comparison of Speech Recognition and Natural Language Understanding Frameworks for Detection of Dangers with Smart Wearables

Mrozek, Dariusz; Kwaśnicki, Szymon; Sunderam, Vaidy; Małysiak-Mrozek, Bożena; Tokarz, Krzysztof; Kozielski, Stanisław

doi:10.1007/978-3-030-77970-2_36

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12745)

Included in the following conference series:

International Conference on Computational Science

Abstract

Wearable IoT devices that can register and transmit the human voice can be invaluable in personal emergencies, such as summoning assistance in a healthcare emergency. Such applications would benefit greatly from automated voice analysis to detect and classify voice signals. In this paper, we compare selected Speech Recognition (SR) and Natural Language Understanding (NLU) frameworks for Cloud-based detection of voice-based assistance calls. We experimentally test several services for speech-to-text transcription and intention recognition available on selected large Cloud platforms. Finally, we evaluate the influence of the manner of speaking and ambient noise on the quality of recognition of emergency calls. Our results show that many services can correctly translate voice to text and provide a correct interpretation of caller intent. Still, speech artifacts (tone, accent, diction), which can differ even for a single individual across situations, significantly influence the performance of speech recognition.

This work was supported by a pro-quality grant for highly scored publications or issued patents (grant No 02/100/RGJ21/0009), the professorship grant (02/020/RGP19/0184) of the Rector of the Silesian University of Technology, Gliwice, Poland, and partially by Statutory Research funds of the Department of Applied Informatics, Silesian University of Technology, Gliwice, Poland (grant No BK-221/RAu7/2021).

Notes

1. VoxForge open speech dataset with transcribed speech: http://www.voxforge.org/home/downloads/speech/english

Author information

Authors and Affiliations

Department of Applied Informatics, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Dariusz Mrozek, Szymon Kwaśnicki & Stanisław Kozielski
Department of Graphics, Computer Vision and Digital Systems, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Bożena Małysiak-Mrozek & Krzysztof Tokarz
Department of Computer Science, Emory University, Atlanta, GA, 30322, USA
Vaidy Sunderam

Corresponding author

Correspondence to Dariusz Mrozek.

Editor information

Editors and Affiliations

AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
Ludwig-Maximilians-Universität München, Munich, Germany
Dieter Kranzlmüller
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M.A. Sloot

About this paper

Cite this paper

Mrozek, D., Kwaśnicki, S., Sunderam, V., Małysiak-Mrozek, B., Tokarz, K., Kozielski, S. (2021). Comparison of Speech Recognition and Natural Language Understanding Frameworks for Detection of Dangers with Smart Wearables. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12745. Springer, Cham. https://doi.org/10.1007/978-3-030-77970-2_36

DOI: https://doi.org/10.1007/978-3-030-77970-2_36
Published: 09 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77969-6
Online ISBN: 978-3-030-77970-2
eBook Packages: Computer Science, Computer Science (R0)
