Abstract
Wearable IoT devices that can register and transmit the human voice can be invaluable in personal situations, such as summoning assistance in a healthcare emergency. Such applications would benefit greatly from automated voice analysis to detect and classify voice signals. In this paper, we compare selected Speech Recognition (SR) and Natural Language Understanding (NLU) frameworks for Cloud-based detection of voice-based assistance calls. We experimentally test several services for speech-to-text transcription and intention recognition available on selected large Cloud platforms. Finally, we evaluate the influence of the manner of speaking and ambient noise on the quality of recognition of emergency calls. Our results show that many services can correctly transcribe voice to text and correctly interpret caller intent. Still, speech artifacts (tone, accent, diction), which can differ even for a single individual across situations, significantly influence the performance of speech recognition.
This work was supported by the pro-quality grant for highly scored publications or issued patents (grant No 02/100/RGJ21/0009), the professorship grant (02/020/RGP19/0184) of the Rector of the Silesian University of Technology, Gliwice, Poland, and partially by Statutory Research funds of the Department of Applied Informatics, Silesian University of Technology, Gliwice, Poland (grant No BK-221/RAu7/2021).
Notes
1. VoxForge open speech dataset with transcribed speech: http://www.voxforge.org/home/downloads/speech/english.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mrozek, D., Kwaśnicki, S., Sunderam, V., Małysiak-Mrozek, B., Tokarz, K., Kozielski, S. (2021). Comparison of Speech Recognition and Natural Language Understanding Frameworks for Detection of Dangers with Smart Wearables. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12745. Springer, Cham. https://doi.org/10.1007/978-3-030-77970-2_36
DOI: https://doi.org/10.1007/978-3-030-77970-2_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77969-6
Online ISBN: 978-3-030-77970-2
eBook Packages: Computer Science (R0)