
Comparison of Speech Recognition and Natural Language Understanding Frameworks for Detection of Dangers with Smart Wearables

  • Conference paper
Computational Science – ICCS 2021 (ICCS 2021)

Abstract

Wearable IoT devices that can register and transmit human voice can be invaluable in personal situations, such as summoning assistance in emergency healthcare situations. Such applications would benefit greatly from automated voice analysis to detect and classify voice signals. In this paper, we compare selected Speech Recognition (SR) and Natural Language Understanding (NLU) frameworks for Cloud-based detection of voice-based assistance calls. We experimentally test several services for speech-to-text transcription and intention recognition available on selected large Cloud platforms. Finally, we evaluate the influence of the manner of speaking and ambient noise on the quality of recognition of emergency calls. Our results show that many services can correctly transcribe voice to text and provide a correct interpretation of caller intent. Still, speech artifacts (tone, accent, diction), which can differ even for the same individual in various situations, significantly influence the performance of speech recognition.
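The abstract does not state which metric is used to score transcription quality. As an illustration only, the sketch below computes word error rate (WER), one common measure for comparing a speech-to-text transcript against a reference transcript. The function name `word_error_rate` and the sample phrases are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length.

    Illustrative sketch; the paper's actual evaluation metric is not
    given in the abstract, so WER is an assumption here.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical emergency phrase with one word dropped by the recognizer.
    print(word_error_rate("please call an ambulance", "please call ambulance"))
```

A lower value means a closer match. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.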

This work was supported by the pro-quality grant for highly scored publications or issued patents (grant No 02/100/RGJ21/0009), the professorship grant (02/020/RGP19/0184) of the Rector of the Silesian University of Technology, Gliwice, Poland, and partially by Statutory Research funds of the Department of Applied Informatics, Silesian University of Technology, Gliwice, Poland (grant No BK-221/RAu7/2021).


Notes

  1. VoxForge open speech dataset with transcribed speech: http://www.voxforge.org/home/downloads/speech/english.


Corresponding author

Correspondence to Dariusz Mrozek.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Mrozek, D., Kwaśnicki, S., Sunderam, V., Małysiak-Mrozek, B., Tokarz, K., Kozielski, S. (2021). Comparison of Speech Recognition and Natural Language Understanding Frameworks for Detection of Dangers with Smart Wearables. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12745. Springer, Cham. https://doi.org/10.1007/978-3-030-77970-2_36


  • DOI: https://doi.org/10.1007/978-3-030-77970-2_36


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77969-6

  • Online ISBN: 978-3-030-77970-2

  • eBook Packages: Computer Science (R0)
