Towards a Natural Human-Robot Interaction in an Industrial Environment

González-Docasal, Ander; Aceta, Cristina; Arzelus, Haritz; Álvarez, Aitor; Fernández, Izaskun; Kildal, Johan

doi:10.1007/978-981-15-8395-7_18

Chapter
First Online: 25 October 2020

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 704)

Abstract

Modern industry has adopted robots as part of its processes. In many scenarios, these machines collaborate with humans in a shared environment to perform specific tasks, or simply guide them in a natural, safe and efficient way. Our approach improves on previous work on a multi-modal human-robot interaction system with different audio acquisition and speech recognition modules for more natural communication. The semantic interpreter, with the aid of a knowledge manager, parses the resulting transcription and, using contextual information, selects the order that the operator has uttered and sends it to the robot for execution. This setup is evaluated both quantitatively and qualitatively with a large set of end users in a real manufacturing scenario in a laboratory environment. The results reveal that the system behaves robustly and that end users considered the assignment manageable, while the system overall was received with a high level of trust and usability.

Acknowledgements

This work was supported by the Department of Economic Development and Competitiveness of the Basque Government via the LangileOK project.

Author information

Authors and Affiliations

Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Mikeletegi 57, 20009, Donostia – San Sebastián, Spain
Ander González-Docasal, Haritz Arzelus & Aitor Álvarez
Tekniker, Basque Research and Technology Alliance (BRTA), Parke Teknologikoa Iñaki Goenaga 5, Eibar, Spain
Cristina Aceta, Izaskun Fernández & Johan Kildal

Corresponding author

Correspondence to Ander González-Docasal.

Editor information

Editors and Affiliations

Speech Technology Group - Information Processing and Telecommunications Center (IPTC), Universidad Politécnica de Madrid, Madrid, Spain
Luis Fernando D'Haro
Department of Languages and Computer Systems, Universidad de Granada, CITIC-UGR, Granada, Spain
Zoraida Callejas
Information Science, Nara Institute of Science and Technology, Ikoma, Japan
Satoshi Nakamura

About this chapter

Cite this chapter

González-Docasal, A., Aceta, C., Arzelus, H., Álvarez, A., Fernández, I., Kildal, J. (2021). Towards a Natural Human-Robot Interaction in an Industrial Environment. In: D'Haro, L.F., Callejas, Z., Nakamura, S. (eds) Conversational Dialogue Systems for the Next Decade. Lecture Notes in Electrical Engineering, vol 704. Springer, Singapore. https://doi.org/10.1007/978-981-15-8395-7_18

DOI: https://doi.org/10.1007/978-981-15-8395-7_18
Published: 25 October 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8394-0
Online ISBN: 978-981-15-8395-7
eBook Packages: Engineering, Engineering (R0)