
A Method of Fusing Gesture and Speech for Human-robot Interaction

Published: 07 March 2020

Abstract

To address the limitations of unimodal models in human-robot interaction, and to improve the operator's ability to control the robot while enabling natural, friendly interaction between human and robot, we propose a novel multimodal fusion architecture based on gesture and speech. First, a convolutional neural network and the Baidu speech API are used to recognize gestures and speech, respectively. Second, the gesture prediction probabilities and the speech forward-matching results are normalized. Finally, the proposed multimodal fusion algorithm fuses the two results, and the output is filled into an intention slot; the operator's intention is determined by checking whether the intention slot is completely filled. Experimental results show that the proposed multimodal fusion architecture outperforms unimodal models in both recognition accuracy and interaction efficiency.
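The fusion step described above can be sketched as a late-fusion scheme: normalize each modality's scores, combine them with a weighted sum, and fill the intention slot only when the fused top intent is confident enough. This is a minimal illustrative sketch, not the authors' exact algorithm; the intent labels, weights, threshold, and function names are all assumptions.

```python
# Hypothetical late-fusion sketch: normalize gesture and speech scores,
# fuse them, and fill an intention slot. Labels and parameters are
# illustrative assumptions, not the paper's actual values.

INTENTS = ["grasp", "release", "move_left", "move_right"]

def normalize(scores):
    """Scale raw scores so they sum to 1 (simple sum normalization)."""
    total = sum(scores.values())
    if total == 0:
        return {k: 0.0 for k in scores}
    return {k: v / total for k, v in scores.items()}

def fuse(gesture_scores, speech_scores, w_gesture=0.5, w_speech=0.5):
    """Weighted late fusion of two normalized per-intent score dicts."""
    g = normalize(gesture_scores)
    s = normalize(speech_scores)
    return {k: w_gesture * g.get(k, 0.0) + w_speech * s.get(k, 0.0)
            for k in INTENTS}

def fill_intention_slot(fused, threshold=0.4):
    """Fill the slot with the top intent if it clears the threshold,
    otherwise leave the slot empty (None) and await more input."""
    intent, score = max(fused.items(), key=lambda kv: kv[1])
    return intent if score >= threshold else None

# Example: CNN gesture probabilities and speech-matching scores agree
# on "grasp", so the fused result fills the slot.
gesture = {"grasp": 0.7, "release": 0.2, "move_left": 0.1, "move_right": 0.0}
speech = {"grasp": 0.6, "release": 0.3, "move_left": 0.0, "move_right": 0.1}
slot = fill_intention_slot(fuse(gesture, speech))
```

In this toy example the two modalities reinforce each other, so the fused score for "grasp" (0.65) exceeds the threshold and the slot is filled; when the modalities disagree and no fused score clears the threshold, the slot stays empty, mirroring the paper's idea of judging intention by slot fill completeness.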


Cited By

  • (2024) Integration of a NLP-Based Industrial Robot Programming System. In Flexible Automation and Intelligent Manufacturing: Manufacturing Innovation and Preparedness for the Changing World Order, 313--320. DOI: 10.1007/978-3-031-74482-2_35. Online publication date: 9 Dec 2024.
  • (2022) Programming cobots by voice: a pragmatic, web-based approach. International Journal of Computer Integrated Manufacturing 36, 1, 86--109. DOI: 10.1080/0951192X.2022.2148754. Online publication date: 26 Nov 2022.
  • (2021) Multimodal Human-robot Interaction on Service Robot. In 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 2290--2295. DOI: 10.1109/IAEAC50856.2021.9391068. Online publication date: 12 Mar 2021.


    Published In

    ICCDE '20: Proceedings of 2020 6th International Conference on Computing and Data Engineering
    January 2020
    279 pages
    ISBN:9781450376730
    DOI:10.1145/3379247
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Human-robot interaction
    2. Intention understanding
    3. Multimodal fusion

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICCDE 2020

