Action recognition makes interaction in human-robot collaboration (HRC) more natural and improves work efficiency. Common action recognition models can neither handle the operator's transitional actions, which are not defined beforehand, nor quickly modify the set of action classes to be recognized when the collaboration process changes. In this paper, an action recognition model for HRC assembly is proposed that fuses the outputs of multiple binary classification networks. Moreover, to meet both the speed and accuracy requirements of action recognition in HRC applications, a spatio-temporal feature extraction network based on a graph attention network and gated recurrent unit (GAT-GRU) is designed for binary classification. The proposed model can identify the different operational actions from continuous skeletal data of the operator and distinguish operational actions from transitional actions. It therefore reduces false action recognition, avoids the resulting mistaken controls and dangerous robot actions, and serves HRC applications better. In addition, owing to its fusion structure, the model scales well and can quickly adjust the action classes to be recognized for an HRC task without retraining the entire recognition model. A case study of HRC personal computer assembly demonstrates that the proposed model achieves an accuracy of about 84% and the best effectiveness.
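The fusion idea described above can be illustrated with a minimal Python sketch. It is not the paper's fusion rule, which the abstract does not specify: the thresholded arg-max rule, the `FusionRecognizer` class, the toy scorers, and the action names `pick_screw` and `install_fan` are all assumptions made for illustration. Each scorer stands in for one trained binary network; a window that no scorer accepts with enough confidence is treated as a transitional action.

```python
from typing import Callable, Dict, Sequence

# A window is a short run of skeleton frames; each frame is a list of floats.
Window = Sequence[Sequence[float]]
# A binary scorer returns a confidence in [0, 1] that the window shows its action.
BinaryScorer = Callable[[Window], float]

TRANSITIONAL = "transitional"


class FusionRecognizer:
    """Fuses independent binary scorers (hypothetical rule, not the paper's)."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.scorers: Dict[str, BinaryScorer] = {}

    def add_action(self, name: str, scorer: BinaryScorer) -> None:
        # Registering a new class only needs its own binary scorer.
        self.scorers[name] = scorer

    def remove_action(self, name: str) -> None:
        # Dropping a class leaves every other scorer untouched.
        self.scorers.pop(name, None)

    def recognize(self, window: Window) -> str:
        scores = {name: scorer(window) for name, scorer in self.scorers.items()}
        if not scores:
            return TRANSITIONAL
        best = max(scores, key=lambda name: scores[name])
        # No scorer is confident enough: treat the window as transitional.
        return best if scores[best] >= self.threshold else TRANSITIONAL


def make_toy_scorer(target: float) -> BinaryScorer:
    """Toy stand-in for a trained network: scores closeness of the mean
    first coordinate of the window to `target`."""

    def scorer(window: Window) -> float:
        mean = sum(frame[0] for frame in window) / len(window)
        return max(0.0, 1.0 - abs(mean - target))

    return scorer


recognizer = FusionRecognizer(threshold=0.8)
recognizer.add_action("pick_screw", make_toy_scorer(0.2))
recognizer.add_action("install_fan", make_toy_scorer(0.8))

print(recognizer.recognize([[0.2], [0.2]]))  # matches the pick_screw scorer
print(recognizer.recognize([[0.5], [0.5]]))  # no confident scorer
```

Because each class is owned by a separate scorer, changing the set of recognized actions in this sketch is a call to `add_action` or `remove_action`, which mirrors the scalability claim without implying anything about how the authors implemented it.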

Availability of data
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
This work was supported by the project from Shanghai Municipal Commission of Economy and Information (grant number 2021-GYHLW-01008).
Author information
Authors and Affiliations
Nanyan Shen: conceptualization, writing - original draft, writing - review & editing, supervision, project administration. Zeyuan Feng: methodology, software, validation, formal analysis, investigation, data curation, writing - original draft, writing - review & editing, visualization. Jing Li: conceptualization, resources, supervision, project administration, funding acquisition. Hua You: software, data curation, investigation, methodology. Chenyu Xia: software, data curation, visualization.
Corresponding author
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
This manuscript is the authors’ original work and has not been published nor has it been submitted simultaneously elsewhere.
Consent for publication
All authors have checked the manuscript and have agreed to the submission.
Competing interests
The authors declare no competing interests.
Additional information
• Identifies actions from continuous skeletal data and distinguishes between operational actions and transitional actions.
• The proposed GAT-GRU network, which extracts spatio-temporal features from actions, achieves fast speed and good accuracy.
• Adjusting the action classes to be recognized does not require retraining the entire model.
• Avoids dangerous and mistaken robot actions caused by false recognition, and thus serves HRC applications better.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shen, N., Feng, Z., Li, J. et al. Action fusion recognition model based on GAT-GRU binary classification networks for human-robot collaborative assembly. Multimed Tools Appl 82, 18867–18885 (2023). https://doi.org/10.1007/s11042-022-14123-0
DOI: https://doi.org/10.1007/s11042-022-14123-0