Pose Forecasting in Industrial Human-Robot Collaboration

  • Conference paper
  • In: Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Pushing back the frontiers of collaborative robots in industrial environments, we propose a new Separable-Sparse Graph Convolutional Network (SeS-GCN) for pose forecasting. For the first time, SeS-GCN bottlenecks the interaction of the spatial, temporal and channel-wise dimensions in GCNs, and it learns sparse adjacency matrices by a teacher-student framework. Compared to the state-of-the-art, it only uses 1.72% of the parameters and it is about 4 times faster, while still performing comparably in forecasting accuracy on Human3.6M at 1 s in the future, which enables cobots to be aware of human operators. As a second contribution, we present a new benchmark of Cobots and Humans in Industrial COllaboration (CHICO). CHICO includes multi-view videos, 3D poses and trajectories of 20 human operators and cobots, engaging in 7 realistic industrial actions. Additionally, it reports 226 genuine collisions, taking place during the human-cobot interaction. We test SeS-GCN on CHICO for two important perception tasks in robotics: human pose forecasting, where it reaches an average error of 85.3 mm (MPJPE) at 1 s in the future with a run time of 2.3 ms, and collision detection, by comparing the forecasted human motion with the known cobot motion, obtaining an F1-score of 0.64.
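
The forecasting accuracy above is reported as MPJPE (Mean Per-Joint Position Error), the mean Euclidean distance between predicted and ground-truth 3D joint positions. A minimal NumPy sketch of the metric; the array shapes and the 17-joint skeleton are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error in the units of the input (here mm).

    pred, gt: arrays of shape (frames, joints, 3) with 3D joint positions.
    Computes the Euclidean distance per joint per frame, then averages.
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy example: a prediction offset by 85.3 mm along x at every joint,
# mirroring the average error the paper reports at 1 s.
gt = np.zeros((10, 17, 3))
pred = gt.copy()
pred[..., 0] += 85.3
print(mpjpe(pred, gt))  # ≈ 85.3
```

The per-joint norm before averaging is also what per-joint or per-action breakdowns would aggregate differently.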

Notes

  1.

    Code and dataset are available at: https://github.com/AlessioSam/CHICO-PoseForecasting.

  2.

    Unconstrained collisions is a term coming from [26], indicating a situation in which only the robot and the human are directly involved in the collision.

  3.

    After a collision, the robot stops for 1 s, during which the human operator usually stands still, waiting for the robot to resume operations.
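
The collision-detection task described in the abstract compares forecasted human poses against the known cobot motion. A minimal proximity-threshold sketch of that idea; the 100 mm threshold, the function name and the array shapes are illustrative assumptions, not the paper's actual detection criterion:

```python
import numpy as np

def detect_collision(human_pred, robot_traj, threshold_mm=100.0):
    """Flag frames where any forecasted human joint comes within
    `threshold_mm` of any robot keypoint.

    human_pred: (T, J, 3) forecasted human joints, in mm.
    robot_traj: (T, K, 3) known robot keypoints, in mm.
    Returns a boolean array of length T.
    """
    # Pairwise human-robot distances per frame: shape (T, J, K).
    d = np.linalg.norm(
        human_pred[:, :, None, :] - robot_traj[:, None, :, :], axis=-1
    )
    return d.min(axis=(1, 2)) < threshold_mm

# Toy example: a static human at the origin; the robot keypoint passes
# close (50 mm on each axis) only at frame 3.
human = np.zeros((5, 17, 3))
robot = np.full((5, 1, 3), 500.0)
robot[3] = 50.0
print(detect_collision(human, robot))  # [False False False  True False]
```

Evaluating such per-frame flags against the 226 annotated collisions is what yields a precision/recall trade-off, summarized by the F1-score the paper reports.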

References

  1. Aksan, E., Kaufmann, M., Hilliges, O.: Structured prediction helps 3d human motion modelling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (October 2019)

  2. Bai, S., Kolter, J.Z., Koltun, V.: An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv:1803.01271 (2018)

  3. Balcilar, M., Renton, G., Héroux, P., Gaüzère, B., Adam, S., Honeine, P.: Spectral-designed depthwise separable graph neural networks. In: Proceedings of Thirty-seventh International Conference on Machine Learning (ICML 2020)-Workshop on Graph Representation Learning and Beyond (GRL+ 2020) (2020)

  4. Bauer, A., Wollherr, D., Buss, M.: Human-robot collaboration: a survey. Int. J. Humanoid Rob. 5(01), 47–66 (2008)

  5. Beltran, E.P., Diwa, A.A.S., Gales, B.T.B., Perez, C.E., Saguisag, C.A.A., Serrano, K.K.D.: Fuzzy logic-based risk estimation for safe collaborative robots. In: 2018 IEEE 10th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), pp. 1–5 (2018)

  6. Benesova, K., Svec, A., Suppa, M.: Cost-effective deployment of BERT models in serverless environment (2021)

  7. Bertasius, G., Wang, H., Torresani, L.: Is space-time attention all you need for video understanding? In: Proceedings of the International Conference on Machine Learning (ICML) (2021)

  8. Bütepage, J., Kjellström, H., Kragic, D.: Anticipating many futures: online human motion prediction and synthesis for human-robot collaboration. arXiv:1702.08212 (2017)

  9. Cai, Y., et al.: Learning progressive joint propagation for human motion prediction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 226–242. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_14

  10. Cao, Z., Hidalgo Martinez, G., Simon, T., Wei, S., Sheikh, Y.A.: OpenPose: realtime multi-person 2d pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. (2019)

  11. Castro, A., Silva, F., Santos, V.: Trends of human-robot collaboration in industry contexts: handover, learning, and metrics. Sensors 21(12), 4113 (2021)

  12. Chen, J.H., Song, K.T.: Collision-free motion planning for human-robot collaborative safety under cartesian constraint. In: IEEE International Conference on Robotics and Automation, pp. 4348–4354 (2018)

  13. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1800–1807 (2017)

  14. Costanzo, M., De Maria, G., Lettera, G., Natale, C.: A multimodal approach to human safety in collaborative robotic workcells. IEEE Trans. Autom. Sci. Eng. 19, 1–15 (2021)

  15. Cui, Q., Sun, H., Yang, F.: Learning dynamic relationships for 3d human motion prediction. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6518–6526 (2020)

  16. Dallel, M., Havard, V., Baudry, D., Savatier, X.: Inhard - industrial human action recognition dataset in the context of industrial collaborative robotics. In: 2020 IEEE International Conference on Human-Machine Systems (ICHMS) (2020)

  17. Dang, L., Nie, Y., Long, C., Zhang, Q., Li, G.: MSR-GCN: Multi-scale residual graph convolution networks for human motion prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

  18. Duarte, N.F., Raković, M., Tasevski, J., Coco, M.I., Billard, A., Santos-Victor, J.: Action anticipation: reading the intentions of humans and robots. IEEE Robot. Autom. Lett. 3(4), 4132–4139 (2018)

  19. Fieraru, M., Zanfir, M., Oneata, E., Popa, A.I., Olaru, V., Sminchisescu, C.: Three-dimensional reconstruction of human interactions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7214–7223 (2020)

  20. Fragkiadaki, K., Levine, S., Felsen, P., Malik, J.: Recurrent network models for human dynamics. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4346–4354 (2015)

  21. Garcia-Esteban, J.A., Piardi, L., Leitao, P., Curto, B., Moreno, V.: An interaction strategy for safe human co-working with industrial collaborative robots. In: Proceedings of the 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems (ICPS 2021), pp. 585–590 (2021)

  22. Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. In: The International Conference on Machine Learning (ICML) (2017)

  23. Gopalakrishnan, A., Mali, A., Kifer, D., Giles, L., Ororbia, A.G.: A neural temporal model for human motion prediction. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12108–12117 (2019)

  24. Gualtieri, L., Palomba, I., Wehrle, E.J., Vidoni, R.: The opportunities and challenges of SME manufacturing automation: safety and ergonomics in human–robot collaboration. In: Matt, D.T., Modrák, V., Zsifkovits, H. (eds.) Industry 4.0 for SMEs, pp. 105–144. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-25425-4_4

  25. Guo, W., Bie, X., Alameda-Pineda, X., Moreno-Noguer, F.: Multi-person extreme motion prediction with cross-interaction attention. arXiv preprint arXiv:2105.08825 (2021)

  26. Haddadin, S., Albu-Schaffer, A., Frommberger, M., Rossmann, J., Hirzinger, G.: The “DLR crash report”: Towards a standard crash-testing protocol for robot safety-part i: Results. In: 2009 IEEE International Conference on Robotics and Automation, pp. 272–279. IEEE (2009)

  27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)

  28. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. In: NIPS, pp. 1–9 (2014)

  29. Hjorth, S., Chrysostomou, D.: Human-robot collaboration in industrial environments: a literature review on non-destructive disassembly. Robot. Comput. Integr. Manuf. 73, 102208 (2022)

  30. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications (2017)

  31. Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6m: large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1325–1369 (2014)

  32. ISO: ISO/TS 15066:2016. Robots and robotic devices - Collaborative robots (2021). https://www.iso.org/obp/ui/#iso:std:iso:ts:15066:ed-1:v1:en

  33. Jain, A., Zamir, A.R., Savarese, S., Saxena, A.: Structural-RNN: deep learning on spatio-temporal graphs. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5308–5317 (2016)

  34. Kanazawa, A., Kinugawa, J., Kosuge, K.: Adaptive motion planning for a collaborative robot based on prediction uncertainty to enhance human safety and work efficiency. IEEE Trans. Robot. 35(4), 817–832 (2019)

  35. Kang, S., Kim, M., Kim, K.: Safety monitoring for human robot collaborative workspaces. In: 2019 International Conference on Control, Automation and Systems (ICCAS), pp. 1192–1194 (2019)

  36. Knudsen, M., Kaivo-oja, J.: Collaborative robots: frontiers of current literature. J. Intell. Syst. Theory App. 3, 13–20 (2020)

  37. Lai, G., Liu, H., Yang, Y.: Learning graph convolution filters from data manifold (2018)

  38. Laplaza, J., Pumarola, A., Moreno-Noguer, F., Sanfeliu, A.: Attention deep learning based model for predicting the 3d human body pose using the robot human handover phases. In: 2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), pp. 161–166. IEEE (2021)

  39. LeCun, Y., Denker, J., Solla, S.: Optimal brain damage. In: Advances in Neural Information Processing Systems (1989)

  40. Lemmerz, K., Glogowski, P., Kleineberg, P., Hypki, A., Kuhlenkötter, B.: A hybrid collaborative operation for human-robot interaction supported by machine learning. In: 2019 International Conference on Human System Interaction (HSI), pp. 69–75 (2019)

  41. Li, C., Zhang, Z., Lee, W.S., Lee, G.H.: Convolutional sequence to sequence model for human dynamics. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)

  42. Li, M., Chen, S., Zhao, Y., Zhang, Y., Wang, Y., Tian, Q.: Dynamic multiscale graph neural networks for 3d skeleton based human motion prediction. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 211–220 (2020)

  43. Li, M., Lin, J., Ding, Y., Liu, Z., Zhu, J.Y., Han, S.: Gan compression: efficient architectures for interactive conditional GANs. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5283–5293 (2020)

  44. Li, X., Li, D.: GPFS: a graph-based human pose forecasting system for smart home with online learning. ACM Trans. Sen. Netw. 17(3), 1–9 (2021)

  45. Lim, J., et al.: Designing path of collision avoidance for mobile manipulator in worker safety monitoring system using reinforcement learning. In: 2021 IEEE International Conference on Intelligence and Safety for Robotics (ISR), pp. 94–97 (2021)

  46. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), pp. 730–734 (2015)

  47. Magrini, E., Ferraguti, F., Ronga, A.J., Pini, F., De Luca, A., Leali, F.: Human-robot coexistence and interaction in open industrial cells. Robot. Comput. Integr. Manuf. 61, 101846 (2020)

  48. Mahmood, N., Ghorbani, N., Troje, N.F., Pons-Moll, G., Black, M.J.: AMASS: Archive of motion capture as surface shapes. In: International Conference on Computer Vision (2019)

  49. Mao, W., Liu, M., Salzmann, M.: History repeats itself: human motion prediction via motion attention. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12359, pp. 474–489. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58568-6_28

  50. Mao, W., Liu, M., Salzmann, M., Li, H.: Learning trajectory dependencies for human motion prediction. In: The IEEE International Conference on Computer Vision (ICCV) (2019)

  51. von Marcard, T., Henschel, R., Black, M.J., Rosenhahn, B., Pons-Moll, G.: Recovering accurate 3D human pose in the wild using IMUS and a moving camera. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 614–631. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_37

  52. Martinez, J., Black, M.J., Romero, J.: On human motion prediction using recurrent neural networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)

  53. Matthias, B., Reisinger, T.: Example application of ISO/TS 15066 to a collaborative assembly scenario. In: 47th International Symposium on Robotics (ISR 2016), pp. 88–92 (2016)

  54. Michalos, G., Makris, S., Tsarouchi, P., Guasch, T., Kontovrakis, D., Chryssolouris, G.: Design considerations for safe human-robot collaborative workplaces. Procedia CIRP 37, 248–253 (2015)

  55. Minelli, M., et al.: Integrating model predictive control and dynamic waypoints generation for motion planning in surgical scenario. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3157–3163 (2020)

  56. Molchanov, P., Tyree, S., Karras, T., Aila, T., Kautz, J.: Pruning convolutional neural networks for resource efficient inference (2017)

  57. Nascimento, H., Mujica, M., Benoussaad, M.: Collision avoidance in human-robot interaction using kinect vision system combined with robot’s model and data. In: IEEE International Conference on Intelligent Robotics and Systems, pp. 10293–10298 (2020)

  58. Oono, K., Suzuki, T.: Graph neural networks exponentially lose expressive power for node classification. In: International Conference on Learning Representations (2020)

  59. Pavllo, D., Feichtenhofer, C., Grangier, D., Auli, M.: 3d human pose estimation in video with temporal convolutions and semi-supervised training. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

  60. Ramon, J.A.C., Herias, F.A.C., Torres, F.: Safe human-robot interaction based on dynamic sphere-swept line bounding volumes. Robot. Comput. Integr. Manuf. 27(1), 177–185 (2011)

  61. Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: ImageNet classification using binary convolutional neural networks (2016)

  62. Rodriguez-Guerra, D., Sorrosal, G., Cabanes, I., Calleja, C.: Human-robot interaction review: challenges and solutions for modern industrial environments. IEEE Access 9, 108557–108578 (2021)

  63. Shah, J., Wiken, J., Breazeal, C., Williams, B.: Improved human-robot team performance using Chaski, a human-inspired plan execution system. In: Proceedings of the 6th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2011), pp. 29–36 (2011)

  64. Shi, L., Wang, L., Long, C., Zhou, S., Zhou, M., Niu, Z., Hua, G.: Sparse graph convolution network for pedestrian trajectory prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021)

  65. Sofianos, T., Sampieri, A., Franco, L., Galasso, F.: Space-time-separable graph convolutional network for pose forecasting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)

  66. Torkar, C., Yahyanejad, S., Pichler, H., Hofbaur, M., Rinner, B.: RNN-based human pose prediction for human-robot interaction. In: Proceedings of the ARW & OAGM Workshop 2019, pp. 76–80 (2019)

  67. Tu, H., Wang, C., Zeng, W.: VoxelPose: towards multi-camera 3d human pose estimation in wild environment. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 197–212. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_12

  68. Vianello, L., Mouret, J.B., Dalin, E., Aubry, A., Ivaldi, S.: Human posture prediction during physical human-robot interaction. IEEE Robot. Autom. Lett. 6, 6046–6053 (2021)

  69. Wang, C., Wang, Y., Huang, Z., Chen, Z.: Simple baseline for single human motion forecasting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, pp. 2260–2265 (2021)

  70. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5987–5995 (2017)

  71. Yu, C., Ma, X., Ren, J., Zhao, H., Yi, S.: Spatio-temporal graph transformer networks for pedestrian trajectory prediction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 507–523. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58610-2_30

  72. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53

  73. Zhang, J., Liu, H., Chang, Q., Wang, L., Gao, R.X.: Recurrent neural network for motion trajectory prediction in human-robot collaborative assembly. CIRP Ann. 69(1), 9–12 (2020)

  74. Zhao, Y., Dou, Y.: Pose-forecasting aided human video prediction with graph convolutional networks. IEEE Access 8, 147256–147264 (2020)

Acknowledgements

This work was supported by the Italian MIUR through the project “Dipartimenti di Eccellenza 2018–2022”, and partially funded by DsTech S.r.l.

Author information

Corresponding author

Correspondence to Guido Maria D’Amely di Melendugno.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 992 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sampieri, A. et al. (2022). Pose Forecasting in Industrial Human-Robot Collaboration. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13698. Springer, Cham. https://doi.org/10.1007/978-3-031-19839-7_4

  • DOI: https://doi.org/10.1007/978-3-031-19839-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19838-0

  • Online ISBN: 978-3-031-19839-7

  • eBook Packages: Computer Science, Computer Science (R0)
