Multi-modal 3D Human Pose Estimation for Human-Robot Collaborative Applications

Peppas, Konstantinos; Tsiolis, Konstantinos; Mariolis, Ioannis; Topalidou-Kyniazopoulou, Angeliki; Tzovaras, Dimitrios

doi:10.1007/978-3-030-73973-7_34

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12644)

Included in the following conference series:

Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR)


Abstract

We propose a multi-modal 3D human pose estimation approach that combines a 2D human pose estimation network operating on RGB data with a 3D human pose estimation network that takes the 2D pose estimates and depth information as input to predict 3D human poses. We improve upon the state of the art by using a more accurate 2D human pose estimation network and by introducing squeeze-excite blocks into the architecture of the 3D pose estimation network. More importantly, we focus on the challenging application of 3D human pose estimation during collaborative tasks. To that end, we selected subsets relevant to collaborative tasks from a large-scale multi-view RGB-D dataset for training, and generated a novel single-view RGB-D dataset for testing. When tested on the novel benchmark RGB-D dataset of collaborative assembly that we created and have made publicly available, our approach achieves performance above the state of the art among RGB-D approaches.
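The abstract credits part of the improvement to squeeze-excite (SE) blocks in the 3D pose estimation network but does not describe how they are configured. As a rough illustration only, the NumPy sketch below implements a generic SE block in the style of Hu et al.'s Squeeze-and-Excitation Networks: global average pooling per channel ("squeeze"), a two-layer bottleneck with ReLU and sigmoid ("excitation"), and channel-wise rescaling of the input. The helper name `squeeze_excite`, the channels-first layout, the reduction ratio `r`, the volume size, and the random weights are all assumptions for illustration, not the authors' architecture or trained parameters.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def squeeze_excite(
    features: np.ndarray,
    w1: np.ndarray,
    b1: np.ndarray,
    w2: np.ndarray,
    b2: np.ndarray,
) -> np.ndarray:
    """Generic SE block on a channels-first feature map (hypothetical helper).

    features: (C, *spatial), e.g. (C, D, H, W) for a voxelized volume
    w1, b1:   (C // r, C), (C // r,)  reduction layer
    w2, b2:   (C, C // r), (C,)       expansion layer
    """
    channels = features.shape[0]
    # Squeeze: global average pooling over all spatial axes -> one descriptor per channel.
    z = features.reshape(channels, -1).mean(axis=1)
    # Excitation: bottleneck MLP, ReLU then sigmoid, gives per-channel gates in (0, 1).
    hidden = np.maximum(0.0, w1 @ z + b1)
    gates = sigmoid(w2 @ hidden + b2)
    # Scale: multiply every channel of the input by its gate (broadcast over spatial axes).
    return features * gates.reshape((channels,) + (1,) * (features.ndim - 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, r = 8, 4  # assumed channel count and reduction ratio
    x = rng.standard_normal((C, 4, 4, 4))  # hypothetical 3D feature volume
    w1 = 0.1 * rng.standard_normal((C // r, C))
    w2 = 0.1 * rng.standard_normal((C, C // r))
    y = squeeze_excite(x, w1, np.zeros(C // r), w2, np.zeros(C))
    print(y.shape)  # (8, 4, 4, 4)
```

The block preserves the shape of its input and only reweights channels, so it can be dropped between existing layers without changing the surrounding architecture. Because the gates lie in (0, 1), each channel is attenuated relative to the input; with all-zero weights and biases every gate equals 0.5.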


Notes

1. https://doi.org/10.5281/zenodo.4475685


Acknowledgement

This work has been supported by the project “Co-production CeLL performing Human-Robot Collaborative AssEmbly (CoLLaboratE)”, funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 820767.

Author information

Authors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas - CERTH, 6th km Charilaou-Thermi Road, Thessaloniki, Greece
Konstantinos Peppas, Konstantinos Tsiolis, Ioannis Mariolis, Angeliki Topalidou-Kyniazopoulou & Dimitrios Tzovaras


Corresponding author

Correspondence to Konstantinos Peppas.

Editor information

Editors and Affiliations

Ca’ Foscari University of Venice, Venice, Italy
Andrea Torsello
Queen Mary University of London, London, UK
Luca Rossi
Università Ca' Foscari Venezia, Venice, Italy
Marcello Pelillo
University of Cagliari, Cagliari, Italy
Battista Biggio
Deakin University, Burwood, VIC, Australia
Antonio Robles-Kelly


About this paper

Cite this paper

Peppas, K., Tsiolis, K., Mariolis, I., Topalidou-Kyniazopoulou, A., Tzovaras, D. (2021). Multi-modal 3D Human Pose Estimation for Human-Robot Collaborative Applications. In: Torsello, A., Rossi, L., Pelillo, M., Biggio, B., Robles-Kelly, A. (eds) Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2021. Lecture Notes in Computer Science, vol. 12644. Springer, Cham. https://doi.org/10.1007/978-3-030-73973-7_34


DOI: https://doi.org/10.1007/978-3-030-73973-7_34
Published: 10 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73972-0
Online ISBN: 978-3-030-73973-7
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition