Depth-Based vs. Color-Based Pose Estimation in Human Action Recognition

Malawski, Filip; Jankowski, Bartosz

doi:10.1007/978-3-031-20713-6_26

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13598)

Included in the following conference series:

International Symposium on Visual Computing

Abstract

Recent advances in deep learning have resulted in the emergence of accurate models for human pose estimation in color videos. The distance between automatically estimated and manually annotated joint positions is commonly used to evaluate such methods. However, from a practical point of view, pose estimation is not a goal in itself. Therefore, in this work, we study how useful state-of-the-art deep learning pose estimation approaches are in a practical scenario of human action recognition. We compare different variants of pose estimation models with the baseline provided by Kinect skeleton tracking, which, until recently, was the most widely used solution in such applications. We present a comprehensive framework for pose-based action recognition evaluation, which consists of both classical machine learning approaches, including feature extraction, selection, and classification steps, and more recent end-to-end methods. Extensive evaluation on four publicly available datasets shows that, by using state-of-the-art neural network models for pose tracking, color-based action recognition matches, or even outperforms, depth-based action recognition.
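
The joint-distance evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact metric, so this assumes a plain mean Euclidean distance over all joints and frames; the function name `mean_joint_error` and the nested-list data layout are hypothetical and are not taken from the authors' code.

```python
import math


def mean_joint_error(estimated, annotated):
    """Mean Euclidean distance between corresponding joints.

    Both arguments are sequences of frames; each frame is a sequence of
    (x, y) or (x, y, z) joint coordinates, in the same joint order and units.
    This layout is an assumption made for illustration.
    """
    if len(estimated) != len(annotated):
        raise ValueError("frame counts differ")
    total, count = 0.0, 0
    for est_frame, ann_frame in zip(estimated, annotated):
        if len(est_frame) != len(ann_frame):
            raise ValueError("joint counts differ")
        for est_joint, ann_joint in zip(est_frame, ann_frame):
            # math.dist computes the Euclidean distance between two points.
            total += math.dist(est_joint, ann_joint)
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


# Hypothetical two-frame, two-joint example in pixel coordinates.
estimated = [[(0.0, 0.0), (3.0, 4.0)], [(1.0, 1.0), (2.0, 2.0)]]
annotated = [[(0.0, 0.0), (0.0, 0.0)], [(1.0, 1.0), (2.0, 5.0)]]
error = mean_joint_error(estimated, annotated)
```

A low value of such a metric says only that the estimated joints lie close to the annotations; the study's point is that downstream action recognition accuracy is the more practical measure.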

Notes

1. https://google.github.io/mediapipe/solutions/pose.html

Acknowledgements

The research presented in this paper was supported by the National Centre for Research and Development (NCBiR) under Grant No. LIDER/37/0198/L-12/20/NCBR/2021.

Author information

Authors and Affiliations

Institute of Computer Science, AGH University of Science and Technology, Krakow, Poland
Filip Malawski & Bartosz Jankowski

Authors

Filip Malawski
Bartosz Jankowski

Corresponding author

Correspondence to Filip Malawski.

Editor information

Editors and Affiliations

University of Nevada, Reno, NV, USA
George Bebis
University of Illinois Urbana-Champaign, Urbana, IL, USA
Bo Li
National University of Singapore, Singapore, Singapore
Angela Yao
Microsoft Research Asia, Beijing, China
Yang Liu
University of Missouri, Columbia, MO, USA
Ye Duan
City University of Hong Kong, Kowloon, Hong Kong
Manfred Lau
Idaho National Laboratory, Idaho Falls, ID, USA
Rajiv Khadka
Salesforce, Seattle, WA, USA
Ana Crisan
Tufts University, Medford, MA, USA
Remco Chang

About this paper

Cite this paper

Malawski, F., Jankowski, B. (2022). Depth-Based vs. Color-Based Pose Estimation in Human Action Recognition. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2022. Lecture Notes in Computer Science, vol 13598. Springer, Cham. https://doi.org/10.1007/978-3-031-20713-6_26

DOI: https://doi.org/10.1007/978-3-031-20713-6_26
Published: 11 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20712-9
Online ISBN: 978-3-031-20713-6
eBook Packages: Computer Science, Computer Science (R0)
