Abstract
Recognizing human actions is an active research area, and the pose of the performer is an important cue for recognition. However, using the performer's 3D landmark points for action recognition remains relatively unexplored, owing to the difficulty of extracting 3D landmark points from a single view of the performer. Recent advances in 3D landmark point detection make it worthwhile to exploit these landmark points for recognizing human actions. We propose a technique for human action recognition that learns the 3D landmark points of the human pose from a single image. We apply an autoencoder architecture followed by a regression layer to estimate pose parameters such as shape, gesture, and camera position, which are then mapped to 3D landmark points by the Skinned Multi-Person Linear (SMPL) model. The proposed method is a novel attempt to apply a CNN-based 3D pose reconstruction model (autoencoder) to action recognition. Further, instead of using the autoencoder as a classifier over 3D poses, we replace the decoder with a regressor that produces the landmark points. The 3D landmark points of the human performer(s) in each frame are then fed as features into a neural network classifier for recognizing the action.
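The pipeline described in the abstract — a CNN encoder whose decoder is replaced by a regression head predicting SMPL shape, pose (gesture), and camera parameters, followed by a classifier over the resulting 3D joints — can be sketched as below. This is a minimal illustration, not the authors' implementation: the encoder layers, layer sizes, and the `smpl_joints` linear stand-in for the real SMPL model (which requires its own learned blend shapes and skinning weights) are all assumptions.

```python
# Minimal sketch of the described pipeline (PyTorch).
# Assumptions: layer sizes, the small conv encoder, and the linear
# stand-in for the SMPL parameter-to-joint mapping are illustrative,
# not the authors' actual configuration.
import torch
import torch.nn as nn

N_JOINTS = 24          # SMPL defines 24 body joints
N_POSE = 24 * 3        # axis-angle pose (gesture) parameters
N_SHAPE = 10           # SMPL shape coefficients
N_CAM = 3              # scale + 2D translation, as in HMR-style models

class PoseRegressor(nn.Module):
    """CNN encoder + regression head predicting SMPL-style parameters."""
    def __init__(self):
        super().__init__()
        # Encoder half of the autoencoder; the decoder is replaced
        # by the regression head below.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.regressor = nn.Linear(64, N_POSE + N_SHAPE + N_CAM)
        # Hypothetical linear stand-in for SMPL: maps (pose, shape)
        # to 3D joint locations. The real SMPL model applies learned
        # blend shapes and linear blend skinning instead.
        self.smpl_joints = nn.Linear(N_POSE + N_SHAPE, N_JOINTS * 3)

    def forward(self, img):
        feat = self.encoder(img)
        params = self.regressor(feat)
        pose_shape = params[:, :N_POSE + N_SHAPE]
        cam = params[:, -N_CAM:]
        joints3d = self.smpl_joints(pose_shape).view(-1, N_JOINTS, 3)
        return joints3d, cam

class ActionClassifier(nn.Module):
    """Neural-network classifier over per-frame 3D landmark points."""
    def __init__(self, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_JOINTS * 3, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, joints3d):
        return self.net(joints3d.flatten(1))

# Per-frame usage: one RGB frame in, action class logits out.
frame = torch.randn(1, 3, 224, 224)
joints3d, cam = PoseRegressor()(frame)
logits = ActionClassifier(n_classes=101)(joints3d)  # e.g. UCF101
```

In this arrangement the landmark points act as a compact, view-normalized feature, so the classifier operates on 72 values per frame rather than raw pixels.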
References
Zhu, F., Shao, L., Xie, J., Fang, Y.: From handcrafted to learned representations for human action recognition: a survey. Image Vis. Comput. 55, 42–52 (2016)
Ziaeefard, M., Bergevin, R.: Semantic human activity recognition: a literature review. Pattern Recogn. 48(8), 2329–2345 (2015)
Chaudhry, R., Ravichandran, A., Hager, G., Vidal, R.: Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In: CVPR, pp. 1–8. IEEE (2009)
Mukherjee, S., Biswas, S.K., Mukherjee, D.P.: Recognizing human action at a distance in video by key poses. IEEE Trans. Circuits Syst. Video Technol. 21(9), 1228–1241 (2011)
Wang, H., Schmid, C.: Action recognition with improved trajectories. In: ICCV, pp. 3551–3558. IEEE (2013)
Mukherjee, S.: Human action recognition using dominant pose duplet. In: Nalpantidis, L., Krüger, V., Eklundh, J.-O., Gasteratos, A. (eds.) ICVS 2015. LNCS, vol. 9163, pp. 488–497. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20904-3_44
Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: CVPR, pp. 1–8. IEEE (2008)
Das Dawn, D., Shaikh, S.H.: A comprehensive survey of human action recognition with spatio-temporal interest point (STIP) detector. Vis. Comput. 32(3), 289–306 (2015). https://doi.org/10.1007/s00371-015-1066-2
Buddubariki, V., Tulluri, S.G., Mukherjee, S.: Event recognition in egocentric videos using a novel trajectory based feature. In: ICVGIP, pp. 76:1–76:8. ACM (2016)
Nazir, S., Yousaf, M.H., Nebel, J.-C., Velastin, S.A.: A bag of expression framework for improved human action recognition. Pattern Recogn. Lett. 103, 39–45 (2018)
Herath, S., Harandi, M.T., Porikli, F.M.: Going deeper into action recognition: a survey. Image Vis. Comput. (2017). https://doi.org/10.1016/j.imavis.2017.01.010
Feichtenhofer, C., Pinz, A., Zisserman, A.: Convolutional two-stream network fusion for video action recognition. In: CVPR, pp. 1–9. IEEE (2016)
Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. In: ICML, pp. 1–8 (2010)
Hara, K., Kataoka, H., Satoh, Y.: Can spatio-temporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In: CVPR, pp. 6546–6555. IEEE (2018)
Li, C., Zhong, Q., Xie, D., Pu, S.: Collaborative spatiotemporal feature learning for video action recognition. In: CVPR, pp. 7872–7881. IEEE (2019)
Wu, C.-Y., Zaheer, M., Hu, H., Manmatha, R., Smola, A.J., Krahenbuhl, P.: Compressed video action recognition. In: CVPR, pp. 6026–6035. IEEE (2018)
Shou, Z., et al.: DMC-Net: generating discriminative motion cues for fast compressed video action recognition. In: CVPR, pp. 1–10. IEEE (2019)
Singh, K.K., Mukherjee, S.: Recognizing human activities in videos using improved dense trajectories over LSTM. In: Rameshan, R., Arora, C., Dutta Roy, S. (eds.) NCVPRIPG 2017. CCIS, vol. 841, pp. 78–88. Springer, Singapore (2018). https://doi.org/10.1007/978-981-13-0020-2_8
Li, C., Wang, P., Wang, S., Hou, Y., Li, W.: Skeleton-based action recognition using LSTM and CNN. In: ICME Workshops, pp. 585–590. IEEE (2017)
Li, C., et al.: Deep manifold structure transfer for action recognition. IEEE Trans. Image Process. 28, 4646–4658 (2019)
Uddin, M.A., Lee, Y.-K.: Feature fusion of deep spatial features and handcrafted spatiotemporal features for human action recognition. Sensors 19(7), 1599 (2019). https://doi.org/10.3390/s19071599
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34, 248:1–248:16 (2015)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, pp. 1–8. IEEE (2005)
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: CVPR, pp. 1–10. IEEE (2018)
Nagalakshmi, C., Mukherjee, S.: Classification of yoga asana from single image by learning 3D view of human pose. In: ICVGIP Workshops. Springer (2018). https://doi.org/10.1007/978-3-030-57907-4_1
Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human action classes from videos in the wild. Report no. CRCV-TR-12-01 (November 2012)
Zhang, P., Lan, C., Zeng, W., Xing, J., Xue, J., Zheng, N.: Semantics-guided neural networks for efficient skeleton-based human action recognition. In: CVPR, pp. 1112–1121 (2020)
Materzynska, J., Xiao, T., Herzig, R., Xu, H., Wang, X., Darrell, T.: Something-else: compositional action recognition with spatial-temporal interaction networks. In: CVPR, pp. 1049–1059 (2020)
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mukherjee, S., Nagalakshmi, C. (2021). Human Action Recognition from 3D Landmark Points of the Performer. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds) Computer Vision and Image Processing. CVIP 2020. Communications in Computer and Information Science, vol 1377. Springer, Singapore. https://doi.org/10.1007/978-981-16-1092-9_4
DOI: https://doi.org/10.1007/978-981-16-1092-9_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1091-2
Online ISBN: 978-981-16-1092-9
eBook Packages: Computer Science, Computer Science (R0)