Abstract
The human body can be modeled as an articulation of rigid segments connected by hinged joints, which combine to form the parts of the body; a human action can then be viewed as the collective motion of these parts. Learning an effective spatio-temporal representation of this collective motion is therefore key to action recognition. In this work, we propose an end-to-end pipeline for human action recognition on video sequences using 2D joint trajectories estimated by a pose estimation framework. We use a Hierarchical Bidirectional Long Short-Term Memory network (HBLSTM) to model the spatio-temporal dependencies of the motion by fusing the pose-based joint trajectories in a part-based hierarchical fashion. To demonstrate the effectiveness of the proposed approach, we compare its performance with six comparative architectures derived from our model, as well as with several other methods, on the widely used KTH and Weizmann action recognition datasets.
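As a rough illustration of the part-based hierarchical fusion the abstract describes, the sketch below groups 2D joint trajectories into five body parts and fuses their encodings bottom-up (joints to parts to whole body). The joint indices, the five-part grouping, and the time-averaging `encode` stand-in are all illustrative assumptions; the paper's model uses bidirectional LSTM layers at each level of the hierarchy rather than this placeholder.

```python
# Hypothetical grouping of 15 pose joints into five body parts.
PARTS = {
    "trunk":     [0, 1, 2],     # e.g. head, neck, hip
    "left_arm":  [3, 4, 5],
    "right_arm": [6, 7, 8],
    "left_leg":  [9, 10, 11],
    "right_leg": [12, 13, 14],
}

def encode(sequence):
    """Placeholder for a bidirectional LSTM layer: collapses a sequence
    of feature vectors into one fixed-length vector by averaging over time."""
    length = len(sequence)
    dim = len(sequence[0])
    return [sum(frame[i] for frame in sequence) / length for i in range(dim)]

def hierarchical_fusion(trajectories):
    """trajectories: list of frames, each a list of (x, y) joint tuples.
    Returns a single fused representation built bottom-up."""
    # Level 1: encode each part's joint trajectories independently.
    part_codes = {}
    for name, idxs in PARTS.items():
        seq = [[c for j in idxs for c in frame[j]] for frame in trajectories]
        part_codes[name] = encode(seq)
    # Level 2: fuse limbs with the trunk (concatenation as the fusion op).
    upper = part_codes["trunk"] + part_codes["left_arm"] + part_codes["right_arm"]
    lower = part_codes["trunk"] + part_codes["left_leg"] + part_codes["right_leg"]
    # Level 3: whole-body representation, which would feed the classifier.
    return upper + lower

# 10 synthetic frames of 15 2D joints.
frames = [[(float(j), float(t)) for j in range(15)] for t in range(10)]
body = hierarchical_fusion(frames)
print(len(body))  # 36: six 6-dim part codes concatenated
```

In the actual model, each `encode` call would be a BLSTM whose per-frame outputs are concatenated with those of sibling parts before being passed to the next BLSTM level, so temporal structure is preserved across the hierarchy rather than averaged away as in this toy version.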
A. Agarwal and B. Sen—These authors contributed equally to the work.
© 2020 Springer Nature Switzerland AG
Agarwal, A., Sen, B. (2020). An Approach Towards Action Recognition Using Part Based Hierarchical Fusion. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2020. Lecture Notes in Computer Science(), vol 12509. Springer, Cham. https://doi.org/10.1007/978-3-030-64556-4_24
Print ISBN: 978-3-030-64555-7
Online ISBN: 978-3-030-64556-4