An Approach Towards Action Recognition Using Part Based Hierarchical Fusion

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12509)

Abstract

The human body can be represented as an articulation of rigid and hinged joints that combine to form the parts of the body. Human actions can be thought of as the collective action of these parts. Hence, learning an effective spatio-temporal representation of the collective motion of these parts is key to action recognition. In this work, we propose an end-to-end pipeline for human action recognition on video sequences using 2D joint trajectories estimated by a pose estimation framework. We use a Hierarchical Bidirectional Long Short Term Memory Network (HBLSTM) to model the spatio-temporal dependencies of the motion by fusing the pose-based joint trajectories in a part-based hierarchical fashion. To demonstrate the effectiveness of our proposed approach, we compare its performance against six comparative architectures based on our model, as well as against several other methods, on the widely used KTH and Weizmann action recognition datasets.
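
The part-based hierarchical fusion described above can be pictured with a short sketch. The following is a minimal illustration only, not the authors' implementation: it assumes a PyTorch setup, an arbitrary grouping of the 2D joints into five body parts, a single fusion level, and made-up hidden sizes and class count (the paper's actual hierarchy, layer widths, and training details are not stated in the abstract).

```python
import torch
import torch.nn as nn

class PartBasedHBLSTM(nn.Module):
    """Hypothetical sketch of part-based hierarchical fusion with BiLSTMs.

    Assumptions (not specified in the abstract): joints are grouped into
    five parts (two arms, two legs, trunk), each part is a per-frame vector
    of 2D joint coordinates, and hidden sizes are chosen arbitrarily.
    """

    def __init__(self, part_dims, part_hidden=32, body_hidden=64, num_classes=6):
        super().__init__()
        # One bidirectional LSTM per body part, operating on that part's
        # concatenated 2D joint trajectories (x, y per joint per frame).
        self.part_lstms = nn.ModuleList(
            [nn.LSTM(d, part_hidden, batch_first=True, bidirectional=True)
             for d in part_dims]
        )
        # A higher-level bidirectional LSTM fuses the per-part representations
        # frame by frame into a whole-body spatio-temporal representation.
        self.body_lstm = nn.LSTM(2 * part_hidden * len(part_dims), body_hidden,
                                 batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * body_hidden, num_classes)

    def forward(self, parts):
        # parts: list of tensors, one per part, each (batch, frames, 2 * joints_in_part)
        part_feats = [lstm(p)[0] for lstm, p in zip(self.part_lstms, parts)]
        fused = torch.cat(part_feats, dim=-1)       # (batch, frames, fused features)
        body_feats, _ = self.body_lstm(fused)
        return self.classifier(body_feats[:, -1])   # classify from the last time step


# Example with made-up part sizes: four limbs of 3 joints each, trunk of 5 joints.
model = PartBasedHBLSTM(part_dims=[6, 6, 6, 6, 10])
parts = [torch.randn(2, 30, d) for d in [6, 6, 6, 6, 10]]  # 2 clips, 30 frames each
logits = model(parts)  # (2, num_classes)
```

Each per-part bidirectional LSTM summarizes that part's joint trajectories over time; concatenating the part outputs frame by frame and feeding them to a higher-level bidirectional LSTM mirrors the fusion idea described in the abstract, with the final hidden state used to classify the action (the six-class default loosely matches KTH, purely for illustration).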

A. Agarwal and B. Sen—These authors contributed equally to the work.

Author information

Correspondence to Aditya Agarwal or Bipasha Sen.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Agarwal, A., Sen, B. (2020). An Approach Towards Action Recognition Using Part Based Hierarchical Fusion. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2020. Lecture Notes in Computer Science, vol. 12509. Springer, Cham. https://doi.org/10.1007/978-3-030-64556-4_24

  • DOI: https://doi.org/10.1007/978-3-030-64556-4_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64555-7

  • Online ISBN: 978-3-030-64556-4

  • eBook Packages: Computer Science, Computer Science (R0)
