Modelling a Deep Learning Framework for Recognition of Human Actions on Video

  • Conference paper
Trends and Applications in Information Systems and Technologies (WorldCIST 2021)

Abstract

A human action recognition system detects and identifies the activities that people perform. Human activities are classified into four categories according to the complexity of the steps and the number of body parts involved: gestures, actions, interactions, and activities [1]. Capturing useful, discriminative features for video-based human action recognition is challenging because of the variability of the human body. Building intelligent action recognition solutions requires training models to recognize which action a person is performing. This paper reports an experiment in human action recognition that compares several deep learning models trained on a small dataset. The main goal is to match or exceed the results reported in the literature, which rely on larger datasets and require high-performance hardware. Our analysis provides a roadmap for the training, classification, and validation of each model.
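
The abstract does not spell out the validation protocol. The following is a minimal sketch only. It assumes a small labelled list of video clips and models that each output one action label per clip. Under those assumptions, the comparison step could hold out a class-stratified validation split and then rank the models by top-1 accuracy. The function names (`stratified_split`, `top1_accuracy`, `compare_models`), the 20% validation fraction and the use of top-1 accuracy are all illustrative assumptions. They are not the method used in the paper.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Hypothetical helper: split (clip_id, label) pairs into training and
    validation lists so that every action class keeps roughly the same share
    in both, which matters when the dataset is small."""
    by_label = defaultdict(list)
    for clip_id, label in samples:
        by_label[label].append((clip_id, label))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        # Classes with a single clip stay in training; otherwise hold out at
        # least one clip so every class is represented in validation.
        n_val = max(1, round(len(group) * val_fraction)) if len(group) > 1 else 0
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


def top1_accuracy(predictions, labels):
    """Fraction of clips whose predicted action matches the ground truth."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equally long, non-empty prediction and label lists")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def compare_models(model_predictions, labels):
    """Rank models (name -> predicted labels) by top-1 accuracy, best first."""
    scores = {name: top1_accuracy(preds, labels)
              for name, preds in model_predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: ten hypothetical clips, alternating between two actions.
    clips = [(f"clip{i:02d}", "walk" if i % 2 else "wave") for i in range(10)]
    train, val = stratified_split(clips, val_fraction=0.2, seed=42)
    print(len(train), len(val))  # 8 2

    # Placeholder predictions from two hypothetical models, not real results.
    labels = ["walk", "wave", "walk", "wave"]
    predictions = {
        "model_a": ["walk", "wave", "wave", "wave"],
        "model_b": ["walk", "wave", "walk", "wave"],
    }
    print(compare_models(predictions, labels))  # [('model_b', 1.0), ('model_a', 0.75)]
```

Stratifying the split keeps rare actions from disappearing from the validation set, which is easy to overlook when the dataset is small.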

References

  1. Ko, T.: A survey on behavior analysis in video surveillance for homeland security applications. In: 37th IEEE Applied Imagery Pattern Recognition Workshop, pp. 1–8. IEEE (2008)

  2. Analide, C., Novais, P., Machado, J., Neves, J.: Quality of knowledge in virtual entities. In: Encyclopedia of Communities of Practice in Information and Knowledge Management, pp. 436–442. IGI Global (2006)

  3. Durães, D., Marcondes, F.S., Gonçalves, F., Fonseca, J., Machado, J., Novais, P.: Detection violent behaviors: a survey. In: International Symposium on Ambient Intelligence, pp. 106–116. Springer, Cham (2020)

  4. Marcondes, F.S., Durães, D., Gonçalves, F., Fonseca, J., Machado, J., Novais, P.: In-vehicle violence detection in carpooling: a brief survey towards a general surveillance system. In: International Symposium on Distributed Computing and Artificial Intelligence, pp. 211–220. Springer, Cham (2020)

  5. Durães, D., Carneiro, D., Jiménez, A., Novais, P.: Characterizing attentive behavior in intelligent environments. Neurocomputing 272, 46–54 (2018)

  6. Costa, R., Neves, J., Novais, P., Machado, J., Lima, L., Alberto, C.: Intelligent mixed reality for the creation of ambient assisted living. In: Portuguese Conference on Artificial Intelligence, pp. 323–331. Springer, Heidelberg (2007)

  7. Zhu, Y., Zhao, X., Fu, Y., Liu, Y.: Sparse coding on local spatial-temporal volumes for human action recognition. In: Asian Conference on Computer Vision, pp. 660–671. Springer, Heidelberg (2010)

  8. Jesus, T., Duarte, J., Ferreira, D., Durães, D., Marcondes, F., Santos, F., Machado, J.: Review of trends in automatic human activity recognition using synthetic audio-visual data. In: International Conference on Intelligent Data Engineering and Automated Learning, pp. 549–560. Springer, Cham (2020)

  9. Shokri, M., Harati, A., Taba, K.: Salient object detection in video using deep non-local neural networks. J. Vis. Commun. Image Represent. 68, 102769 (2020)

  10. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)

  11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  12. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)

  13. Hochreiter, S., Bengio, Y., Frasconi, P., Schmidhuber, J.: Gradient flow in recurrent nets: the difficulty of learning long-term dependencies (2001)

  14. Huang, G., Sun, Y., Liu, Z., Sedra, D., Weinberger, K.Q.: Deep networks with stochastic depth. In: European Conference on Computer Vision, pp. 646–661. Springer, Cham (2016)

  15. Feichtenhofer, C., Fan, H., Malik, J., He, K.: SlowFast networks for video recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6202–6211 (2019)

  16. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the Kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6299–6308 (2017)

  17. Carreira, J., Noland, E., Hillier, C., Zisserman, A.: A short note on the Kinetics-700 human action dataset. arXiv preprint arXiv:1907.06987 (2019)

  18. Li, A., Thotakuri, M., Ross, D.A., Carreira, J., Vostrikov, A., Zisserman, A.: The AVA-Kinetics localized human actions video dataset. arXiv preprint arXiv:2005.00214 (2020)

Acknowledgement

This work is supported by the European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 039334; Funding Reference: POCI-01-0247-FEDER-039334].

Author information

Corresponding author

Correspondence to Dalila Durães.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Santos, F. et al. (2021). Modelling a Deep Learning Framework for Recognition of Human Actions on Video. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Ramalho Correia, A.M. (eds) Trends and Applications in Information Systems and Technologies. WorldCIST 2021. Advances in Intelligent Systems and Computing, vol 1365. Springer, Cham. https://doi.org/10.1007/978-3-030-72657-7_10
