
Research on human behavior recognition in video based on 3DCCA

Published in: Multimedia Tools and Applications

Abstract

Human behavior is an important part of video content, so the effective recognition of human behavior in video has attracted extensive attention. To address the problems that key features are not prominent and accuracy is low in existing methods of human behavior recognition in video, this paper proposes a feature extraction method based on a three-dimensional convolutional neural network fused with channel attention (3DCCA). Mean normalization is applied to preprocess the RGB video frames, and three-dimensional convolution (3DCNN) extracts spatiotemporal features from the input clips. Channel attention (CA) then selects, from all extracted features, those most critical for recognizing the current behavior, and a softmax classifier performs the final classification and identification of human behavior in the video. Training results on the public UCF101 and HMDB51 datasets show that, compared with other commonly used human behavior feature extraction and recognition methods, the algorithm makes better use of the original information in the video, extracts more effective features, correctly detects human behaviors and actions, and exhibits stronger recognition ability.
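To make the described pipeline concrete, below is a minimal PyTorch sketch (not the authors' published code) of a 3DCCA-style network: mean-normalized RGB clips pass through 3D convolutions, a squeeze-excitation-style channel attention block reweights the feature channels, and a linear head produces class logits for softmax classification. All layer sizes, the reduction ratio, and the names `ChannelAttention` and `ToyThreeDCCA` are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-excitation-style channel attention: reweight feature channels
    by a learned per-channel importance score (an assumed CA variant)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)          # global spatiotemporal pooling
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                            # x: (N, C, T, H, W)
        w = self.fc(self.pool(x).flatten(1))         # per-channel weights in (0, 1)
        return x * w.view(x.size(0), -1, 1, 1, 1)    # rescale each channel

class ToyThreeDCCA(nn.Module):
    def __init__(self, num_classes: int = 101):      # e.g. UCF101 has 101 classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool3d((1, 2, 2)),                 # pool space only at first
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool3d(2),                         # pool time and space
            ChannelAttention(128),                   # emphasize action-relevant channels
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, clip):                         # clip: (N, 3, T, H, W)
        # Mean normalization of the input frames (one plausible reading of the paper).
        clip = clip - clip.mean(dim=(2, 3, 4), keepdim=True)
        f = self.features(clip)
        f = f.mean(dim=(2, 3, 4))                    # global average pooling -> (N, 128)
        return self.classifier(f)                    # logits; softmax applied in the loss

# Example: a batch of two 16-frame 112x112 RGB clips.
logits = ToyThreeDCCA()(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([2, 101])
```

In training, the logits would be fed to `nn.CrossEntropyLoss`, which applies the softmax internally; at inference, `logits.softmax(dim=1)` yields the per-class behavior probabilities.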



Acknowledgments

This research work was supported in part by the National Natural Science Foundation of China under Grants 51668043 and 61262016, in part by the CERNET Innovation Project under Grants NGII20160311 and NGII20160112, and in part by the Gansu Science Foundation of China under Grant 18JR3RA156.

Author information


Corresponding author

Correspondence to Juan Liu.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhao, H., Liu, J. & Wang, W. Research on human behavior recognition in video based on 3DCCA. Multimed Tools Appl 82, 20251–20268 (2023). https://doi.org/10.1007/s11042-023-14355-8
