Abstract
To improve the accuracy of human abnormal behavior recognition, a two-stream convolutional neural network model is proposed. The model consists of two streams, VMHI and FRGB. First, motion history images (MHIs) are extracted and fed into a VGG-16 convolutional neural network for training. Then, RGB images are fed into a Faster R-CNN detector trained with Kalman filter-assisted data annotation. Finally, the results of the VMHI and FRGB streams are fused. The algorithm recognizes not only single-person behavior but also two-person interactions, and it improves recognition accuracy for similar actions. Experimental results on the KTH, Weizmann, UT-Interaction, and TenthLab datasets show that the proposed algorithm achieves higher accuracy than previously reported methods.
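The VMHI stream is built on motion history images, i.e., temporal templates in the sense of Bobick and Davis: each pixel stores how recently motion occurred there, so a single grayscale image summarizes a short clip. As a rough illustration only (not the paper's implementation), the sketch below updates an MHI by frame differencing; the duration `tau` and threshold `delta` are illustrative parameters.

```python
import numpy as np

def update_mhi(mhi, frame, prev_frame, tau=30.0, delta=32):
    """One update of a motion history image (temporal template):
    pixels where motion is detected are stamped with the duration tau,
    while all other pixels decay by one per frame toward zero."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    motion = diff >= delta  # binary motion mask from frame differencing
    return np.where(motion, tau, np.maximum(mhi - 1.0, 0.0))

# Toy example: one pixel changes between frames, then the scene is static.
prev = np.zeros((4, 4), dtype=np.uint8)
frame = prev.copy()
frame[1, 1] = 255
mhi = update_mhi(np.zeros((4, 4)), frame, prev)   # pixel (1,1) stamped to tau
mhi = update_mhi(mhi, frame, frame)               # no motion: template decays
print(mhi[1, 1])  # 29.0
```

In a pipeline like the one described above, such templates would then be rendered as grayscale images and passed to the VGG-16 stream for classification.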
References
Poppe, R.: A survey on vision-based human action recognition. Image Vis. Comput. 28(6), 976–990 (2010)
Fujiyoshi, H., Lipton, A.J.: Real-time human motion analysis by image skeletonization. Appl. Comput. Vis. 87, 113–120 (1998)
Yang, X., Tian, Y.L.: Effective 3D action recognition using EigenJoints. J. Vis. Commun. Image Represent. 25(1), 2–11 (2014)
Chaudhry, R., Ravichandran, A., Hager, G.: Histograms of oriented optical flow and Binet–Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 20–25 (2009)
Weinland, D., Ronfard, R., Boyer, E.: Free viewpoint action recognition using motion history volumes. Comput. Vis. Image Underst. 104(2), 249–257 (2006)
Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: IEEE International Conference on Pattern Recognition, pp. 23–26 (2004)
Rapantzikos, K., Avrithis, Y., Kollias, S.: Dense saliency-based spatiotemporal feature points for action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 43–48 (2009)
Hu, X., Huang, Y., Duan, Q., et al.: Abnormal event detection in crowded scenes using histogram of oriented contextual gradient descriptor. EURASIP J. Adv. Signal Process. 2018(1), 54 (2018)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems (2012)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
Szegedy, C., Liu, W., Jia, Y.: Going deeper with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
He, K., Zhang, X., Ren, S.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
Ren, S., He, K., Girshick, R.: Faster R-CNN: towards real-time object detection with region proposal networks. In: International Conference on Neural Information Processing Systems (2015)
Redmon, J., Divvala, S., Girshick, R.: You only look once: unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
Liu, W., Anguelov, D., Erhan, D.: SSD: single shot multibox detector. In: European Conference on Computer Vision (2016)
Li, C., Wang, P., Wang, S.: Skeleton-based action recognition using LSTM and CNN. In: IEEE International Conference on Multimedia and Expo Workshops (2017)
Donahue, J., Hendricks, L.A., Guadarrama, S.: Long-term recurrent convolutional networks for visual recognition and description. In: IEEE Conference on Computer Vision and Pattern Recognition (2015)
Ji, S., Xu, W., Yang, M.: 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)
Wang, X., Gao, L., Song, J.: Beyond frame-level CNN: saliency-aware 3D CNN with LSTM for video action recognition. IEEE Signal Process. Lett. 99, 1 (2016)
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: International Conference on Neural Information Processing Systems (2014)
Feichtenhofer, C., Pinz, A., Zisserman, A.: Convolutional two-stream network fusion for video action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
Wang, L., Xiong, Y., Wang, Z.: Temporal segment networks: towards good practices for deep action recognition. In: European Conference on Computer Vision (2016)
Chen, J., Wu, J., Konrad, J.: Semi-coupled two-stream fusion ConvNets for action recognition at extremely low resolutions. In: IEEE Winter Conference on Applications of Computer Vision (2017)
Wang, X., Gao, L., Wang, P.: Two-stream 3-D convnet fusion for action recognition in videos with arbitrary size and length. IEEE Trans. Multimed. 20, 634–644 (2018)
Zhao, R., Ali, H., Smagt, P.V.D.: Two-stream RNN/CNN for action recognition in 3D videos. In: IEEE International Conference on Intelligent Robots and Systems (2017)
Afrasiabi, M., Khotanlou, H., Mansoorizadeh, M.: DTW-CNN: time series-based human interaction prediction in videos using CNN-extracted features. Vis. Comput. (2019). https://doi.org/10.1007/s00371-019-01722-6
Imran, J., Raman, B.: Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition. J. Ambient Intell. Hum. Comput. 11, 189–208 (2020)
Yi, Y., Li, A., Zhou, X.F.: Human action recognition based on action relevance weighted encoding. Signal Process. Image Commun. 80, 115640 (2020)
Bobick, A.F., Davis, J.W.: The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 23(3), 257–267 (2001)
Acuna, D., Ling, H., Kar, A.: Efficient interactive annotation of segmentation datasets with Polygon-RNN++. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)
Castrejon, L., Kundu, K., Urtasun, R.: Annotating object instances with a polygon-RNN. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)
Siswantoro, J., Prabuwono, A.S., Abdullah, A.: A linear model based on Kalman filter for improving neural network classification performance. Expert Syst. Appl. 49, 112–122 (2016)
Duin, R.P.W.: The combining classifier: to train or not to train. In: International Conference on Pattern Recognition (2002)
The KTH Dataset: http://www.nada.kth.se/cvap/actions/. Accessed on 18 Jan. (2005)
The Weizmann Dataset: http://www.wisdom.weizmann.ac.il/. Accessed on 24 Dec. (2007)
The UT-Interaction Dataset: http://cvrc.ece.utexas.edu/SDHA2010 (2007)
Qian, H., Zhou, J., Mao, Y.: Recognizing human actions from silhouettes described with weighted distance metric and kinematics. Multimed. Tools Appl. 76, 21889–21910 (2017)
Xu, K., Jiang, X., Sun, T.: Two-stream dictionary learning architecture for action recognition. IEEE Trans. Circuits Syst. Video Technol. 27, 567–576 (2017)
Chou, K.P., Prasad, M., Wu, D.: Robust feature-based automated multi-view human action recognition system. IEEE Access 6, 1 (2018)
Ko, K.E., Sim, K.B.: Deep convolutional framework for abnormal activities recognition in a smart surveillance system. Eng. Appl. Artif. Intell. 67, 226–234 (2018)
Wang, J., Zhou, S.C., Xia, L.M.: Human interaction recognition based on sparse representation of feature covariance matrices. J. Central South Univ. 25(2), 304–314 (2018)
Vishwakarma, D.K., Dhiman, C.: A unified model for human activity recognition using spatial distribution of gradients and difference of Gaussian kernel. Vis. Comput. 35, 1595–1613 (2019)
Sahoo, P.S., Ari, S.: On an algorithm for human action recognition. Expert Syst. Appl. 115, 524–534 (2019)
Vishwakarma, D.K.: A twofold transformation model for human action recognition using decisive pose. Cognit. Syst. Res. 61, 1–13 (2020)
Acknowledgements
This work was supported by the Shanghai Natural Science Foundation (No. 17ZR1443500), the National Natural Science Foundation of China (No. 61701296), and the Joint Funds of the National Natural Science Foundation of China (No. U1831133).
Cite this article
Liu, C., Ying, J., Yang, H. et al. Improved human action recognition approach based on two-stream convolutional neural network model. Vis Comput 37, 1327–1341 (2021). https://doi.org/10.1007/s00371-020-01868-8