Improved human action recognition approach based on two-stream convolutional neural network model

  • Original Article
  • Published:
The Visual Computer

Abstract

To improve the accuracy of human abnormal behavior recognition, a two-stream convolutional neural network model is proposed. The model comprises two main parts, VMHI and FRGB. First, motion history images are extracted and fed into a VGG-16 convolutional neural network for training. Then, RGB images, annotated with the assistance of a Kalman filter, are fed into the Faster R-CNN algorithm for training. Finally, the results of the two streams, VMHI and FRGB, are fused. The algorithm recognizes not only single-person behavior but also two-person interactions, and it improves the recognition accuracy of similar actions. Experimental results on the KTH, Weizmann, UT-Interaction, and TenthLab datasets show that the proposed algorithm achieves higher accuracy than methods reported in the related literature.
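The pipeline described above is a late-fusion, two-stream design: a VMHI stream (motion history images classified by a VGG-16 network) and an FRGB stream (RGB frames scored by Faster R-CNN), whose per-class scores are combined. The snippet below is a minimal sketch of that idea only, not the authors' implementation: the MHI decay parameters, the use of torchvision's ImageNet-pretrained VGG-16 as a stand-in backbone, and the equal-weight score fusion are illustrative assumptions.

```python
# Minimal sketch of the two-stream idea (VMHI + FRGB with late fusion).
# Parameter values and the pretrained backbone are illustrative assumptions,
# not the authors' released code.
import numpy as np
import torch
import torch.nn.functional as F
import torchvision


def motion_history_image(frames, diff_thresh=30, decay=0.05):
    """Accumulate a motion history image from grayscale uint8 frames:
    moving pixels are set to 1.0, older motion fades by `decay` per step."""
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, curr in zip(frames, frames[1:]):
        moving = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > diff_thresh
        mhi = np.where(moving, 1.0, np.maximum(mhi - decay, 0.0))
    return mhi


# Stand-in VMHI backbone; the paper fine-tunes VGG-16 on MHIs, here we simply
# reuse torchvision's ImageNet weights for illustration.
vgg16 = torchvision.models.vgg16(weights="IMAGENET1K_V1").eval()


def vmhi_scores(frames):
    """Classify the clip's motion history image and return softmax scores."""
    mhi = torch.from_numpy(motion_history_image(frames))
    x = mhi.unsqueeze(0).unsqueeze(0).repeat(1, 3, 1, 1)  # 1 x 3 x H x W
    x = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
    with torch.no_grad():
        return torch.softmax(vgg16(x), dim=1)


def fuse_streams(vmhi_prob, frgb_prob, alpha=0.5):
    """Weighted late fusion of the two streams' class-probability vectors."""
    return alpha * vmhi_prob + (1.0 - alpha) * frgb_prob
```

In the paper, the FRGB probabilities would come from the Faster R-CNN stream trained on Kalman-filter-assisted annotations; here `fuse_streams` simply averages whichever per-class score vectors the two streams produce.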





Acknowledgements

This work was supported by the Shanghai Natural Science Foundation (No. 17ZR1443500), the National Natural Science Foundation of China (No. 61701296), and the Joint Funds of the National Natural Science Foundation of China (No. U1831133).

Author information

Corresponding author

Correspondence to Haima Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, C., Ying, J., Yang, H. et al. Improved human action recognition approach based on two-stream convolutional neural network model. Vis Comput 37, 1327–1341 (2021). https://doi.org/10.1007/s00371-020-01868-8
