
Linear dynamical systems approach for human action recognition with dual-stream deep features

Published in Applied Intelligence.

Abstract

This paper discusses human action recognition with a dual-stream architecture based on a linear dynamical systems (LDSs) approach. First, a slicing process is established to extract original slices from video sequences; two slicing methods are adopted to either subtract or retain the remaining frames of the video sequences. By applying background subtraction to adjacent frames of the original slices, difference slices are also produced. To simultaneously capture the spatial components of the background and difference information expressed in each slice, a framework based on pre-trained convolutional neural networks (CNNs) is introduced for dual-stream deep feature extraction. Subsequently, LDSs are established to model the temporal relationship between adjacent slices and to obtain the temporal components of the background and difference features, expressed as the linear dynamical background feature (LD-BF) and the linear dynamical difference feature (LD-DF). Experiments on the UCF50, UCF101, and HMDB51 datasets demonstrate the effectiveness and robustness of the proposed approach. The impact of retaining different numbers of principal component analysis (PCA) feature dimensions and of the distinct slicing methods on recognition accuracy was evaluated. In particular, combining LD-BF with LD-DF under appropriate feature dimensions and slicing methods further improved the accuracy on the UCF50, UCF101, and HMDB51 datasets. In addition, the computational cost of the feature extraction process was evaluated to illustrate the efficiency of the proposed approach. The experimental results show that the proposed approach is competitive with state-of-the-art approaches on the three datasets.
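The core of the pipeline described above, namely difference slices computed from adjacent frames and an LDS fitted over the resulting per-slice features, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses random grayscale frames in place of CNN deep features, and `fit_lds` follows the standard SVD-based LDS identification used for dynamic textures [13]. All function names and parameters are assumptions made for the sketch.

```python
import numpy as np

def fit_lds(Y, n_states=5):
    """Fit a linear dynamical system x_{t+1} = A x_t, y_t = C x_t
    to a feature sequence Y of shape (d, T), using the SVD-based
    identification of Doretto et al. (dynamic textures)."""
    Y = Y - Y.mean(axis=1, keepdims=True)          # remove temporal mean
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                            # observation matrix
    X = np.diag(S[:n_states]) @ Vt[:n_states, :]   # estimated state trajectory
    # Least-squares estimate of the state transition matrix A
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return A, C

def frame_differences(frames):
    """Difference 'slices': absolute differences of adjacent frames,
    a simple stand-in for the paper's background-subtraction step."""
    return [np.abs(frames[t + 1].astype(float) - frames[t].astype(float))
            for t in range(len(frames) - 1)]

# Toy usage: a random 8x8 "video" in place of real frames / deep features.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(8, 8)) for _ in range(20)]
diffs = frame_differences(frames)
Y = np.stack([d.ravel() for d in diffs], axis=1)   # (d, T) feature matrix
A, C = fit_lds(Y, n_states=5)
print(A.shape, C.shape)  # (5, 5) (64, 5)
```

In the paper's setting, the columns of `Y` would instead be CNN features of the background or difference slices, and the pair `(A, C)` would summarize the temporal dynamics as the LD-BF or LD-DF descriptor.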

References

  1. Du Z, Mukaidani H, Saravanakumar R (2020) Action recognition based on linear dynamical systems with deep features in videos. In: Proceedings of the IEEE international conference on systems, man, and cybernetics, pp 2634–2639. IEEE

  2. Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: Proceedings of the advances in neural information processing systems, pp 568–576

  3. Huang Q, Sun S, Wang F (2017) A compact pairwise trajectory representation for action recognition. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, pp 1767–1771. IEEE

  4. Duta IC, Nguyen TA, Aizawa K, Ionescu B, Sebe N (2016) Boosting vlad with double assignment using deep features for action recognition in videos. In: Proceedings of the 23rd international conference on pattern recognition, pp 2210–2215. IEEE

  5. Sun L, Jia K, Yeung DY, Shi BE (2015) Human action recognition using factorized spatio-temporal convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4597–4605. IEEE

  6. Tran D, Wang H, Torresani L, Ray J, LeCun Y, Paluri M (2018) A closer look at spatiotemporal convolutions for action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6450–6459. IEEE

  7. Zhou Y, Sun X, Zha ZJ, Zeng W (2018) Mict: Mixed 3d/2d convolutional tube for human action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 449–458. IEEE

  8. Liu K, Liu W, Gan C, Tan M, Ma H (2018) T-c3d: Temporal convolutional 3d network for real-time action recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 32

  9. Yang H, Yuan C, Li B, Du Y, Xing J, Hu W, Maybank SJ (2019) Asymmetric 3d convolutional neural networks for action recognition. Pattern Recognit 85:1–12

  10. Wang P, Cao Y, Shen C, Liu L, Shen HT (2016) Temporal pyramid pooling-based convolutional neural network for action recognition. IEEE Transactions on Circuits and Systems for Video Technology 27(12):2613–2622

  11. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Proceedings of the international conference on learning representations

  12. Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4724–4733. IEEE

  13. Doretto G, Chiuso A, Wu YN, Soatto S (2003) Dynamic textures. Int J Comput Vis 51(2):91–109

  14. Ravichandran A, Chaudhry R, Vidal R (2012) Categorizing dynamic textures using a bag of dynamical systems. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(2):342–353

  15. Vidal R, Favaro P (2007) Dynamicboost: Boosting time series generated by dynamical systems. In: Proceedings of the IEEE 11th international conference on computer vision, pp 1–6. IEEE

  16. Luo G, Hu W (2013) Learning silhouette dynamics for human action recognition. In: Proceedings of the IEEE international conference on image processing, pp 2832–2836. IEEE

  17. Luo G, Wei J, Hu W, Maybank SJ (2019) Tangent fisher vector on matrix manifolds for action recognition. IEEE Trans Image Process 29:3052–3064

  18. Scovanner P, Ali S, Shah M (2007) A 3-dimensional sift descriptor and its application to action recognition. In: Proceedings of the 15th ACM international conference on multimedia, pp 357–360

  19. Noguchi A, Yanai K (2010) A surf-based spatio-temporal feature for feature-fusion-based action recognition. In: Proceedings of the European conference on computer vision, pp 153–167. Springer

  20. Sahoo SP, Silambarasi R, Ari S (2019) Fusion of histogram based features for human action recognition. In: Proceedings of the international conference on advanced computing & communication systems, pp 1012–1016. IEEE

  21. Xiao X, Hu H, Wang W (2017) Trajectories-based motion neighborhood feature for human action recognition. In: Proceedings of the international conference on image processing, pp 4147–4151. IEEE

  22. Carmona JM, Climent J (2018) Human action recognition by means of subtensor projections and dense trajectories. Pattern Recogn 81:443–455

  23. Ahmed K, El-Henawy I, Mahmoud HA (2017) Action recognition technique based on fast hog3d of integral foreground snippets and random forest. In: Proceedings of the Intelligent Systems and Computer Vision, pp 1–7. IEEE

  24. Liu J, Huang Y, Peng X, Wang L (2015) Multi-view descriptor mining via codeword net for action recognition. In: Proceedings of the IEEE international conference on image processing, pp 793–797. IEEE

  25. Peng X, Wang L, Wang X, Qiao Y (2016) Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice. Comput Vis Image Underst 150:109–125

  26. Yang Y, Liu R, Deng C, Gao X (2016) Multi-task human action recognition via exploring supercategory. Signal Process 124:36–44

  27. Wang H, Oneata D, Verbeek J, Schmid C (2016) A robust and efficient video representation for action recognition. Int J Comput Vis 119(3):219–238

  28. Duta IC, Uijlings JR, Ionescu B, Aizawa K, Hauptmann AG, Sebe N (2017) Efficient human action recognition using histograms of motion gradients and vlad with descriptor shape information. Multimed Tools Appl 76(21):22445–22472

  29. Fiorini L, Mancioppi G, Semeraro F, Fujita H, Cavallo F (2020) Unsupervised emotional state classification through physiological parameters for social robotics applications. Knowledge-Based Systems 190:105217

  30. Yao G, Lei T, Zhong J, Jiang P (2019) Learning multi-temporal-scale deep information for action recognition. Appl Intell 49(6):2017–2029

  31. Mishra SR, Mishra TK, Sanyal G, Sarkar A, Satapathy SC (2020) Real time human action recognition using triggered frame extraction and a typical cnn heuristic. Pattern Recogn Lett 135:329–336

  32. Sun L, Jia K, Chen K, Yeung DY, Shi BE, Savarese S (2017) Lattice long short-term memory for human action recognition. In: Proceedings of the IEEE international conference on computer vision, pp 2147–2156

  33. Majd M, Safabakhsh R (2020) Correlational convolutional lstm for human action recognition. Neurocomputing 396:224–229

  34. Ullah A, Ahmad J, Muhammad K, Sajjad M, Baik SW (2017) Action recognition in video sequences using deep bi-directional lstm with cnn features. IEEE Access 6:1155–1166

  35. Stergiou A, Poppe R (2020) Learn to cycle: Time-consistent feature discovery for action recognition. Pattern Recogn Lett 141:1–7

  36. Zhang Z, Lv Z, Gan C, Zhu Q (2020) Human action recognition using convolutional lstm and fully-connected lstm with different attentions. Neurocomputing 410:304–316

  37. Gammulle H, Denman S, Sridharan S, Fookes C (2017) Two stream lstm: a deep fusion framework for human action recognition. In: Proceedings of the IEEE winter conference on applications of computer vision, pp 177–186. IEEE

  38. Garcia-Garcia B, Bouwmans T, Silva AJR (2020) Background subtraction in real applications: Challenges, current models and future directions. Comput Sci Rev 35:100204

  39. Li L, Prakash BA, Faloutsos C (2010) Parsimonious linear fingerprinting for time series. Proceedings of the VLDB Endowment 3(1)

  40. Reddy KK, Shah M (2013) Recognizing 50 human action categories of web videos. Mach Vis Appl 24(5):971–981

  41. Soomro K, Zamir AR, Shah M (2012) UCF101: A dataset of 101 human action classes from videos in the wild. Center for Research in Computer Vision 2(11)

  42. Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) Hmdb: a large video database for human motion recognition. In: Proceedings of the international conference on computer vision, pp 2556–2563. IEEE

  43. Klaser A, Marszałek M, Schmid C (2008) A spatio-temporal descriptor based on 3d-gradients. In: Proceedings of the 19th British machine vision conference, pp 275:1–10. British Machine Vision Association

  44. Shu Y, Shi Y, Wang Y, Zou Y, Yuan Q, Tian Y (2018) Odn: Opening the deep network for open-set action recognition. In: Proceedings of the IEEE international conference on multimedia and expo, pp 1–6. IEEE

  45. Hu H, Liao Z, Xiao X (2019) Action recognition using multiple pooling strategies of cnn features. Neural Process Lett 50(1):379–396

Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive and insightful comments, which helped enhance the quality of this paper.

Author information

Corresponding author

Correspondence to Hiroaki Mukaidani.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The material in this paper was partially presented at the 2020 IEEE International Conference on Systems, Man, and Cybernetics, October 11-14, 2020, Toronto, Canada [1].

Cite this article
Du, Z., Mukaidani, H. Linear dynamical systems approach for human action recognition with dual-stream deep features. Appl Intell 52, 452–470 (2022). https://doi.org/10.1007/s10489-021-02367-6
