Abstract
This paper discusses human action recognition with a dual-stream architecture based on linear dynamical systems (LDSs). First, a slicing process is established to extract original slices from video sequences; two slicing methods are adopted, which either discard or retain the remaining frames of the video sequences. By applying background subtraction to adjacent frames of the original slices, difference slices are also obtained. To capture the spatial component of the background and difference information expressed in each slice simultaneously, a framework based on pre-trained convolutional neural networks (CNNs) is introduced for dual-stream deep feature extraction. Subsequently, LDSs are established to model the temporal relationship between adjacent slices and to obtain the temporal component of the background and difference features, expressed as the linear dynamical background feature (LD-BF) and the linear dynamical difference feature (LD-DF), respectively. Experiments were conducted on the UCF50, UCF101, and HMDB51 datasets to demonstrate the effectiveness and robustness of the proposed approach. The impact of retaining different numbers of principal component analysis (PCA) feature dimensions and of the two slicing methods on recognition accuracy was evaluated. In particular, combining LD-BF with LD-DF under appropriate feature dimensions and slicing methods further improved the accuracy on the UCF50, UCF101, and HMDB51 datasets. In addition, the computational cost of the feature extraction process was evaluated to illustrate the efficiency of the proposed approach. The experimental results show that the proposed approach is competitive with state-of-the-art approaches on the three datasets.
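As a rough illustration of the two stages described above, the LDS temporal modeling builds on the dynamic-texture formulation of Doretto et al. [17], in which a sequence of slice features y_t is modeled as y_t = C x_t with latent dynamics x_{t+1} = A x_t, fitted by an SVD-based subspace method. The sketch below is a minimal illustration, not the authors' code: the function names are hypothetical, and adjacent-frame differencing is used as a stand-in for the paper's background-subtraction step.

```python
import numpy as np

def difference_slices(frames):
    """Illustrative stand-in for difference slices: absolute
    differences of adjacent frames (the paper uses background
    subtraction between adjacent frames of the original slices)."""
    return [np.abs(frames[t + 1].astype(float) - frames[t].astype(float))
            for t in range(len(frames) - 1)]

def fit_lds(Y, n=5):
    """Fit y_t = C x_t, x_{t+1} = A x_t to the columns of Y (d x T)
    via the SVD subspace method of dynamic textures."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                              # spatial/observation matrix (d x n)
    X = np.diag(s[:n]) @ Vt[:n, :]            # estimated latent states (n x T)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])  # temporal transition matrix (n x n)
    return A, C
```

In this sketch, the columns of Y would be the per-slice CNN features (after PCA), and the pair (A, C) plays the role of the temporal descriptor from which LD-BF and LD-DF are formed for the background and difference streams.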
References
Du Z, Mukaidani H, Saravanakumar R (2020) Action recognition based on linear dynamical systems with deep features in videos. In: Proceedings of the IEEE international conference on systems, man, and cybernetics, pp 2634–2639. IEEE
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: Proceedings of the advances in neural information processing systems, pp 568–576
Huang Q, Sun S, Wang F (2017) A compact pairwise trajectory representation for action recognition. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, pp 1767–1771. IEEE
Duta IC, Nguyen TA, Aizawa K, Ionescu B, Sebe N (2016) Boosting vlad with double assignment using deep features for action recognition in videos. In: Proceedings of the 23rd international conference on pattern recognition, pp 2210–2215. IEEE
Sun L, Jia K, Yeung DY, Shi BE (2015) Human action recognition using factorized spatio-temporal convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4597–4605. IEEE
Tran D, Wang H, Torresani L, Ray J, LeCun Y, Paluri M (2018) A closer look at spatiotemporal convolutions for action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6450–6459. IEEE
Zhou Y, Sun X, Zha ZJ, Zeng W (2018) Mict: Mixed 3d/2d convolutional tube for human action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 449–458. IEEE
Liu K, Liu W, Gan C, Tan M, Ma H (2018) T-c3d: Temporal convolutional 3d network for real-time action recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 32
Yang H, Yuan C, Li B, Du Y, Xing J, Hu W, Maybank SJ (2019) Asymmetric 3d convolutional neural networks for action recognition. Pattern Recognit 85:1–12
Wang P, Cao Y, Shen C, Liu L, Shen HT (2016) Temporal pyramid pooling-based convolutional neural network for action recognition. IEEE Trans Circuits Syst Video Technol 27(12):2613–2622
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Proceedings of the international conference on learning representations
Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the Kinetics dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4724–4733. IEEE
Doretto G, Chiuso A, Wu YN, Soatto S (2003) Dynamic textures. Int J Comput Vis 51(2):91–109
Ravichandran A, Chaudhry R, Vidal R (2012) Categorizing dynamic textures using a bag of dynamical systems. IEEE Trans Pattern Anal Mach Intell 35(2):342–353
Vidal R, Favaro P (2007) Dynamicboost: Boosting time series generated by dynamical systems. In: Proceedings of the IEEE 11th international conference on computer vision, pp 1–6. IEEE
Luo G, Hu W (2013) Learning silhouette dynamics for human action recognition. In: Proceedings of the IEEE international conference on image processing, pp 2832–2836. IEEE
Luo G, Wei J, Hu W, Maybank SJ (2019) Tangent fisher vector on matrix manifolds for action recognition. IEEE Trans Image Process 29:3052–3064
Scovanner P, Ali S, Shah M (2007) A 3-dimensional sift descriptor and its application to action recognition. In: Proceedings of the 15th ACM international conference on multimedia, pp 357–360
Noguchi A, Yanai K (2010) A surf-based spatio-temporal feature for feature-fusion-based action recognition. In: Proceedings of the European conference on computer vision, pp 153–167. Springer
Sahoo SP, Silambarasi R, Ari S (2019) Fusion of histogram based features for human action recognition. In: Proceedings of the international conference on advanced computing & communication systems, pp 1012–1016. IEEE
Xiao X, Hu H, Wang W (2017) Trajectories-based motion neighborhood feature for human action recognition. In: Proceedings of the international conference on image processing, pp 4147–4151. IEEE
Carmona JM, Climent J (2018) Human action recognition by means of subtensor projections and dense trajectories. Pattern Recogn 81:443–455
Ahmed K, El-Henawy I, Mahmoud HA (2017) Action recognition technique based on fast hog3d of integral foreground snippets and random forest. In: Proceedings of the Intelligent Systems and Computer Vision, pp 1–7. IEEE
Liu J, Huang Y, Peng X, Wang L (2015) Multi-view descriptor mining via codeword net for action recognition. In: Proceedings of the IEEE international conference on image processing, pp 793–797. IEEE
Peng X, Wang L, Wang X, Qiao Y (2016) Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice. Comput Vis Image Underst 150:109–125
Yang Y, Liu R, Deng C, Gao X (2016) Multi-task human action recognition via exploring supercategory. Signal Process 124:36–44
Wang H, Oneata D, Verbeek J, Schmid C (2016) A robust and efficient video representation for action recognition. Int J Comput Vis 119(3):219–238
Duta IC, Uijlings JR, Ionescu B, Aizawa K, Hauptmann AG, Sebe N (2017) Efficient human action recognition using histograms of motion gradients and vlad with descriptor shape information. Multimed Tools Appl 76(21):22445–22472
Fiorini L, Mancioppi G, Semeraro F, Fujita H, Cavallo F (2020) Unsupervised emotional state classification through physiological parameters for social robotics applications. Knowl-Based Syst 190:105217
Yao G, Lei T, Zhong J, Jiang P (2019) Learning multi-temporal-scale deep information for action recognition. Appl Intell 49(6):2017–2029
Mishra SR, Mishra TK, Sanyal G, Sarkar A, Satapathy SC (2020) Real time human action recognition using triggered frame extraction and a typical cnn heuristic. Pattern Recogn Lett 135:329–336
Sun L, Jia K, Chen K, Yeung DY, Shi BE, Savarese S (2017) Lattice long short-term memory for human action recognition. In: Proceedings of the IEEE international conference on computer vision, pp 2147–2156
Majd M, Safabakhsh R (2020) Correlational convolutional lstm for human action recognition. Neurocomputing 396:224–229
Ullah A, Ahmad J, Muhammad K, Sajjad M, Baik SW (2017) Action recognition in video sequences using deep bi-directional lstm with cnn features. IEEE Access 6:1155–1166
Stergiou A, Poppe R (2020) Learn to cycle: Time-consistent feature discovery for action recognition. Pattern Recogn Lett 141:1–7
Zhang Z, Lv Z, Gan C, Zhu Q (2020) Human action recognition using convolutional lstm and fully-connected lstm with different attentions. Neurocomputing 410:304–316
Gammulle H, Denman S, Sridharan S, Fookes C (2017) Two stream lstm: a deep fusion framework for human action recognition. In: Proceedings of the IEEE winter conference on applications of computer vision, pp 177–186. IEEE
Garcia-Garcia B, Bouwmans T, Silva AJR (2020) Background subtraction in real applications: Challenges, current models and future directions. Comput Sci Rev 35:100204
Li L, Prakash BA, Faloutsos C (2010) Parsimonious linear fingerprinting for time series. Proc VLDB Endow 3(1)
Reddy KK, Shah M (2013) Recognizing 50 human action categories of web videos. Mach Vis Appl 24(5):971–981
Soomro K, Zamir AR, Shah M (2012) UCF101: A dataset of 101 human action classes from videos in the wild. Technical report CRCV-TR-12-01, Center for Research in Computer Vision, University of Central Florida
Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) Hmdb: a large video database for human motion recognition. In: Proceedings of the international conference on computer vision, pp 2556–2563. IEEE
Kläser A, Marszałek M, Schmid C (2008) A spatio-temporal descriptor based on 3d-gradients. In: Proceedings of the 19th British machine vision conference, pp 275:1–10. British Machine Vision Association
Shu Y, Shi Y, Wang Y, Zou Y, Yuan Q, Tian Y (2018) Odn: Opening the deep network for open-set action recognition. In: Proceedings of the IEEE international conference on multimedia and expo, pp 1–6. IEEE
Hu H, Liao Z, Xiao X (2019) Action recognition using multiple pooling strategies of cnn features. Neural Process Lett 50(1):379–396
Acknowledgements
The authors would like to thank the anonymous reviewers for their constructive and insightful comments, which helped enhance the quality of this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding this work.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The material in this paper was partially presented at the 2020 IEEE International Conference on Systems, Man, and Cybernetics, October 11-14, 2020, Toronto, Canada [1].
Cite this article
Du, Z., Mukaidani, H. Linear dynamical systems approach for human action recognition with dual-stream deep features. Appl Intell 52, 452–470 (2022). https://doi.org/10.1007/s10489-021-02367-6