Abstract
This paper discusses human action recognition with a dual-stream architecture based on linear dynamical systems (LDSs). First, a slicing process is established to extract original slices from video sequences; two slicing methods are adopted, which either discard or retain the remaining frames of the video sequences. By applying background subtraction to adjacent frames of the original slices, difference slices are also obtained. To capture the spatial component of the background and difference information expressed in each slice simultaneously, a framework based on pre-trained convolutional neural networks (CNNs) is introduced for dual-stream deep feature extraction. Subsequently, LDSs are established to model the temporal relationship between adjacent slices and to obtain the temporal component of the background and difference features, expressed as the linear dynamical background feature (LD-BF) and the linear dynamical difference feature (LD-DF), respectively. Experiments were conducted on the UCF50, UCF101, and HMDB51 datasets to demonstrate the effectiveness and robustness of the proposed approach. The impact of retaining different numbers of principal component analysis (PCA) feature dimensions and of the two slicing methods on recognition accuracy was evaluated. In particular, combining LD-BF with LD-DF under appropriate feature dimensions and slicing methods further improved the accuracy on the UCF50, UCF101, and HMDB51 datasets. In addition, the computational cost of the feature extraction process was evaluated to illustrate the efficiency of the proposed approach. The experimental results show that the proposed approach is competitive with state-of-the-art approaches on the three datasets.
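As a rough illustration of the two stages described above, the LDS temporal modeling builds on the dynamic-texture formulation of Doretto et al. [17], in which a sequence of slice features y_t is modeled as y_t = C x_t with latent dynamics x_{t+1} = A x_t, fitted by an SVD-based subspace method. The sketch below is a minimal illustration, not the authors' code: the function names are hypothetical, and adjacent-frame differencing is used as a stand-in for the paper's background-subtraction step.

```python
import numpy as np

def difference_slices(frames):
    """Illustrative stand-in for difference slices: absolute
    differences of adjacent frames (the paper uses background
    subtraction between adjacent frames of the original slices)."""
    return [np.abs(frames[t + 1].astype(float) - frames[t].astype(float))
            for t in range(len(frames) - 1)]

def fit_lds(Y, n=5):
    """Fit y_t = C x_t, x_{t+1} = A x_t to the columns of Y (d x T)
    via the SVD subspace method of dynamic textures."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                              # spatial/observation matrix (d x n)
    X = np.diag(s[:n]) @ Vt[:n, :]            # estimated latent states (n x T)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])  # temporal transition matrix (n x n)
    return A, C
```

In this sketch, the columns of Y would be the per-slice CNN features (after PCA), and the pair (A, C) plays the role of the temporal descriptor from which LD-BF and LD-DF are formed for the background and difference streams.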
References
Du Z, Mukaidani H, Saravanakumar R (2020) Action recognition based on linear dynamical systems with deep features in videos. In: Proceedings of the IEEE international conference on systems, man, and cybernetics, pp 2634–2639. IEEE
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: Proceedings of the advances in neural information processing systems, pp 568–576
Huang Q, Sun S, Wang F (2017) A compact pairwise trajectory representation for action recognition. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, pp 1767–1771. IEEE
Duta IC, Nguyen TA, Aizawa K, Ionescu B, Sebe N (2016) Boosting vlad with double assignment using deep features for action recognition in videos. In: Proceedings of the 23rd international conference on pattern recognition, pp 2210–2215. IEEE
Sun L, Jia K, Yeung DY, Shi BE (2015) Human action recognition using factorized spatio-temporal convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4597–4605. IEEE
Tran D, Wang H, Torresani L, Ray J, LeCun Y, Paluri M (2018) A closer look at spatiotemporal convolutions for action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6450–6459. IEEE
Zhou Y, Sun X, Zha ZJ, Zeng W (2018) Mict: Mixed 3d/2d convolutional tube for human action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 449–458. IEEE
Liu K, Liu W, Gan C, Tan M, Ma H (2018) T-c3d: Temporal convolutional 3d network for real-time action recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 32
Yang H, Yuan C, Li B, Du Y, Xing J, Hu W, Maybank SJ (2019) Asymmetric 3d convolutional neural networks for action recognition. Pattern Recognit 85:1–12
Wang P, Cao Y, Shen C, Liu L, Shen HT (2016) Temporal pyramid pooling-based convolutional neural network for action recognition. IEEE Trans Circuits Syst Video Technol 27(12):2613–2622
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Proceedings of the international conference on learning representations
Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the Kinetics dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4724–4733. IEEE
Doretto G, Chiuso A, Wu YN, Soatto S (2003) Dynamic textures. Int J Comput Vis 51(2):91–109
Ravichandran A, Chaudhry R, Vidal R (2012) Categorizing dynamic textures using a bag of dynamical systems. IEEE Trans Pattern Anal Mach Intell 35(2):342–353
Vidal R, Favaro P (2007) Dynamicboost: Boosting time series generated by dynamical systems. In: Proceedings of the IEEE 11th international conference on computer vision, pp 1–6. IEEE
Luo G, Hu W (2013) Learning silhouette dynamics for human action recognition. In: Proceedings of the IEEE international conference on image processing, pp 2832–2836. IEEE
Luo G, Wei J, Hu W, Maybank SJ (2019) Tangent fisher vector on matrix manifolds for action recognition. IEEE Trans Image Process 29:3052–3064
Scovanner P, Ali S, Shah M (2007) A 3-dimensional sift descriptor and its application to action recognition. In: Proceedings of the 15th ACM international conference on multimedia, pp 357–360
Noguchi A, Yanai K (2010) A surf-based spatio-temporal feature for feature-fusion-based action recognition. In: Proceedings of the European conference on computer vision, pp 153–167. Springer
Sahoo SP, Silambarasi R, Ari S (2019) Fusion of histogram based features for human action recognition. In: Proceedings of the international conference on advanced computing & communication systems, pp 1012–1016. IEEE
Xiao X, Hu H, Wang W (2017) Trajectories-based motion neighborhood feature for human action recognition. In: Proceedings of the international conference on image processing, pp 4147–4151. IEEE
Carmona JM, Climent J (2018) Human action recognition by means of subtensor projections and dense trajectories. Pattern Recogn 81:443–455
Ahmed K, El-Henawy I, Mahmoud HA (2017) Action recognition technique based on fast hog3d of integral foreground snippets and random forest. In: Proceedings of the Intelligent Systems and Computer Vision, pp 1–7. IEEE
Liu J, Huang Y, Peng X, Wang L (2015) Multi-view descriptor mining via codeword net for action recognition. In: Proceedings of the IEEE international conference on image processing, pp 793–797. IEEE
Peng X, Wang L, Wang X, Qiao Y (2016) Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice. Comput Vis Image Underst 150:109–125
Yang Y, Liu R, Deng C, Gao X (2016) Multi-task human action recognition via exploring supercategory. Signal Process 124:36–44
Wang H, Oneata D, Verbeek J, Schmid C (2016) A robust and efficient video representation for action recognition. Int J Comput Vis 119(3):219–238
Duta IC, Uijlings JR, Ionescu B, Aizawa K, Hauptmann AG, Sebe N (2017) Efficient human action recognition using histograms of motion gradients and vlad with descriptor shape information. Multimed Tools Appl 76(21):22445–22472
Fiorini L, Mancioppi G, Semeraro F, Fujita H, Cavallo F (2020) Unsupervised emotional state classification through physiological parameters for social robotics applications. Knowl-Based Syst 190:105217
Yao G, Lei T, Zhong J, Jiang P (2019) Learning multi-temporal-scale deep information for action recognition. Appl Intell 49(6):2017–2029
Mishra SR, Mishra TK, Sanyal G, Sarkar A, Satapathy SC (2020) Real time human action recognition using triggered frame extraction and a typical cnn heuristic. Pattern Recogn Lett 135:329–336
Sun L, Jia K, Chen K, Yeung DY, Shi BE, Savarese S (2017) Lattice long short-term memory for human action recognition. In: Proceedings of the IEEE international conference on computer vision, pp 2147–2156
Majd M, Safabakhsh R (2020) Correlational convolutional lstm for human action recognition. Neurocomputing 396:224–229
Ullah A, Ahmad J, Muhammad K, Sajjad M, Baik SW (2017) Action recognition in video sequences using deep bi-directional lstm with cnn features. IEEE Access 6:1155–1166
Stergiou A, Poppe R (2020) Learn to cycle: Time-consistent feature discovery for action recognition. Pattern Recogn Lett 141:1–7
Zhang Z, Lv Z, Gan C, Zhu Q (2020) Human action recognition using convolutional lstm and fully-connected lstm with different attentions. Neurocomputing 410:304–316
Gammulle H, Denman S, Sridharan S, Fookes C (2017) Two stream lstm: a deep fusion framework for human action recognition. In: Proceedings of the IEEE winter conference on applications of computer vision, pp 177–186. IEEE
Garcia-Garcia B, Bouwmans T, Silva AJR (2020) Background subtraction in real applications: Challenges, current models and future directions. Comput Sci Rev 35:100204
Li L, Prakash BA, Faloutsos C (2010) Parsimonious linear fingerprinting for time series. Proc VLDB Endow 3(1)
Reddy KK, Shah M (2013) Recognizing 50 human action categories of web videos. Mach Vis Appl 24(5):971–981
Soomro K, Zamir AR, Shah M (2012) UCF101: A dataset of 101 human action classes from videos in the wild. Technical report CRCV-TR-12-01, Center for Research in Computer Vision, University of Central Florida
Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) Hmdb: a large video database for human motion recognition. In: Proceedings of the international conference on computer vision, pp 2556–2563. IEEE
Kläser A, Marszałek M, Schmid C (2008) A spatio-temporal descriptor based on 3d-gradients. In: Proceedings of the 19th British machine vision conference, pp 275:1–10. British Machine Vision Association
Shu Y, Shi Y, Wang Y, Zou Y, Yuan Q, Tian Y (2018) Odn: Opening the deep network for open-set action recognition. In: Proceedings of the IEEE international conference on multimedia and expo, pp 1–6. IEEE
Hu H, Liao Z, Xiao X (2019) Action recognition using multiple pooling strategies of cnn features. Neural Process Lett 50(1):379–396
Acknowledgements
The authors would like to thank the anonymous reviewers for their constructive and insightful comments, which helped enhance the quality of this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding this work.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The material in this paper was partially presented at the 2020 IEEE International Conference on Systems, Man, and Cybernetics, October 11-14, 2020, Toronto, Canada [1].
Cite this article
Du, Z., Mukaidani, H. Linear dynamical systems approach for human action recognition with dual-stream deep features. Appl Intell 52, 452–470 (2022). https://doi.org/10.1007/s10489-021-02367-6