Surveillance videos classification based on multilayer long short-term memory networks

Zhang, Hong; Zhao, Liang; Dai, Gang

doi:10.1007/s11042-019-08431-1

Published: 11 January 2020

Volume 79, pages 12125–12137, (2020)

Multimedia Tools and Applications

Hong Zhang^1,2,
Liang Zhao^1,2 &
Gang Dai^1,2

Abstract

Image classification and video recognition are long-standing key problems in computer vision. Video recognition still performs poorly in some application fields, such as surveillance video. To improve recognition results, this paper proposes a new algorithm that recognizes a video from five consecutive frames. First, ResNet extracts features from the video frames; the features are then passed to a 2-layer LSTM; finally, a fully connected layer performs the classification. We evaluate the model on a dataset of collected shipping data. The experimental results show that the proposed algorithm outperforms other methods, with total accuracy increasing from 0.967 to 0.981.
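
The pipeline described in the abstract (per-frame features, a 2-layer LSTM, then a fully connected classifier) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. It uses only the Python standard library, random untrained weights, and random vectors standing in for ResNet frame features. The feature size, hidden size, number of classes, and the choice to classify from the last time step's hidden state are all assumptions made for illustration; a practical version would use a deep learning framework and trained weights.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_matrix(rows, cols, rng):
    # Small random weights; a trained model would load learned values instead.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMLayer:
    """One LSTM layer (forward pass only), with gates i, f, o and candidate g."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {k: init_matrix(hidden_size, input_size + hidden_size, rng)
                  for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def forward(self, sequence):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in sequence:
            z = x + h  # list concatenation: [x_t, h_{t-1}]
            pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
                   for k in "ifog"}
            i = [sigmoid(v) for v in pre["i"]]    # input gate
            f = [sigmoid(v) for v in pre["f"]]    # forget gate
            o = [sigmoid(v) for v in pre["o"]]    # output gate
            g = [math.tanh(v) for v in pre["g"]]  # candidate cell state
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs  # one hidden state per time step


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(frame_features, layers, W_fc, b_fc):
    # Stack the LSTM layers: each layer consumes the previous layer's outputs.
    seq = frame_features
    for layer in layers:
        seq = layer.forward(seq)
    # Assumption: classify from the hidden state of the last frame.
    logits = [a + b for a, b in zip(matvec(W_fc, seq[-1]), b_fc)]
    return softmax(logits)


# Hypothetical sizes chosen only to keep the example small.
FEATURE_DIM, HIDDEN, NUM_CLASSES, NUM_FRAMES = 8, 6, 3, 5

rng = random.Random(0)
layers = [LSTMLayer(FEATURE_DIM, HIDDEN, rng), LSTMLayer(HIDDEN, HIDDEN, rng)]
W_fc = init_matrix(NUM_CLASSES, HIDDEN, rng)
b_fc = [0.0] * NUM_CLASSES

# Random vectors standing in for the ResNet features of five frames.
clip = [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(NUM_FRAMES)]
probs = classify_clip(clip, layers, W_fc, b_fc)
print(probs)  # one probability per class
```

With untrained weights the printed probabilities are close to uniform; the sketch only shows how five frame-level feature vectors flow through two stacked LSTM layers into a single class distribution.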

Acknowledgements

This research is supported by the National Natural Science Foundation of China (No. 61373109, No. 61602349) and the Educational Research Project from the Educational Commission of Hubei Province (2016234).

Author information

Authors and Affiliations

College of Computer Science & Technology, Wuhan University of Science & Technology, Wuhan, 430081, China
Hong Zhang, Liang Zhao & Gang Dai
Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, China
Hong Zhang, Liang Zhao & Gang Dai

Corresponding author

Correspondence to Hong Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, H., Zhao, L. & Dai, G. Surveillance videos classification based on multilayer long short-term memory networks. Multimed Tools Appl 79, 12125–12137 (2020). https://doi.org/10.1007/s11042-019-08431-1

Received: 31 August 2018
Revised: 04 October 2019
Accepted: 01 November 2019
Published: 11 January 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s11042-019-08431-1
