Abstract
Image classification and video recognition have long been key problems in computer vision. So far, video recognition has not achieved good results in some application fields, such as the recognition of surveillance videos. To achieve better recognition results, in this paper we propose a new algorithm that recognizes a video from five consecutive frames. First, the features of the video frames are extracted by ResNet; these features are then fed into a two-layer LSTM, and the output is finally classified by a fully connected layer. We use collected shipping data as the dataset to evaluate the proposed model. The experimental results show that the proposed algorithm achieves better recognition than other methods, with total accuracy increasing from 0.967 to 0.981.
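The pipeline described above (per-frame features, a two-layer LSTM, then a fully connected classifier) can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' implementation. The ResNet stage is replaced by random stand-in feature vectors, the sizes `FEATURE_DIM`, `HIDDEN_DIM` and `NUM_CLASSES` are hypothetical, the weights are random rather than trained, and the classifier reads only the last LSTM hidden state, which is one plausible reading of the final step.

```python
import math
import random

# Hypothetical sizes for illustration only; the paper's real values are not given here.
FEATURE_DIM = 8   # stand-in for the length of a per-frame ResNet feature vector
HIDDEN_DIM = 6    # LSTM hidden units per layer
NUM_CLASSES = 3   # hypothetical number of video classes
NUM_FRAMES = 5    # five consecutive frames per clip, as stated in the abstract


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMLayer:
    """One LSTM layer with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.w = {k: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def forward(self, sequence):
        """Run over a sequence of vectors; return the hidden state at every step."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in sequence:
            z = list(x) + h  # concatenate current input with previous hidden state
            pre = {
                k: [s + bias for s, bias in zip(matvec(self.w[k], z), self.b[k])]
                for k in "ifog"
            }
            i = [sigmoid(v) for v in pre["i"]]       # input gate
            f = [sigmoid(v) for v in pre["f"]]       # forget gate
            o = [sigmoid(v) for v in pre["o"]]       # output gate
            cand = [math.tanh(v) for v in pre["g"]]  # candidate cell values
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class ClipClassifier:
    """Frame features -> 2-layer LSTM -> fully connected layer -> class probabilities."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.lstm1 = LSTMLayer(FEATURE_DIM, HIDDEN_DIM, rng)
        self.lstm2 = LSTMLayer(HIDDEN_DIM, HIDDEN_DIM, rng)
        self.fc_w = rand_matrix(NUM_CLASSES, HIDDEN_DIM, rng)
        self.fc_b = [0.0] * NUM_CLASSES

    def predict_proba(self, frame_features):
        assert len(frame_features) == NUM_FRAMES
        seq1 = self.lstm1.forward(frame_features)
        seq2 = self.lstm2.forward(seq1)  # second layer consumes the first layer's outputs
        last = seq2[-1]                  # hidden state after the fifth frame
        logits = [s + bias for s, bias in zip(matvec(self.fc_w, last), self.fc_b)]
        return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(42)
    # Random vectors stand in for per-frame ResNet features (assumption).
    clip = [[rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)] for _ in range(NUM_FRAMES)]
    probs = ClipClassifier().predict_proba(clip)
    print("class probabilities:", probs)
    print("predicted class:", probs.index(max(probs)))
```

Reading only the last hidden state keeps the classifier input fixed at `HIDDEN_DIM`; concatenating all five LSTM outputs before the fully connected layer would be an equally plausible variant. In practice a deep learning framework, for example PyTorch's `torch.nn.LSTM` with `num_layers=2`, would replace the hand-written layers.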
Acknowledgements
This research is supported by the National Natural Science Foundation of China (No. 61373109, No. 61602349) and the Educational Research Project of the Educational Commission of Hubei Province (2016234).
About this article
Cite this article
Zhang, H., Zhao, L. & Dai, G. Surveillance videos classification based on multilayer long short-term memory networks. Multimed Tools Appl 79, 12125–12137 (2020). https://doi.org/10.1007/s11042-019-08431-1
DOI: https://doi.org/10.1007/s11042-019-08431-1