Abstract
With surveillance video now widespread, surveillance data support a wide variety of applications, and moving object detection, object recognition, and pedestrian tracking have become significant fields of research. Pedestrian tracking in particular remains an urgent problem. This paper proposes a novel method for pedestrian tracking based on a convolutional neural network, called the Matching-Siamese network. First, pedestrians are detected in surveillance videos with Faster R-CNN and numbered sequentially. Second, the Matching-Siamese network is designed by modifying the structure of the traditional Siamese network to calculate the similarity of two input images. Third, the image similarity is used to determine whether the probe image of the target pedestrian and each detected pedestrian image share the same identity. Finally, the target pedestrian is tracked across all videos using the identity matches between the probe image and the pedestrian images. The results show that the proposed method outperforms most popular algorithms in accuracy, overlap rate, and computational efficiency, especially when the object disappears and reappears. In addition, the method can use the latest probe image of a pedestrian to track that pedestrian well in videos from randomly selected times and regions.
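The matching and tracking steps described above can be illustrated with a minimal sketch. It assumes each numbered detection has already been reduced to a feature vector, and it substitutes cosine similarity for the Matching-Siamese network, whose architecture the abstract does not specify. The names `cosine_similarity`, `match_probe`, `track`, the 0.8 threshold, and the rule of keeping the single best match per frame are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Stand-in similarity score (assumption): the paper computes this
    with its Matching-Siamese network on image pairs instead."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def match_probe(
    probe: Vector,
    detections: Dict[int, Vector],
    threshold: float = 0.8,  # hypothetical decision threshold
) -> Optional[int]:
    """Return the number of the detection most similar to the probe,
    or None when no detection reaches the threshold."""
    best_id: Optional[int] = None
    best_score = threshold
    for det_id, features in detections.items():
        score = cosine_similarity(probe, features)
        if score >= best_score:
            best_id, best_score = det_id, score
    return best_id


def track(
    probe: Vector,
    frames: List[Dict[int, Vector]],
    threshold: float = 0.8,
) -> List[Optional[int]]:
    """For each frame, report which numbered detection matches the probe.
    None marks frames where the target is absent (e.g. it has disappeared)."""
    return [match_probe(probe, dets, threshold) for dets in frames]


if __name__ == "__main__":
    probe = [1.0, 0.0]
    frames = [
        {1: [0.9, 0.1], 2: [0.0, 1.0]},   # target is detection 1
        {1: [0.0, 1.0]},                  # target not visible
        {1: [0.1, 1.0], 2: [1.0, 0.05]},  # target reappears as detection 2
    ]
    print(track(probe, frames))
```

Because each frame is matched against the probe independently, a frame with no match does not stop the target from being picked up again later, which is one plausible reading of how identity matching copes with a target that disappears and reappears.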
Acknowledgements
This work was supported by an Anhui Science and Technology Department project (No. 1401b042001) and by Security and Campus Management of USTC.
Cite this article
Luo, Y., Yin, D., Wang, A. et al. Pedestrian tracking in surveillance video based on modified CNN. Multimed Tools Appl 77, 24041–24058 (2018). https://doi.org/10.1007/s11042-018-5728-8
DOI: https://doi.org/10.1007/s11042-018-5728-8