Abstract
Human tracking and localization play a crucial role in many applications, such as accident avoidance, action recognition, safety and security, surveillance, and crowd analysis. Motivated by this broad scope, we introduce a novel method for tracking one or more humans and re-localizing them in complex environments with large displacements. The model handles complex and varying backgrounds, illumination changes, changes in target pose, the presence of visually similar targets (in pose and clothing), motion of both the target and the camera, occlusion of the target, and massive displacement of the target. Our model cascades the learning of three convolutional neural network based architectures so that each stage builds on the previous one and the overall efficiency of the model improves. The first network learns a pixel-level representation of small regions. The second network uses these features to estimate the displacement of a region and to classify it as moved, not-moved, or occluded. The third network refines the displacement estimate of the second network by exploiting the representations learned by the previous two. We also create a semi-synthetic dataset for training. The model is trained only on this dataset and then tested, without any training on real data, on subsets of the CamNeT, VOT2015, LITIV tracking, and Visual Tracker Benchmark databases. The proposed model yields results comparable to current state-of-the-art methods under the evaluation criteria described in the Object Tracking Benchmark (TPAMI 2015), CVPR 2013, and ICCV 2017.
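The three-stage cascade summarized above can be illustrated with a minimal PyTorch-style sketch. The module names, layer sizes, region size, and the residual refinement step below are illustrative assumptions only, not the authors' actual architecture; the sketch shows one plausible way the region encoder, the displacement/state classifier, and the refinement network could be wired together.

```python
# Illustrative sketch only: a possible wiring of the three-network cascade
# described in the abstract. Layer sizes, module names, and the region size
# are assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn

class RegionEncoder(nn.Module):
    """Network 1: learns a pixel-level representation of a small region."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, region):                  # region: (B, 3, H, W)
        return self.conv(region).flatten(1)     # (B, feat_dim)

class DisplacementNet(nn.Module):
    """Network 2: predicts a coarse displacement and a 3-way state
    (moved / not-moved / occluded) from paired region features."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(2 * feat_dim, 128), nn.ReLU())
        self.disp_head = nn.Linear(128, 2)       # (dx, dy)
        self.cls_head = nn.Linear(128, 3)        # moved / not-moved / occluded

    def forward(self, f_prev, f_curr):
        h = self.trunk(torch.cat([f_prev, f_curr], dim=1))
        return self.disp_head(h), self.cls_head(h)

class RefinementNet(nn.Module):
    """Network 3: refines the coarse displacement using the features
    and outputs of the first two networks."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim + 2 + 3, 128), nn.ReLU(),
            nn.Linear(128, 2),                   # refined (dx, dy)
        )

    def forward(self, f_prev, f_curr, coarse_disp, state_logits):
        x = torch.cat([f_prev, f_curr, coarse_disp, state_logits], dim=1)
        return coarse_disp + self.mlp(x)         # residual correction

# Cascaded forward pass over a pair of frames (one region per target)
encoder, disp_net, refine_net = RegionEncoder(), DisplacementNet(), RefinementNet()
prev_region = torch.randn(1, 3, 32, 32)
curr_region = torch.randn(1, 3, 32, 32)
f_prev, f_curr = encoder(prev_region), encoder(curr_region)
coarse, state = disp_net(f_prev, f_curr)
refined = refine_net(f_prev, f_curr, coarse, state)
```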
References
Babenko B, Yang MH, Belongie S (2011) Robust object tracking with online multiple instance learning. IEEE Trans Pattern Anal Mach Intell 33(8):1619–1632
Bouachir W, Bilodeau GA (2015) Collaborative part-based tracking using salient local predictors. Comput Vis Image Underst 137:88–101
Chen K, Tao W (2018) Learning linear regression via single convolutional layer for visual object tracking. IEEE Trans Multimed
Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. IEEE Trans Pattern Anal Mach Intell 25(5):564–577
Crammer K, Dekel O, Keshet J, Shalev-Shwartz S, Singer Y (2006) Online passive-aggressive algorithms. J Mach Learn Res 7(Mar):551–585
Danelljan M, Bhat G, Khan FS, Felsberg M (2017) ECO: efficient convolution operators for tracking. In: CVPR, vol 1, p 3
Danelljan M, Hager G, Shahbaz Khan F, Felsberg M (2015) Convolutional features for correlation filter based visual tracking. In: Proceedings of the IEEE international conference on computer vision workshops, pp 58–66
Danelljan M, Hager G, Shahbaz Khan F, Felsberg M (2015) Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the IEEE international conference on computer vision, pp 4310–4318
Dauphin Y, de Vries H, Bengio Y (2015) Equilibrated adaptive learning rates for non-convex optimization. In: Advances in neural information processing systems, pp 1504–1512
Fan H, Ling H (2017) SANet: structure-aware network for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 42–49
Fan J, Xu W, Wu Y, Gong Y (2010) Human tracking using convolutional neural networks. IEEE Trans Neural Netw 21(10):1610–1623
Fang K, Xiang Y, Savarese S (2017) Recurrent autoregressive networks for online multi-object tracking. arXiv:1711.02741
Gan W, Wang S, Lei X, Lee MS, Kuo CCJ (2018) Online CNN-based multiple object tracking with enhanced model updates and identity association. Signal Process Image Commun 66:95–102
Henriques JF, Caseiro R, Martins P, Batista J (2012) Exploiting the circulant structure of tracking-by-detection with kernels. In: European conference on computer vision. Springer, pp 702–715
Henriques JF, Caseiro R, Martins P, Batista J (2015) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596
Jia X, Lu H, Yang MH (2012) Visual tracking via adaptive structural local sparse appearance model. In: 2012 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 1822–1829
Kristan M, Matas J, Leonardis A, Felsberg M, Cehovin L, Fernandez G, Vojir T, Hager G, Nebehay G, Pflugfelder R (2015) The visual object tracking VOT2015 challenge results. In: Proceedings of the IEEE international conference on computer vision workshops, pp 1–23
Laaroussi K, Saaidi A, Masrar M, Satori K (2016) Human tracking based on appearance model. In: Proceedings of the mediterranean conference on information & communication technologies 2015. Springer, pp 297–305
Laaroussi K, Saaidi A, Masrar M, Satori K (2018) Human tracking using joint color-texture features and foreground-weighted histogram. Multimed Tools Appl 77(11):13,947–13,981
Le Cun Y, Jackel L, Boser B, Denker J, Graf H, Guyon I, Henderson D, Howard R, Hubbard W (1989) Handwritten digit recognition: applications of neural network chips and automatic learning. IEEE Commun Mag 27(11):41–46
Lu X, Tang F, Huo H, Fang T (2018) Learning channel-aware deep regression for object tracking. Pattern Recogn Lett
Ma C, Huang JB, Yang X, Yang MH (2018) Adaptive correlation filters with long-term and short-term memory for object tracking. Int J Comput Vis 8:1–26
Marszałek M, Laptev I, Schmid C (2009) Actions in context. In: IEEE conference on computer vision & pattern recognition
Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4293–4302
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision, pp 618–626
Senior A, Hampapur A, Tian YL, Brown L, Pankanti S, Bolle R (2006) Appearance models for occlusion handling. Image Vis Comput 24(11):1233–1243
Shen Y, Lin W, Yan J, Xu M, Wu J, Wang J (2015) Person re-identification with correspondence structure learning. In: Proceedings of the IEEE international conference on computer vision, pp 3200–3208
Sidenbladh H, Black MJ, Fleet DJ (2000) Stochastic tracking of 3d human figures using 2d image motion. In: European conference on computer vision. Springer, pp 702–718
Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human actions classes from videos in the wild. arXiv:1212.0402
Takada H, Hotta K, Janney P (2016) Human tracking in crowded scenes using target information at previous frames. In: 2016 23rd international conference on pattern recognition (ICPR). IEEE, pp 1809–1814
The LITIV datasets (2017) http://www.polymtl.ca/litiv/en/vid. Accessed 10 Aug 2018
The visual tracker benchmark database (2017) http://www.visual-tracking.net. Accessed 10 Aug 2018
Wang D, Lu H, Bo C (2015) Visual tracking via weighted local cosine similarity. IEEE Trans Cybern 45(9):1838–1850
Wang D, Sun W, Yu S, Li L, Liu W (2016) A novel background-weighted histogram scheme based on foreground saliency for mean-shift tracking. Multimed Tools Appl 75(17):10,271–10,289
Wang L, Ouyang W, Wang X, Lu H (2015) Visual tracking with fully convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 3119–3127
Wren CR, Azarbayejani A, Darrell T, Pentland AP (1997) Pfinder: Real-time tracking of the human body. IEEE Trans Pattern Anal Mach Intell 19(7):780–785
Wu Y, Lim J, Yang MH (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848
Xiao H, Lin W, Sheng B, Lu K, Yan J, Wang J, Ding E, Zhang Y, Xiong H (2018) Group re-identification: leveraging and integrating multi-grain information. In: 2018 ACM multimedia conference on multimedia conference. ACM, pp 192–200
Zhang K, Zhang L, Liu Q, Zhang D, Yang MH (2014) Fast visual tracking via dense spatio-temporal context learning. In: European conference on computer vision. Springer, pp 127–141
Zhong W, Lu H, Yang MH (2014) Robust object tracking via sparse collaborative appearance model. IEEE Trans Image Process 23(5):2356–2368
Zhou B, Zhao H, Puig X, Fidler S, Barriuso A, Torralba A (2016) Semantic understanding of scenes through the ADE20K dataset. arXiv:1608.05442
Zhou B, Zhao H, Puig X, Fidler S, Barriuso A, Torralba A (2017) Scene parsing through ADE20K dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition
Cite this article
Kumar, N., Sukavanam, N. A cascaded CNN model for multiple human tracking and re-localization in complex video sequences with large displacement. Multimed Tools Appl 79, 6109–6134 (2020). https://doi.org/10.1007/s11042-019-08501-4