A cascaded CNN model for multiple human tracking and re-localization in complex video sequences with large displacement

Multimedia Tools and Applications

Abstract

Human tracking and localization play a crucial role in applications such as accident avoidance, action recognition, safety and security, surveillance, and crowd analysis. Motivated by this range of uses, we introduce a novel method for tracking and re-localizing one or many humans in a complex environment with large displacement. The model can handle complex and varying backgrounds, variations in illumination, changes in target pose, the presence of similar targets and appearance (pose and clothes), motion of the target and the camera, occlusion of the target, and massive displacement of the target. Our model uses three deep architectures based on convolutional neural networks and cascades their learning so as to improve the overall efficiency of the model. The first network learns a pixel-level representation of small regions. The second network uses these features to learn the displacement of a region together with its category, chosen among the moved, not-moved, and occluded classes. The third network refines the displacement estimate of the second network by using what the previous two networks have learned. We also create a semi-synthetic dataset for training. The model is trained on this dataset and tested on subsets of the CamNeT, VOT2015, LITIV-tracking, and Visual Tracker Benchmark databases without any training on real data. The proposed model yields results comparable to current state-of-the-art methods under the evaluation criteria described in Object Tracking Benchmark, TPAMI 2015, CVPR 2013, and ICCV 2017.
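
The abstract does not spell out the evaluation criteria it refers to. Object Tracking Benchmark style evaluations are commonly summarized by two per-frame quantities: the overlap (intersection over union) between predicted and ground-truth boxes, and the distance between their centers. The following is a minimal Python sketch of those two measures, not the authors' evaluation code. It assumes boxes in (x, y, width, height) form and the conventional thresholds of 0.5 overlap and 20 pixels; the function names and the sample boxes are hypothetical.

```python
from math import hypot


def iou(a, b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the intersection rectangle, clamped at zero.
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance in pixels between the two box centers."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return hypot((ax + aw / 2) - (bx + bw / 2), (ay + ah / 2) - (by + bh / 2))


def success_rate(pred, truth, overlap_threshold=0.5):
    """Fraction of frames whose overlap exceeds the threshold."""
    hits = sum(iou(p, t) > overlap_threshold for p, t in zip(pred, truth))
    return hits / len(truth)


def precision(pred, truth, pixel_threshold=20.0):
    """Fraction of frames whose center error is within the threshold."""
    hits = sum(center_error(p, t) <= pixel_threshold for p, t in zip(pred, truth))
    return hits / len(truth)


# Hypothetical three-frame sequence (illustrative values, not data from the paper).
# In the last frame the target has moved far away and the prediction has lost it.
truth = [(10, 10, 20, 40), (14, 10, 20, 40), (60, 12, 20, 40)]
pred = [(10, 10, 20, 40), (18, 12, 20, 40), (10, 10, 20, 40)]

print(success_rate(pred, truth), precision(pred, truth))
```

Sweeping `overlap_threshold` from 0 to 1 (or `pixel_threshold` over a range of pixel distances) and plotting the resulting rates gives the success and precision curves usually reported in such benchmarks.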

References

  1. Babenko B, Yang MH, Belongie S (2011) Robust object tracking with online multiple instance learning. IEEE Trans Pattern Anal Mach Intell 33(8):1619–1632

  2. Bouachir W, Bilodeau GA (2015) Collaborative part-based tracking using salient local predictors. Comput Vis Image Underst 137:88–101

  3. Chen K, Tao W (2018) Learning linear regression via single convolutional layer for visual object tracking. IEEE Trans Multimed

  4. Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. IEEE Trans Pattern Anal Mach Intell 25(5):564–577

  5. Crammer K, Dekel O, Keshet J, Shalev-Shwartz S, Singer Y (2006) Online passive-aggressive algorithms. J Mach Learn Res 7(Mar):551–585

  6. Danelljan M, Bhat G, Khan FS, Felsberg M (2017) ECO: efficient convolution operators for tracking. In: CVPR, vol 1, p 3

  7. Danelljan M, Hager G, Shahbaz Khan F, Felsberg M (2015) Convolutional features for correlation filter based visual tracking. In: Proceedings of the IEEE international conference on computer vision workshops, pp 58–66

  8. Danelljan M, Hager G, Shahbaz Khan F, Felsberg M (2015) Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the IEEE international conference on computer vision, pp 4310–4318

  9. Dauphin Y, de Vries H, Bengio Y (2015) Equilibrated adaptive learning rates for non-convex optimization. In: Advances in neural information processing systems, pp 1504–1512

  10. Fan H, Ling H (2017) SANet: structure-aware network for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 42–49

  11. Fan J, Xu W, Wu Y, Gong Y (2010) Human tracking using convolutional neural networks. IEEE Trans Neural Netw 21(10):1610–1623

  12. Fang K, Xiang Y, Savarese S (2017) Recurrent autoregressive networks for online multi-object tracking. arXiv:1711.02741

  13. Gan W, Wang S, Lei X, Lee MS, Kuo CCJ (2018) Online CNN-based multiple object tracking with enhanced model updates and identity association. Signal Process Image Commun 66:95–102

  14. Henriques JF, Caseiro R, Martins P, Batista J (2012) Exploiting the circulant structure of tracking-by-detection with kernels. In: European conference on computer vision. Springer, pp 702–715

  15. Henriques JF, Caseiro R, Martins P, Batista J (2015) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596

  16. Jia X, Lu H, Yang MH (2012) Visual tracking via adaptive structural local sparse appearance model. In: 2012 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 1822–1829

  17. Kristan M, Matas J, Leonardis A, Felsberg M, Cehovin L, Fernandez G, Vojir T, Hager G, Nebehay G, Pflugfelder R (2015) The visual object tracking VOT2015 challenge results. In: Proceedings of the IEEE international conference on computer vision workshops, pp 1–23

  18. Laaroussi K, Saaidi A, Masrar M, Satori K (2016) Human tracking based on appearance model. In: Proceedings of the mediterranean conference on information & communication technologies 2015. Springer, pp 297–305

  19. Laaroussi K, Saaidi A, Masrar M, Satori K (2018) Human tracking using joint color-texture features and foreground-weighted histogram. Multimed Tools Appl 77(11):13947–13981

  20. Le Cun Y, Jackel L, Boser B, Denker J, Graf H, Guyon I, Henderson D, Howard R, Hubbard W (1989) Handwritten digit recognition: applications of neural network chips and automatic learning. IEEE Commun Mag 27(11):41–46

  21. Lu X, Tang F, Huo H, Fang T (2018) Learning channel-aware deep regression for object tracking. Pattern Recogn Lett

  22. Ma C, Huang JB, Yang X, Yang MH (2018) Adaptive correlation filters with long-term and short-term memory for object tracking. Int J Comput Vis 8:1–26

  23. Marszałek M, Laptev I, Schmid C (2009) Actions in context. In: IEEE conference on computer vision & pattern recognition

  24. Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4293–4302

  25. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision, pp 618–626

  26. Senior A, Hampapur A, Tian YL, Brown L, Pankanti S, Bolle R (2006) Appearance models for occlusion handling. Image Vis Comput 24(11):1233–1243

  27. Shen Y, Lin W, Yan J, Xu M, Wu J, Wang J (2015) Person re-identification with correspondence structure learning. In: Proceedings of the IEEE international conference on computer vision, pp 3200–3208

  28. Sidenbladh H, Black MJ, Fleet DJ (2000) Stochastic tracking of 3D human figures using 2D image motion. In: European conference on computer vision. Springer, pp 702–718

  29. Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human actions classes from videos in the wild. arXiv:1212.0402

  30. Takada H, Hotta K, Janney P (2016) Human tracking in crowded scenes using target information at previous frames. In: 2016 23rd international conference on pattern recognition (ICPR). IEEE, pp 1809–1814

  31. The litiv datasets (2017) http://www.polymtl.ca/litiv/en/vid. Accessed 10 Aug 2018

  32. The visual tracker benchmark database (2017) http://www.visual-tracking.net. Accessed 10 Aug 2018

  33. Wang D, Lu H, Bo C (2015) Visual tracking via weighted local cosine similarity. IEEE Trans Cybern 45(9):1838–1850

  34. Wang D, Sun W, Yu S, Li L, Liu W (2016) A novel background-weighted histogram scheme based on foreground saliency for mean-shift tracking. Multimed Tools Appl 75(17):10271–10289

  35. Wang L, Ouyang W, Wang X, Lu H (2015) Visual tracking with fully convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 3119–3127

  36. Wren CR, Azarbayejani A, Darrell T, Pentland AP (1997) Pfinder: Real-time tracking of the human body. IEEE Trans Pattern Anal Mach Intell 19(7):780–785

  37. Wu Y, Lim J, Yang MH (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848

  38. Xiao H, Lin W, Sheng B, Lu K, Yan J, Wang J, Ding E, Zhang Y, Xiong H (2018) Group re-identification: leveraging and integrating multi-grain information. In: 2018 ACM multimedia conference on multimedia conference. ACM, pp 192–200

  39. Zhang K, Zhang L, Liu Q, Zhang D, Yang MH (2014) Fast visual tracking via dense spatio-temporal context learning. In: European conference on computer vision. Springer, pp 127–141

  40. Zhong W, Lu H, Yang MH (2014) Robust object tracking via sparse collaborative appearance model. IEEE Trans Image Process 23(5):2356–2368

  41. Zhou B, Zhao H, Puig X, Fidler S, Barriuso A, Torralba A (2016) Semantic understanding of scenes through the ADE20K dataset. arXiv:1608.05442

  42. Zhou B, Zhao H, Puig X, Fidler S, Barriuso A, Torralba A (2017) Scene parsing through ADE20K dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition

Author information

Corresponding author

Correspondence to N. Kumar.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kumar, N., Sukavanam, N. A cascaded CNN model for multiple human tracking and re-localization in complex video sequences with large displacement. Multimed Tools Appl 79, 6109–6134 (2020). https://doi.org/10.1007/s11042-019-08501-4

  • DOI: https://doi.org/10.1007/s11042-019-08501-4
