Multi-object Tracking by Joint Detection and Identification Learning

Neural Processing Letters

Abstract

Multi-object tracking (MOT) is closely related to video-based object detection and target re-identification. In recent years, driven by the representation power of deep learning, most state-of-the-art methods for object detection and re-identification have been built on deep neural networks. However, improving MOT performance in challenging real-world scenes remains an open problem. In particular, recent MOT algorithms are not optimized jointly with object detection, which limits tracking performance. Inspired by recent progress in object detection and recognition, we propose an MOT method based on joint learning of detection and identification, trained only on existing MOT datasets without external training data. We further introduce a feature enhancement module based on the ConvGRU structure, which helps to cope with image-quality degradation in video object detection and re-identification, such as motion blur and camera defocus. Experimental results show that the proposed method achieves performance competitive with state-of-the-art methods in video-based object detection, cross-dataset person re-identification, and multi-object tracking.
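The abstract does not describe the internals of the ConvGRU-based feature enhancement module. As a rough illustration only, the following minimal sketch implements the standard ConvGRU recurrence of Ballas et al. (2016) in NumPy. That recurrence consists of an update gate z, a reset gate r, and a candidate state, all computed with convolutions instead of dense layers. The sketch runs the cell over a sequence of per-frame feature maps, so each output map carries information accumulated from earlier frames. The helper names `conv2d_same`, `ConvGRUCell`, and `enhance`, along with the random untrained weights, the kernel size, and the channel counts, are hypothetical. This is not the authors' implementation, nor their placement of the module in the network.

```python
import numpy as np


def conv2d_same(x, w, b):
    """'Same'-padded 2D cross-correlation (stride 1).

    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k, b: (C_out,).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Shifted input window for kernel offset (i, j): (C_in, H, W)
            patch = xp[:, i:i + h, j:j + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out + b[:, None, None]


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class ConvGRUCell:
    """Generic ConvGRU cell (hypothetical sizes, random untrained weights)."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        cat = in_ch + hid_ch
        self.hid_ch = hid_ch
        # One convolution over [x, h] computes W*x + U*h for both gates (z and r).
        self.w_gates = rng.normal(0.0, 0.1, (2 * hid_ch, cat, k, k))
        self.b_gates = np.zeros(2 * hid_ch)
        # Convolution over [x, r*h] for the candidate state.
        self.w_cand = rng.normal(0.0, 0.1, (hid_ch, cat, k, k))
        self.b_cand = np.zeros(hid_ch)

    def step(self, x, h):
        gates = sigmoid(conv2d_same(np.concatenate([x, h]), self.w_gates, self.b_gates))
        z, r = gates[:self.hid_ch], gates[self.hid_ch:]  # update and reset gates
        cand = np.tanh(conv2d_same(np.concatenate([x, r * h]), self.w_cand, self.b_cand))
        # Blend previous state and candidate: h_t = (1 - z) * h_{t-1} + z * cand
        return (1.0 - z) * h + z * cand


def enhance(features, cell):
    """Run the cell over per-frame feature maps; return one enhanced map per frame."""
    h = np.zeros((cell.hid_ch,) + features[0].shape[1:])
    outputs = []
    for x in features:
        h = cell.step(x, h)
        outputs.append(h)
    return outputs


# Five frames of 4-channel, 8x8 feature maps (random stand-ins for backbone features).
frames = [np.random.default_rng(t).normal(size=(4, 8, 8)) for t in range(5)]
cell = ConvGRUCell(in_ch=4, hid_ch=6)
out = enhance(frames, cell)
print(len(out), out[-1].shape)  # 5 (6, 8, 8)
```

In a real detector, `frames` would be backbone feature maps of consecutive video frames. The intuition behind a recurrent module of this kind is that a blurred or defocused frame can draw on the hidden state built up from earlier, sharper frames. A usable module would be trained end to end rather than relying on random weights.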

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61172141, U1611461), the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase, No. U1501501), the Project on the Integration of Industry, Education and Research of Guangdong Province (No. 2013B090500013), the Science and Technology Program of Guangzhou (Nos. 201803030029, 2014J4100092), and the Major Projects for the Innovation of Industry and Research of Guangzhou (No. 2014Y2-00213).

Author information

Correspondence to Huicheng Zheng.

About this article

Cite this article

Ke, B., Zheng, H., Chen, L. et al. Multi-object Tracking by Joint Detection and Identification Learning. Neural Process Lett 50, 283–296 (2019). https://doi.org/10.1007/s11063-019-10046-4

  • DOI: https://doi.org/10.1007/s11063-019-10046-4
