Abstract
Multi-object tracking (MOT) is closely related to video-based object detection and target re-identification. In recent years, thanks to the representation power of deep learning, most state-of-the-art methods for object detection and re-identification have been based on deep neural networks. However, improving MOT performance in challenging real-world scenes remains an open problem. Specifically, recent MOT algorithms have not been optimized jointly with object detection, which limits tracking performance. Inspired by recent progress in object detection and recognition, we propose an MOT method that jointly learns detection and identification using existing MOT datasets, without external training data. We further introduce a feature enhancement module based on the ConvGRU structure, which helps to cope with degraded image quality in video object detection and re-identification, such as motion blur and loss of camera focus. Experimental results show that the proposed method achieves competitive performance compared with state-of-the-art methods in video-based object detection, cross-dataset person re-identification, and multi-object tracking.
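The abstract does not specify how the ConvGRU-based feature enhancement module is built, so the following is only a minimal sketch of a generic ConvGRU recurrence applied to a sequence of per-frame feature maps, not the authors' implementation. All names (`ConvGRUCell`, `enhance`, `conv2d_same`), the 3×3 kernel size, the omission of bias terms, and the random weight initialization are assumptions made for illustration. The idea it shows is that the hidden state carries information from earlier frames, so a blurred or defocused frame can be complemented by features from its sharper neighbours.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive zero-padded 'same' 2D cross-correlation.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, H, W = x.shape
    out = np.zeros((c_out, H, W))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + H, j:j + W]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class ConvGRUCell:
    """Generic ConvGRU cell (hypothetical sketch, biases omitted)."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hid_ch, in_ch + hid_ch, k, k)
        self.w_z = rng.normal(0, 0.1, shape)  # update gate weights
        self.w_r = rng.normal(0, 0.1, shape)  # reset gate weights
        self.w_h = rng.normal(0, 0.1, shape)  # candidate state weights

    def step(self, x, h):
        xh = np.concatenate([x, h], axis=0)
        z = sigmoid(conv2d_same(xh, self.w_z))
        r = sigmoid(conv2d_same(xh, self.w_r))
        xrh = np.concatenate([x, r * h], axis=0)
        cand = np.tanh(conv2d_same(xrh, self.w_h))
        # Convex combination of the previous state and the candidate state.
        return (1.0 - z) * h + z * cand

def enhance(frames, cell, hid_ch):
    """Run the cell over per-frame feature maps and return the final state."""
    _, H, W = frames[0].shape
    h = np.zeros((hid_ch, H, W))
    for x in frames:
        h = cell.step(x, h)
    return h
```

In a setup like the one sketched here, `frames` would be backbone feature maps of consecutive video frames, and the returned state would stand in for the enhanced features passed to the detection and identification heads; how the actual module is wired into the network is not stated in the abstract.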
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61172141, U1611461), the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase, No. U1501501), the Project on the Integration of Industry, Education and Research of Guangdong Province (No. 2013B090500013), the Science and Technology Program of Guangzhou (Nos. 201803030029, 2014J4100092), and the Major Projects for the Innovation of Industry and Research of Guangzhou (No. 2014Y2-00213).
About this article
Cite this article
Ke, B., Zheng, H., Chen, L. et al. Multi-object Tracking by Joint Detection and Identification Learning. Neural Process Lett 50, 283–296 (2019). https://doi.org/10.1007/s11063-019-10046-4
DOI: https://doi.org/10.1007/s11063-019-10046-4