Abstract
In this paper, we propose a generic boosting framework for multiple object tracking (MOT). Unlike other works that track objects from scratch, our framework takes their results (tracklets) as input and optimizes them further. Our motivation derives from the observation that most modern MOT trackers achieve acceptable performance and yield relatively reliable tracklets; accordingly, we focus directly on tracklet-level re-identification, which is the most challenging issue in this setting. To that end, we jointly exploit single object tracking (SOT), tracking fragments (tracklets) and a re-identification mechanism by casting them into a multi-label energy optimization problem, which we solve with the \(\alpha\)-expansion with label costs algorithm. Each of these techniques has inspired much recent MOT work on mitigating the occlusion problem, but to our knowledge, few works so far have explored how to combine them all in a principled way. Furthermore, we introduce a spatial attention mechanism to improve the appearance model and a hierarchical clustering post-processing step to progressively improve tracking consistency. Finally, results on the most widely used benchmarks demonstrate the effectiveness and generality of our framework, and ablation studies verify the importance of each contribution.
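To make the idea of a multi-label energy with label costs more concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes a generic energy made of a per-tracklet data term, a Potts-style smoothness term between neighboring tracklets, and a fixed cost for every identity label that is used. All names (`label_cost_energy`, `brute_force_minimize`, `data_cost`, `neighbors`, `smooth_weight`, `label_cost`) and all numbers are hypothetical, and the toy problem is solved by exhaustive search rather than by \(\alpha\)-expansion.

```python
from itertools import product


def label_cost_energy(labels, data_cost, neighbors, smooth_weight, label_cost):
    """Energy of one labeling: data term + Potts smoothness + cost per used label.

    Assumed generic form for illustration, not the paper's energy.
    """
    # Data term: cost of giving tracklet i the identity labels[i].
    data = sum(data_cost[i][l] for i, l in enumerate(labels))
    # Smoothness term: fixed penalty when neighboring tracklets disagree.
    smooth = sum(smooth_weight for i, j in neighbors if labels[i] != labels[j])
    # Label costs: each identity that appears at least once is paid for once.
    used = sum(label_cost[l] for l in set(labels))
    return data + smooth + used


def brute_force_minimize(data_cost, neighbors, smooth_weight, label_cost):
    """Exhaustive search over all labelings; only feasible for toy problems."""
    n_items, n_labels = len(data_cost), len(label_cost)
    return min(
        product(range(n_labels), repeat=n_items),
        key=lambda f: label_cost_energy(
            f, data_cost, neighbors, smooth_weight, label_cost
        ),
    )


# Toy example: three tracklets, two candidate identities (made-up costs).
data_cost = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.3]]
neighbors = [(0, 1), (1, 2)]
best = brute_force_minimize(
    data_cost, neighbors, smooth_weight=0.2, label_cost=[0.5, 0.5]
)
print(best)
```

In this toy setting the label cost makes it cheaper to explain all three tracklets with a single identity, whereas setting `label_cost` to zeros lets the third tracklet keep a separate identity. This is the kind of trade-off a label-cost term introduces; a real system would replace the exhaustive search with a graph-cut solver.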




Notes
Code is available at: https://vision.cs.uwaterloo.ca/files/gco-v3.0.zip.
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (No. 61906210).
About this article
Cite this article
Liang, T., Lan, L., Zhang, X. et al. A generic MOT boosting framework by combining cues from SOT, tracklet and re-identification. Knowl Inf Syst 63, 2109–2127 (2021). https://doi.org/10.1007/s10115-021-01576-2
DOI: https://doi.org/10.1007/s10115-021-01576-2