Abstract
In recent years, various visual object tracking benchmarks have been proposed, and some have been widely adopted in recently published trackers. However, most discussions focus on overall performance and cannot describe a tracker's strengths and weaknesses in detail. Meanwhile, several commonly used benchmark measures lack convincing interpretation. In this paper, 12 frame-wise visual attributes that reflect different aspects of the characteristics of image sequences are collated, and a normalized quantitative formulaic definition is given for each of them for the first time. Based on these definitions, we propose two novel test methodologies, a correlation-based test and a weight-based test, which demonstrate a tracker's performance on each aspect more intuitively and directly. These methods are then applied to the raw results of one of the best-known tracking challenges, the Visual Object Tracking (VOT) Challenge 2017. The tests show that most trackers did not perform well when the size of the target changed rapidly or intensely, and that even advanced deep learning based trackers did not fully solve this problem. Although the scale of the target is not considered in the calculation of the center location error, in practical tests the center location error remains sensitive to changes in target size.
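The paper's exact attribute formulas are not reproduced here, but the two quantities the abstract relies on can be sketched. Below is a minimal illustrative sketch (the function names, the box format, and the use of Pearson correlation are assumptions, not the authors' definitions): the center location error between predicted and ground-truth boxes, and a correlation-based test that pairs a frame-wise attribute series with the tracker's per-frame error.

```python
import numpy as np

def center_location_error(pred_boxes, gt_boxes):
    """Center location error (CLE): Euclidean distance between the centers
    of predicted and ground-truth boxes, one (x, y, w, h) row per frame."""
    pred = np.asarray(pred_boxes, dtype=float)
    gt = np.asarray(gt_boxes, dtype=float)
    pred_centers = pred[:, :2] + pred[:, 2:] / 2.0
    gt_centers = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_centers - gt_centers, axis=1)

def correlation_test(attribute_values, frame_errors):
    """Pearson correlation between a frame-wise attribute (e.g. a
    normalized scale-change measure) and a tracker's per-frame error.
    A strong positive correlation suggests the attribute tends to
    degrade that tracker's accuracy."""
    a = np.asarray(attribute_values, dtype=float)
    e = np.asarray(frame_errors, dtype=float)
    return float(np.corrcoef(a, e)[0, 1])
```

Note that CLE, as defined, depends only on box centers and not on box size, which is why the abstract's observation that it is nonetheless sensitive to scale changes in practice is worth testing empirically.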
Contributions
Gong-liang LIU designed the research. Chang LIU processed the data. Chang LIU and Wen-jing KANG drafted the manuscript. Wen-jing KANG and Gong-liang LIU revised and edited the final version.
Compliance with ethics guidelines
Wen-jing KANG, Chang LIU, and Gong-liang LIU declare that they have no conflict of interest.
Project supported by the National Natural Science Foundation of China (No. 61501139) and the Natural Scientific Research Innovation Foundation of Harbin Institute of Technology, Weihai (No. 2019KYCXJJYB06).
Cite this article
Kang, Wj., Liu, C. & Liu, Gl. A quantitative attribute-based benchmark methodology for single-target visual tracking. Front Inform Technol Electron Eng 21, 405–421 (2020). https://doi.org/10.1631/FITEE.1900245