Abstract
In recent years, various visual object tracking benchmarks have been proposed, and some have been widely adopted in recently published trackers. However, most discussions focus on overall performance and cannot describe a tracker's strengths and weaknesses in detail. Meanwhile, several commonly used benchmark measures lack convincing interpretation. In this paper, 12 frame-wise visual attributes that reflect different aspects of the characteristics of image sequences are collated, and a normalized quantitative formulaic definition is given for each of them for the first time. Based on these definitions, we propose two novel test methodologies, a correlation-based test and a weight-based test, which demonstrate a tracker's performance on each aspect more intuitively and directly. These methods are then applied to the raw results of one of the best-known tracking challenges, the Visual Object Tracking (VOT) Challenge 2017. The tests show that most trackers did not perform well when the size of the target changed rapidly or intensely, and that even advanced deep learning based trackers did not fully solve this problem. Although the scale of the target is not considered in the calculation of the center location error, in practical tests the center location error remains sensitive to changes in target size.
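The paper's exact attribute formulas are not reproduced here, but the two quantities the abstract relies on can be sketched. Below is a minimal illustrative sketch (the function names, the box format, and the use of Pearson correlation are assumptions, not the authors' definitions): the center location error between predicted and ground-truth boxes, and a correlation-based test that pairs a frame-wise attribute series with the tracker's per-frame error.

```python
import numpy as np

def center_location_error(pred_boxes, gt_boxes):
    """Center location error (CLE): Euclidean distance between the centers
    of predicted and ground-truth boxes, one (x, y, w, h) row per frame."""
    pred = np.asarray(pred_boxes, dtype=float)
    gt = np.asarray(gt_boxes, dtype=float)
    pred_centers = pred[:, :2] + pred[:, 2:] / 2.0
    gt_centers = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_centers - gt_centers, axis=1)

def correlation_test(attribute_values, frame_errors):
    """Pearson correlation between a frame-wise attribute (e.g. a
    normalized scale-change measure) and a tracker's per-frame error.
    A strong positive correlation suggests the attribute tends to
    degrade that tracker's accuracy."""
    a = np.asarray(attribute_values, dtype=float)
    e = np.asarray(frame_errors, dtype=float)
    return float(np.corrcoef(a, e)[0, 1])
```

Note that CLE, as defined, depends only on box centers and not on box size, which is why the abstract's observation that it is nonetheless sensitive to scale changes in practice is worth testing empirically.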
Contributions
Gong-liang LIU designed the research. Chang LIU processed the data. Chang LIU and Wen-jing KANG drafted the manuscript. Wen-jing KANG and Gong-liang LIU revised and edited the final version.
Compliance with ethics guidelines
Wen-jing KANG, Chang LIU, and Gong-liang LIU declare that they have no conflict of interest.
Project supported by the National Natural Science Foundation of China (No. 61501139) and the Natural Scientific Research Innovation Foundation of Harbin Institute of Technology, Weihai (No. 2019KYCXJJYB06).
Cite this article
Kang, Wj., Liu, C. & Liu, Gl. A quantitative attribute-based benchmark methodology for single-target visual tracking. Front Inform Technol Electron Eng 21, 405–421 (2020). https://doi.org/10.1631/FITEE.1900245