Abstract
Recently proposed correlation filter based trackers achieve appealing performance despite their great simplicity and superior speed. However, such trackers lack inherent scale and aspect ratio adaptability, which limits their tracking accuracy. To tackle this problem, this paper integrates class-agnostic detection proposals, a technique widely adopted in object detection, into a correlation filter tracker. On the tracker side, optimizations such as feature integration, robust model updating and proposal rejection are applied for efficient integration. On the proposal side, by integrating and comparing four detection proposal generators along with two baseline methods, we find that the quality of detection proposals has considerable influence on tracking accuracy. We therefore choose EdgeBoxes, the most promising proposal generator, and further enhance it with background suppression. Evaluations are mainly performed on a challenging 50-sequence dataset (OTB50) and two of its subsets: 28 sequences with significant scale variation and 14 sequences with obvious aspect ratio change. Compared with trackers equipped with different proposal generators, state-of-the-art trackers and existing correlation filter variants, our proposed tracker reports the highest accuracy while running efficiently at an average speed of 20.4 frames per second. Additionally, per-sequence numerical performance analysis and experimental results on the VOT2014 dataset are presented to enable deeper insight into our approach.
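The pipeline summarized above — a correlation filter tracker consuming class-agnostic detection proposals, with implausible proposals rejected before scoring — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, thresholds, and the stand-in scoring function are all hypothetical, and a real system would score each surviving proposal with the correlation filter response.

```python
import numpy as np

def reject_proposals(prev_box, proposals, max_scale=1.4, max_ratio=1.3):
    """Keep only proposals whose scale and aspect-ratio change relative to
    the previous tracked box stay within bounds (illustrative thresholds)."""
    px, py, pw, ph = prev_box
    kept = []
    for (x, y, w, h) in proposals:
        scale = np.sqrt((w * h) / (pw * ph))      # relative size change
        ratio = (w / h) / (pw / ph)               # relative aspect-ratio change
        if 1 / max_scale <= scale <= max_scale and 1 / max_ratio <= ratio <= max_ratio:
            kept.append((x, y, w, h))
    return kept

def track_frame(prev_box, proposals, score_fn):
    """One step of the proposal-augmented loop: reject implausible proposals,
    then pick the highest-scoring survivor (falling back to the previous box
    if every proposal is rejected)."""
    candidates = reject_proposals(prev_box, proposals) or [prev_box]
    return max(candidates, key=score_fn)
```

For example, starting from a 20x40 box, a proposal of size 100x10 is rejected because its aspect ratio changes twentyfold, while a 22x42 proposal survives; `track_frame` then returns the surviving proposal that `score_fn` ranks highest.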
Notes
We notice that in some sequences, the tracking bounding box of STC shrinks to an extremely small size, resulting in even faster speed but unreliable results.
Here “\(x\sim y\)” means that the variation is examined between each frame’s x-th and y-th previous frame.
Here “\(x\sim y\)” means that the relative variation falls outside the range (1 / x, x) but still remains within (1 / y, y).
References
Alexe, B., Deselaers, T., & Ferrari, V. (2012). Measuring the objectness of image windows. TPAMI, 34(11), 2189–2202.
Arbelaez, P., Pont-Tuset, J., Barron, J., Marqués, F., & Malik, J. (2014). Multiscale combinatorial grouping. In CVPR (pp. 328–335).
Belagiannis, V., Schubert, F., Navab, N., & Ilic, S. (2012). Segmentation based particle filtering for real-time 2D object tracking. In ECCV (pp. 842–855).
Bolme, D. S., Beveridge, J. R., Draper, B. A., & Lui, Y. M. (2010). Visual object tracking using adaptive correlation filters. In CVPR (pp. 2544–2550).
Cai, Z., Wen, L., Yang, J., Lei, Z., & Li, S. (2012). Structured visual tracking with dynamic graph. In ACCV (pp. 86–97).
Carreira, J., & Sminchisescu, C. (2012). CPMC: Automatic object segmentation using constrained parametric min-cuts. TPAMI, 34(7), 1312–1328.
Cheng, M. M., Zhang, Z., Lin, W. Y., & Torr, P. H. S. (2014). BING: Binarized normed gradients for objectness estimation at 300fps. In CVPR (pp. 3286–3293).
Comaniciu, D., Ramesh, V., & Meer, P. (2003). Kernel-based object tracking. TPAMI, 25(5), 564–577.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR (pp. 886–893).
Danelljan, M., Häger, G., Khan, F. S., & Felsberg, M. (2014a). Accurate scale estimation for robust visual tracking. In BMVC.
Danelljan, M., Shahbaz Khan, F., Felsberg, M., & Van de Weijer, J. (2014b). Adaptive color attributes for real-time visual tracking. In CVPR (pp. 1090–1097).
Dollár, P., & Zitnick, C. L. (2013). Structured forests for fast edge detection. In ICCV (pp. 1841–1848).
Duffner, S., & Garcia, C. (2013). PixelTrack: A fast adaptive algorithm for tracking non-rigid objects. In ICCV (pp. 2480–2487).
Everingham, M., Eslami, S. M. A., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. IJCV, 111(1), 98–136.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR (pp. 580–587).
Godec, M., Roth, P. M., & Bischof, H. (2011). Hough-based tracking of non-rigid objects. In ICCV (pp. 81–88).
Hare, S., Saffari, A., & Torr, P. H. S. (2011). Struck: Structured output tracking with kernels. In ICCV (pp. 263–270).
He, K., Zhang, X., Ren, S., & Sun, J. (2014). Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV (pp. 346–361).
Henriques, J. F., Caseiro, R., Martins, P., & Batista, J. (2012). Exploiting the circulant structure of tracking-by-detection with kernels. In ECCV (pp. 702–715).
Henriques, J. F., Caseiro, R., Martins, P., & Batista, J. (2015). High-speed tracking with kernelized correlation filters. TPAMI. doi:10.1109/TPAMI.2014.2345390.
Hosang, J., Benenson, R., & Schiele, B. (2014). How good are detection proposals, really?. In BMVC.
Hosang, J., Benenson, R., Dollár, P., & Schiele, B. (2015). What makes for effective detection proposals? TPAMI. doi:10.1109/TPAMI.2015.2465908.
Hua, Y., Alahari, K., & Schmid, C. (2015). Online object tracking with proposal selection. In ICCV, (pp. 3092–3100).
Huang, D., Luo, L., Wen, M., Chen, Z., & Zhang, C. (2015). Enable scale and aspect ratio adaptability in visual tracking with detection proposals. In BMVC.
Jia, X., Lu, H., & Yang, M. H. (2012). Visual tracking via adaptive structural local sparse appearance model. In CVPR (pp. 1822–1829).
Kalal, Z., Matas, J., & Mikolajczyk, K. (2010). P-N learning: Bootstrapping binary classifiers by structural constraints. In CVPR (pp. 49–56).
Krähenbühl, P., & Koltun, V. (2014). Geodesic object proposals. In ECCV (pp. 725–739).
Kristan, M., Pflugfelder, R., Leonardis, A., et al. (2013). The visual object tracking VOT2013 challenge results. In ICCV workshop (pp. 98–111).
Kristan, M., Pflugfelder, R., Leonardis, A., et al. (2014). The visual object tracking VOT2014 challenge results. http://votchallenge.net/vot2014/download/vot_2014_paper.pdf
Kwon, J., & Lee, K. M. (2010). Visual tracking decomposition. In CVPR (pp. 1269–1276).
Li, Y., & Zhu, J. (2014). A scale adaptive kernel correlation filter tracker with feature integration. In ECCV workshop, (pp. 254–265).
Liang, P., Pang, Y., Liao, C., Mei, X., & Ling, H. (2016). Adaptive objectness for object tracking. IEEE Signal Processing Letters, 23(7), 949–953.
Liu, B., Huang, J., Yang, L., & Kulikowsk, C. (2011). Robust tracking using local sparse appearance model and k-selection. In CVPR (pp. 1313–1320).
Liu, T., Wang, G., & Yang, Q. (2015). Real-time part-based visual tracking via adaptive correlation filters. In CVPR (pp. 4902–4912).
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS (pp. 91–99).
Uijlings, J. R. R., Van de Sande, K. E. A., Gevers, T., & Smeulders, A. W. M. (2013). Selective search for object recognition. IJCV, 104(2), 154–171.
Van de Weijer, J., Schmid, C., Verbeek, J., & Larlus, D. (2009). Learning color names for real-world applications. TIP, 18(7), 1512–1523.
Wang, A., Wan, G., Cheng, Z., & Li, S. (2009). An incremental extremely random forest classifier for online learning and tracking. In ICIP (pp. 1449–1452).
Wang, A., Cheng, Z., Martin, R. R., & Li, S. (2013). Multiple-cue-based visual object contour tracking with incremental learning. LNCS, 7544, 225–243.
Wen, L., Du, D., Lei, Z., Li, S. Z., & Yang, M. H. (2015). JOTS: Joint online tracking and segmentation. In CVPR (pp. 2226–2234).
Wu, Y., Lim, J., & Yang, M. H. (2013). Online object tracking: A benchmark. In CVPR (pp. 2411–2418).
Zhang, K., Zhang, L., Zhang, D., & Yang, M. H. (2014). Fast visual tracking via dense spatio-temporal context learning. In ECCV (pp. 127–141).
Zhong, W., Lu, H., & Yang, M. H. (2012). Robust object tracking via sparsity-based collaborative model. In CVPR (pp. 1838–1845).
Zhou, T. (2015). Bing objectness proposal estimator matlab (mex-c) wrapper. https://github.com/tfzhou/BINGObjectness
Zhu, G., Porikli, F., & Li, H. (2016a). Beyond local search: Tracking objects everywhere with instance-specific proposals. In CVPR (pp. 943–951).
Zhu, G., Porikli, F., & Li, H. (2016b). Robust visual tracking with deep convolutional neural network based object proposals on PETS. In CVPR workshop (pp. 26–33).
Zhu, G., Wang, J., Wu, Y., Zhang, X., & Lu, H. (2016c). MC-HOG correlation tracking with saliency proposal. In AAAI (pp. 3690–3696).
Zitnick, C. L., & Dollár, P. (2014). Edge Boxes: Locating object proposals from edges. In ECCV (pp. 391–405).
Acknowledgements
The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grants No. 61272145 and 61402504, and the 863 Program of China under Grant No. 2012-AA012706.
Communicated by Cordelia Schmid.
Huang, D., Luo, L., Chen, Z. et al. Applying Detection Proposals to Visual Tracking for Scale and Aspect Ratio Adaptability. Int J Comput Vis 122, 524–541 (2017). https://doi.org/10.1007/s11263-016-0974-6