Abstract
Recently proposed correlation filter based trackers achieve appealing performance despite their great simplicity and superior speed. However, such trackers lack inherent scale and aspect ratio adaptability, which limits their tracking accuracy. To tackle this problem, this paper integrates class-agnostic detection proposals, a technique widely adopted in object detection, into a correlation filter tracker. On the tracker side, optimizations such as feature integration, robust model updating and proposal rejection are applied for efficient integration. On the proposal side, by integrating and comparing four detection proposal generators along with two baseline methods, we find that the quality of detection proposals has considerable influence on tracking accuracy. We therefore choose EdgeBoxes, the most promising proposal generator, and further enhance it with background suppression. Evaluations are mainly performed on a challenging 50-sequence dataset (OTB50) and two of its subsets: 28 sequences with significant scale variation and 14 sequences with obvious aspect ratio change. Compared with trackers equipped with different proposal generators, state-of-the-art trackers and existing correlation filter variants, our proposed tracker reports the highest accuracy while running efficiently at an average speed of 20.4 frames per second. Additionally, per-sequence numerical performance analysis and experimental results on the VOT2014 dataset are presented to enable deeper insight into our approach.
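The pipeline summarized above — a correlation filter tracker consuming class-agnostic detection proposals, with implausible proposals rejected before scoring — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, thresholds, and the stand-in scoring function are all hypothetical, and a real system would score each surviving proposal with the correlation filter response.

```python
import numpy as np

def reject_proposals(prev_box, proposals, max_scale=1.4, max_ratio=1.3):
    """Keep only proposals whose scale and aspect-ratio change relative to
    the previous tracked box stay within bounds (illustrative thresholds)."""
    px, py, pw, ph = prev_box
    kept = []
    for (x, y, w, h) in proposals:
        scale = np.sqrt((w * h) / (pw * ph))      # relative size change
        ratio = (w / h) / (pw / ph)               # relative aspect-ratio change
        if 1 / max_scale <= scale <= max_scale and 1 / max_ratio <= ratio <= max_ratio:
            kept.append((x, y, w, h))
    return kept

def track_frame(prev_box, proposals, score_fn):
    """One step of the proposal-augmented loop: reject implausible proposals,
    then pick the highest-scoring survivor (falling back to the previous box
    if every proposal is rejected)."""
    candidates = reject_proposals(prev_box, proposals) or [prev_box]
    return max(candidates, key=score_fn)
```

For example, starting from a 20x40 box, a proposal of size 100x10 is rejected because its aspect ratio changes twentyfold, while a 22x42 proposal survives; `track_frame` then returns the surviving proposal that `score_fn` ranks highest.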
Notes
We notice that in some sequences, the tracking bounding box of STC shrinks to an extremely small size, resulting in even faster speed but unreliable results.
Here “\(x\sim y\)” means that the variation is examined between each frame’s x-th and y-th previous frame.
Here “\(x\sim y\)” means that the relative variation falls outside the range (1 / x, x) but still remains within (1 / y, y).
References
Alexe, B., Deselaers, T., & Ferrari, V. (2012). Measuring the objectness of image windows. TPAMI, 34(11), 2189–2202.
Arbelaez, P., Pont-Tuset, J., Barron, J., Marqués, F., & Malik, J. (2014). Multiscale combinatorial grouping. In CVPR (pp. 328–335).
Belagiannis, V., Schubert, F., Navab, N., & Ilic, S. (2012). Segmentation based particle filtering for real-time 2D object tracking. In ECCV (pp. 842–855).
Bolme, D. S., Beveridge, J. R., Draper, B. A., & Lui, Y. M. (2010). Visual object tracking using adaptive correlation filters. In CVPR (pp. 2544–2550).
Cai, Z., Wen, L., Yang, J., Lei, Z., & Li, S. (2012). Structured visual tracking with dynamic graph. In ACCV (pp. 86–97).
Carreira, J., & Sminchisescu, C. (2012). CPMC: Automatic object segmentation using constrained parametric min-cuts. TPAMI, 34(7), 1312–1328.
Cheng, M. M., Zhang, Z., Lin, W. Y., & Torr, P. H. S. (2014). BING: Binarized normed gradients for objectness estimation at 300fps. In CVPR (pp. 3286–3293).
Comaniciu, D., Ramesh, V., & Meer, P. (2003). Kernel-based object tracking. TPAMI, 25(5), 564–577.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR (pp. 886–893).
Danelljan, M., Häger, G., Khan, F. S., & Felsberg, M. (2014a). Accurate scale estimation for robust visual tracking. In BMVC.
Danelljan, M., Shahbaz Khan, F., Felsberg, M., & Van de Weijer, J. (2014b). Adaptive color attributes for real-time visual tracking. In CVPR (pp. 1090–1097).
Dollár, P., & Zitnick, C. L. (2013). Structured forests for fast edge detection. In ICCV (pp. 1841–1848).
Duffner, S., & Garcia, C. (2013). PixelTrack: A fast adaptive algorithm for tracking non-rigid objects. In ICCV (pp. 2480–2487).
Everingham, M., Eslami, S. M. A., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. IJCV, 111(1), 98–136.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR (pp. 580–587).
Godec, M., Roth, P. M., & Bischof, H. (2011). Hough-based tracking of non-rigid objects. In ICCV (pp. 81–88).
Hare, S., Saffari, A., & Torr, P. H. S. (2011). Struck: Structured output tracking with kernels. In ICCV (pp. 263–270).
He, K., Zhang, X., Ren, S., & Sun, J. (2014). Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV (pp. 346–361).
Henriques, J. F., Caseiro, R., Martins, P., & Batista, J. (2012). Exploiting the circulant structure of tracking-by-detection with kernels. In ECCV (pp. 702–715).
Henriques, J. F., Caseiro, R., Martins, P., & Batista, J. (2015). High-speed tracking with kernelized correlation filters. TPAMI. doi:10.1109/TPAMI.2014.2345390.
Hosang, J., Benenson, R., & Schiele, B. (2014). How good are detection proposals, really?. In BMVC.
Hosang, J., Benenson, R., Dollár, P., & Schiele, B. (2015). What makes for effective detection proposals? TPAMI. doi:10.1109/TPAMI.2015.2465908.
Hua, Y., Alahari, K., & Schmid, C. (2015). Online object tracking with proposal selection. In ICCV, (pp. 3092–3100).
Huang, D., Luo, L., Wen, M., Chen, Z., & Zhang, C. (2015). Enable scale and aspect ratio adaptability in visual tracking with detection proposals. In BMVC.
Jia, X., Lu, H., & Yang, M. H. (2012). Visual tracking via adaptive structural local sparse appearance model. In CVPR (pp. 1822–1829).
Kalal, Z., Matas, J., & Mikolajczyk, K. (2010). P-N learning: Bootstrapping binary classifiers by structural constraints. In CVPR (pp. 49–56).
Krähenbühl, P., & Koltun, V. (2014). Geodesic object proposals. In ECCV (pp. 725–739).
Kristan, M., Pflugfelder, R., Leonardis, A., et al. (2013). The visual object tracking VOT2013 challenge results. In ICCV workshop (pp. 98–111).
Kristan, M., Pflugfelder, R., Leonardis, A., et al. (2014). The visual object tracking VOT2014 challenge results. http://votchallenge.net/vot2014/download/vot_2014_paper.pdf
Kwon, J., & Lee, K. M. (2010). Visual tracking decomposition. In CVPR (pp. 1269–1276).
Li, Y., & Zhu, J. (2014). A scale adaptive kernel correlation filter tracker with feature integration. In ECCV workshop, (pp. 254–265).
Liang, P., Pang, Y., Liao, C., Mei, X., & Ling, H. (2016). Adaptive objectness for object tracking. IEEE Signal Processing Letters, 23(7), 949–953.
Liu, B., Huang, J., Yang, L., & Kulikowsk, C. (2011). Robust tracking using local sparse appearance model and k-selection. In CVPR (pp. 1313–1320).
Liu, T., Wang, G., & Yang, Q. (2015). Real-time part-based visual tracking via adaptive correlation filters. In CVPR (pp. 4902–4912).
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS (pp. 91–99).
Uijlings, J. R. R., Van de Sande, K. E. A., Gevers, T., & Smeulders, A. W. M. (2013). Selective search for object recognition. IJCV, 104(2), 154–171.
Van de Weijer, J., Schmid, C., Verbeek, J., & Larlus, D. (2009). Learning color names for real-world applications. TIP, 18(7), 1512–1523.
Wang, A., Wan, G., Cheng, Z., & Li, S. (2009). An incremental extremely random forest classifier for online learning and tracking. In ICIP (pp. 1449–1452).
Wang, A., Cheng, Z., Martin, R. R., & Li, S. (2013). Multiple-cue-based visual object contour tracking with incremental learning. LNCS, 7544, 225–243.
Wen, L., Du, D., Lei, Z., Li, S. Z., & Yang, M. H. (2015). JOTS: Joint online tracking and segmentation. In CVPR (pp. 2226–2234).
Wu, Y., Lim, J., & Yang, M. H. (2013). Online object tracking: A benchmark. In CVPR (pp. 2411–2418).
Zhang, K., Zhang, L., Zhang, D., & Yang, M. H. (2014). Fast visual tracking via dense spatio-temporal context learning. In ECCV (pp. 127–141).
Zhong, W., Lu, H., & Yang, M. H. (2012). Robust object tracking via sparsity-based collaborative model. In CVPR (pp. 1838–1845).
Zhou, T. (2015). Bing objectness proposal estimator matlab (mex-c) wrapper. https://github.com/tfzhou/BINGObjectness
Zhu, G., Porikli, F., & Li, H. (2016a). Beyond local search: Tracking objects everywhere with instance-specific proposals. In CVPR (pp. 943–951).
Zhu, G., Porikli, F., & Li, H. (2016b). Robust visual tracking with deep convolutional neural network based object proposals on PETS. In CVPR workshop (pp. 26–33).
Zhu, G., Wang, J., Wu, Y., Zhang, X., & Lu, H. (2016c). MC-HOG correlation tracking with saliency proposal. In AAAI (pp. 3690–3696).
Zitnick, C. L., & Dollár, P. (2014). Edge Boxes: Locating object proposals from edges. In ECCV (pp. 391–405).
Acknowledgements
The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grants No. 61272145 and 61402504, and the 863 Program of China under Grant No. 2012-AA012706.
Communicated by Cordelia Schmid.
Huang, D., Luo, L., Chen, Z. et al. Applying Detection Proposals to Visual Tracking for Scale and Aspect Ratio Adaptability. Int J Comput Vis 122, 524–541 (2017). https://doi.org/10.1007/s11263-016-0974-6