Abstract
This paper presents a multi-scale region proposal network (RPN) for visual object tracking. It is inspired by the Faster R-CNN and YOLO detectors, which adopt an RPN to significantly reduce detection time while achieving state-of-the-art detection performance. We extend this idea to visual tracking with a multi-scale region proposal network. The proposed network uses both fine-grained features from shallow convolutional layers and discriminative features from deep convolutional layers. Shallow-layer features are well suited to accurate object localization, while deep-layer features efficiently distinguish target objects from the background. A multi-domain learning mechanism is applied to train the network end to end. To predict the target object and its location in a new frame, we propose a re-ranking algorithm that determines the true object by exploiting the spatial modeling, scale variations, and color attributes of object proposals. Our tracker is validated on the OTB-15 object tracking benchmark, where it achieves a success rate of 0.603 and a precision rate of 0.760 under the one-pass evaluation. Additionally, the tracker runs at 22 frames per second, which is very close to real-time speed. Experimental results show its outstanding performance in both tracking accuracy and speed in comparison with existing state-of-the-art methods.
Notes
- 1. IoU denotes the Intersection over Union function.
- 2. Here, IoU means the Intersection over Union between anchor boxes and ground-truth boxes. If \(IoU \ge 0.7\), the anchor box is considered a true object location (positive), and if \(IoU \le 0.3\), it is considered a false location (negative).
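The anchor labeling rule in note 2 can be illustrated with a minimal sketch. It is not the authors' implementation. The corner box format `(x1, y1, x2, y2)`, the function names `iou` and `label_anchor`, the use of the best-matching ground-truth box, and the treatment of anchors with \(0.3 < IoU < 0.7\) as ignored (label `-1`) are all assumptions made for illustration. Only the two thresholds come from the note.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (it may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor: Box, gt_boxes: Sequence[Box],
                 pos_thr: float = 0.7, neg_thr: float = 0.3) -> int:
    """Return 1 (positive), 0 (negative), or -1 (ignored, an assumption)."""
    # Assumption: an anchor is scored against its best-matching ground truth.
    best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
    if best >= pos_thr:
        return 1
    if best <= neg_thr:
        return 0
    return -1  # between the thresholds: not specified in the note


if __name__ == "__main__":
    gt = [(0.0, 0.0, 10.0, 10.0)]
    print(label_anchor((0.0, 0.0, 10.0, 10.0), gt))    # full overlap
    print(label_anchor((20.0, 20.0, 30.0, 30.0), gt))  # no overlap
    print(label_anchor((0.0, 0.0, 10.0, 5.0), gt))     # partial overlap
```

In this sketch, anchors between the two thresholds are excluded from the training signal, as is common in RPN-style detectors. The note itself does not say how such anchors are handled.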
Acknowledgments
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2015-R1A2A2A03006190) and by an Nvidia GPU Grant.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Fang, Y., Ko, S., Jo, GS. (2017). Multi-scale Region Proposal Network Trained by Multi-domain Learning for Visual Object Tracking. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science(), vol 10636. Springer, Cham. https://doi.org/10.1007/978-3-319-70090-8_34
DOI: https://doi.org/10.1007/978-3-319-70090-8_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70089-2
Online ISBN: 978-3-319-70090-8
eBook Packages: Computer Science, Computer Science (R0)