ABSTRACT
SiamCAR decomposes visual tracking into a Siamese subnetwork for feature extraction and a classification-regression subnetwork for bounding box prediction. Building on this design, we propose an empirical method that avoids cropping away high-level semantic information in anchor-free tracking. Because of how the training image pairs are preprocessed, we focus on the center region of the feature maps in the template branch, which contains the whole object while reducing background clutter. However, the object's position changes continually throughout a sequence. When the object moves a large distance between frames, losing high-level features weakens the strategy of searching for the object in the current frame around its location in the previous frame. Therefore, we remove these inappropriate crops to obtain a better similarity map. Extensive experiments and comparisons on the UAV123, LaSOT and GOT-10k benchmarks show that our simple but effective method achieves credible results at real-time speed.
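The sketch below is not the authors' implementation. It illustrates the operation the abstract discusses: cropping the center of the template feature map before depthwise cross-correlation with the search features, compared with correlating the uncropped template. The feature sizes are assumptions borrowed from common pysot-style Siamese trackers: a 15x15 template feature, a 31x31 search feature, and a 7x7 center crop. The abstract does not state them. The helper names `center_crop` and `depthwise_xcorr` are hypothetical.

```python
import numpy as np

def center_crop(feat, size):
    """Keep the central size x size window of a (C, H, W) feature map."""
    _, h, w = feat.shape
    top = (h - size) // 2
    left = (w - size) // 2
    return feat[:, top:top + size, left:left + size]

def depthwise_xcorr(search, kernel):
    """Depthwise cross-correlation (valid mode, stride 1): each channel of
    `kernel` slides over the matching channel of `search`."""
    c, hs, ws = search.shape
    _, hk, wk = kernel.shape
    out = np.zeros((c, hs - hk + 1, ws - wk + 1), dtype=search.dtype)
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            window = search[:, i:i + hk, j:j + wk]
            out[:, i, j] = (window * kernel).sum(axis=(1, 2))
    return out

# Assumed (illustrative) sizes: 256 channels, 15x15 template, 31x31 search.
rng = np.random.default_rng(0)
template = rng.standard_normal((256, 15, 15)).astype(np.float32)
search = rng.standard_normal((256, 31, 31)).astype(np.float32)

# Cropped: only the central 7x7 template features act as the kernel.
cropped = depthwise_xcorr(search, center_crop(template, 7))
# Uncropped: the full 15x15 template, border features included.
uncropped = depthwise_xcorr(search, template)

print(cropped.shape)    # (256, 25, 25)
print(uncropped.shape)  # (256, 17, 17)
```

In this toy valid-mode setting, the uncropped kernel produces a smaller similarity map. Each response value, however, aggregates the template's border context that the center crop discards. The abstract does not describe how the proposed method integrates this with SiamCAR's classification-regression head.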
- J. Shi and C. Tomasi, "Good Features to Track," in 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994.
- J. Xing, H. Ai and S. Lao, "Multiple Human Tracking Based on Multi-view Upper-Body Detection and Discriminative Learning," in 2010 20th International Conference on Pattern Recognition, pp. 1698-1701, 2010.
- L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi and P. H. S. Torr, "Fully-Convolutional Siamese Networks for Object Tracking," in European Conference on Computer Vision (ECCV), pp. 850-865, Oct. 2016.
- B. Li, J. Yan, W. Wu, Z. Zhu and X. Hu, "High Performance Visual Tracking with Siamese Region Proposal Network," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8971-8980, 2018.
- D. Guo, J. Wang, Y. Cui, Z. Wang and S. Chen, "SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking," in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6268-6276, 2020.
- B. Li, W. Wu, Q. Wang, F. Zhang, J. Xing and J. Yan, "SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4277-4286, 2019.
- H. Fan et al., "LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5369-5378, 2019.
- M. Mueller, N. Smith and B. Ghanem, "A Benchmark and Simulator for UAV Tracking," in European Conference on Computer Vision (ECCV), pp. 445-461, Oct. 2016.
- C. Ma, J.-B. Huang, X. Yang and M.-H. Yang, "Robust Visual Tracking via Hierarchical Convolutional Features," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 11, pp. 2709-2723, 2018.
- J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi and P. H. S. Torr, "End-to-End Representation Learning for Correlation Filter Based Tracking," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5000-5008, 2017.
- T.-Y. Lin, M. Maire, S. Belongie et al., "Microsoft COCO: Common Objects in Context," in European Conference on Computer Vision (ECCV), Springer, Cham, 2014.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma et al., "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
- E. Real, J. Shlens, S. Mazzocchi, X. Pan and V. Vanhoucke, "YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7464-7473, 2017.
- L. Huang, X. Zhao and K. Huang, "GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 5, pp. 1562-1577, May 2021.
- M. Danelljan, G. Bhat, F. S. Khan and M. Felsberg, "ECO: Efficient Convolution Operators for Tracking," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6931-6939, 2017.
- Z. Zhu, Q. Wang, B. Li, W. Wu, J. Yan and W. Hu, "Distractor-Aware Siamese Networks for Visual Object Tracking," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 101-117, 2018.
- J. Yu, Y. Jiang, Z. Wang, Z. Cao and T. Huang, "UnitBox: An Advanced Object Detection Network," in Proceedings of the 24th ACM International Conference on Multimedia, pp. 516-520, Oct. 2016.
- H. K. Galoogahi, A. Fagg and S. Lucey, "Learning Background-Aware Correlation Filters for Visual Tracking," in 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1144-1152, 2017.
- H. Nam and B. Han, "Learning Multi-domain Convolutional Neural Networks for Visual Tracking," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4293-4302, 2016.
- G. Bhat, J. Johnander, M. Danelljan, F. S. Khan and M. Felsberg, "Unveiling the Power of Deep Tracking," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 483-498, 2018.
- F. Li, C. Tian, W. Zuo, L. Zhang and M.-H. Yang, "Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4904-4913, 2018.
- R. Tao, E. Gavves and A. W. M. Smeulders, "Siamese Instance Search for Tracking," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1420-1429, 2016.
- Q. Guo, W. Feng, C. Zhou, R. Huang, L. Wan and S. Wang, "Learning Dynamic Siamese Network for Visual Object Tracking," in 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1781-1789, 2017.
- Y. Zhang, L. Wang, J. Qi, D. Wang, M. Feng and H. Lu, "Structured Siamese Network for Real-Time Visual Tracking," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 351-366, 2018.
- Y. Song et al., "VITAL: VIsual Tracking via Adversarial Learning," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8990-8999, 2018.
- L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik and P. H. S. Torr, "Staple: Complementary Learners for Real-Time Tracking," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1401-1409, 2016.
- Y. Li and J. Zhu, "A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration," in European Conference on Computer Vision (ECCV), pp. 254-265, Sep. 2014.
- S. Pu, Y. Song, C. Ma, H. Zhang and M.-H. Yang, "Deep Attentive Tracking via Reciprocative Learning," in Advances in Neural Information Processing Systems, vol. 31, 2018.
- J. Zhang, S. Ma and S. Sclaroff, "MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization," in European Conference on Computer Vision (ECCV), pp. 188-203, Sep. 2014.
Index Terms
- Uncropped Siamese Fully Convolutional Network for Visual Tracking