Research Article
DOI: 10.1145/3579109.3579122

Uncropped Siamese Fully Convolutional Network for Visual Tracking

Published: 14 March 2023

ABSTRACT

Building on SiamCAR, which decomposes visual tracking into a Siamese subnetwork for feature extraction and a classification-regression subnetwork for bounding box prediction, we propose an empirical method that avoids cropping the high-level semantic features used for the subsequent anchor-free tracking task. Because of how the training image pairs are preprocessed, this cropping keeps only the center region of the template-branch feature maps, which contains the whole object while reducing background clutter. However, the object in a sequence is constantly in motion, and the high-level features lost to the crop weaken the strategy of locating the current object from its position in the previous frame when the object moves substantially. We therefore remove this inappropriate crop to obtain a better similarity map. Extensive experiments and comparisons show that our simple yet effective method achieves credible results at remarkable real-time speed on the UAV123, LaSOT and GOT-10k benchmarks.
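The change described above, dropping the center crop usually applied to the template-branch features before cross-correlation, can be illustrated with a short sketch. The snippet below is a minimal PyTorch illustration, not the authors' code: it assumes a pysot-style depthwise cross-correlation, a 15×15 template feature map cropped to its central 7×7 region in the baseline, and illustrative channel and search-region sizes (256 channels, 31×31 search features).

```python
# Minimal sketch (PyTorch): cropped vs. uncropped template features before
# depthwise cross-correlation. Tensor sizes and the 7x7 center crop are
# illustrative assumptions, not values taken from the paper.
import torch
import torch.nn.functional as F


def xcorr_depthwise(x, kernel):
    """Depthwise cross-correlation: the template features act as per-channel
    correlation kernels slid over the search-region features."""
    batch, channel = kernel.size(0), kernel.size(1)
    x = x.view(1, batch * channel, x.size(2), x.size(3))
    kernel = kernel.view(batch * channel, 1, kernel.size(2), kernel.size(3))
    out = F.conv2d(x, kernel, groups=batch * channel)
    return out.view(batch, channel, out.size(2), out.size(3))


# Hypothetical backbone outputs for the template (z) and search (x) branches.
z_feat = torch.randn(1, 256, 15, 15)   # template-branch features
x_feat = torch.randn(1, 256, 31, 31)   # search-branch features

# Baseline behaviour: keep only the central 7x7 of the template features.
z_cropped = z_feat[:, :, 4:11, 4:11]
resp_cropped = xcorr_depthwise(x_feat, z_cropped)   # -> (1, 256, 25, 25)

# "Uncropped" variant: correlate with the full template feature map, so the
# high-level features near the template border are not discarded.
resp_uncropped = xcorr_depthwise(x_feat, z_feat)    # -> (1, 256, 17, 17)

print(resp_cropped.shape, resp_uncropped.shape)
```

With the full template, the similarity map is smaller but is computed from all of the template's high-level features, which, per the abstract, matters when the object moves far from its location in the previous frame.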

References

  1. J. Shi and C. Tomasi, "Good features to track," 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994.
  2. J. Xing, H. Ai and S. Lao, "Multiple Human Tracking Based on Multi-view Upper-Body Detection and Discriminative Learning," 2010 20th International Conference on Pattern Recognition, pp. 1698-1701, 2010.
  3. L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi and P. H. S. Torr, "Fully-Convolutional Siamese Networks for Object Tracking," European Conference on Computer Vision, pp. 850-865, Oct. 2016.
  4. B. Li, J. Yan, W. Wu, Z. Zhu and X. Hu, "High Performance Visual Tracking with Siamese Region Proposal Network," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8971-8980, 2018.
  5. D. Guo, J. Wang, Y. Cui, Z. Wang and S. Chen, "SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking," 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6268-6276, 2020.
  6. B. Li, W. Wu, Q. Wang, F. Zhang, J. Xing and J. Yan, "SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4277-4286, 2019.
  7. H. Fan et al., "LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5369-5378, 2019.
  8. M. Mueller, N. Smith and B. Ghanem, "A Benchmark and Simulator for UAV Tracking," European Conference on Computer Vision, pp. 445-461, Oct. 2016.
  9. C. Ma, J.-B. Huang, X. Yang and M.-H. Yang, "Robust Visual Tracking via Hierarchical Convolutional Features," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 11, pp. 2709-2723, 2018.
  10. J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi and P. H. S. Torr, "End-to-End Representation Learning for Correlation Filter Based Tracking," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5000-5008, 2017.
  11. T.-Y. Lin, M. Maire, S. Belongie et al., "Microsoft COCO: Common Objects in Context," European Conference on Computer Vision, Springer, Cham, 2014.
  12. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, ... and L. Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
  13. E. Real, J. Shlens, S. Mazzocchi, X. Pan and V. Vanhoucke, "YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7464-7473, 2017.
  14. L. Huang, X. Zhao and K. Huang, "GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 5, pp. 1562-1577, May 2021.
  15. M. Danelljan, G. Bhat, F. S. Khan and M. Felsberg, "ECO: Efficient Convolution Operators for Tracking," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6931-6939, 2017.
  16. Z. Zhu, Q. Wang, B. Li, W. Wu, J. Yan and W. Hu, "Distractor-Aware Siamese Networks for Visual Object Tracking," Proceedings of the European Conference on Computer Vision (ECCV), pp. 101-117, 2018.
  17. J. Yu, Y. Jiang, Z. Wang, Z. Cao and T. Huang, "UnitBox: An Advanced Object Detection Network," Proceedings of the 24th ACM International Conference on Multimedia, pp. 516-520, Oct. 2016.
  18. H. K. Galoogahi, A. Fagg and S. Lucey, "Learning Background-Aware Correlation Filters for Visual Tracking," 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1144-1152, 2017.
  19. H. Nam and B. Han, "Learning Multi-domain Convolutional Neural Networks for Visual Tracking," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4293-4302, 2016.
  20. G. Bhat, J. Johnander, M. Danelljan, F. S. Khan and M. Felsberg, "Unveiling the Power of Deep Tracking," Proceedings of the European Conference on Computer Vision (ECCV), pp. 483-498, 2018.
  21. F. Li, C. Tian, W. Zuo, L. Zhang and M.-H. Yang, "Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4904-4913, 2018.
  22. R. Tao, E. Gavves and A. W. M. Smeulders, "Siamese Instance Search for Tracking," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1420-1429, 2016.
  23. Q. Guo, W. Feng, C. Zhou, R. Huang, L. Wan and S. Wang, "Learning Dynamic Siamese Network for Visual Object Tracking," 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1781-1789, 2017.
  24. Y. Zhang, L. Wang, J. Qi, D. Wang, M. Feng and H. Lu, "Structured Siamese Network for Real-Time Visual Tracking," Proceedings of the European Conference on Computer Vision (ECCV), pp. 351-366, 2018.
  25. Y. Song et al., "VITAL: VIsual Tracking via Adversarial Learning," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8990-8999, 2018.
  26. L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik and P. H. S. Torr, "Staple: Complementary Learners for Real-Time Tracking," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1401-1409, 2016.
  27. Y. Li and J. Zhu, "A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration," European Conference on Computer Vision, pp. 254-265, Sep. 2014.
  28. S. Pu, Y. Song, C. Ma, H. Zhang and M.-H. Yang, "Deep Attentive Tracking via Reciprocative Learning," Advances in Neural Information Processing Systems, vol. 31, 2018.
  29. J. Zhang, S. Ma and S. Sclaroff, "MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization," European Conference on Computer Vision, pp. 188-203, Sep. 2014.

Published in

      ICVIP '22: Proceedings of the 2022 6th International Conference on Video and Image Processing
      December 2022
      189 pages
ISBN: 9781450397568
DOI: 10.1145/3579109

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

