Research Article
DOI: 10.1145/3579109.3579122

Uncropped Siamese Fully Convolutional Network for Visual Tracking

Published: 14 March 2023

ABSTRACT

Building on SiamCAR, which decomposes visual tracking into a Siamese subnetwork for feature extraction and a classification-regression subnetwork for bounding box prediction, we propose an empirical method that avoids cropping the high-level semantic features used for the subsequent anchor-free tracking task. Because of how the training image pairs are preprocessed, this cropping keeps only the center region of the template-branch feature maps, which contains the whole object while reducing background clutter. However, the object in a sequence is constantly in motion, and the high-level features lost to the crop weaken the strategy of locating the current object from its position in the previous frame when the object moves substantially. We therefore remove this inappropriate crop to obtain a better similarity map. Extensive experiments and comparisons show that our simple yet effective method achieves credible results at remarkable real-time speed on the UAV123, LaSOT and GOT-10k benchmarks.
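The change described above, dropping the center crop usually applied to the template-branch features before cross-correlation, can be illustrated with a short sketch. The snippet below is a minimal PyTorch illustration, not the authors' code: it assumes a pysot-style depthwise cross-correlation, a 15×15 template feature map cropped to its central 7×7 region in the baseline, and illustrative channel and search-region sizes (256 channels, 31×31 search features).

```python
# Minimal sketch (PyTorch): cropped vs. uncropped template features before
# depthwise cross-correlation. Tensor sizes and the 7x7 center crop are
# illustrative assumptions, not values taken from the paper.
import torch
import torch.nn.functional as F


def xcorr_depthwise(x, kernel):
    """Depthwise cross-correlation: the template features act as per-channel
    correlation kernels slid over the search-region features."""
    batch, channel = kernel.size(0), kernel.size(1)
    x = x.view(1, batch * channel, x.size(2), x.size(3))
    kernel = kernel.view(batch * channel, 1, kernel.size(2), kernel.size(3))
    out = F.conv2d(x, kernel, groups=batch * channel)
    return out.view(batch, channel, out.size(2), out.size(3))


# Hypothetical backbone outputs for the template (z) and search (x) branches.
z_feat = torch.randn(1, 256, 15, 15)   # template-branch features
x_feat = torch.randn(1, 256, 31, 31)   # search-branch features

# Baseline behaviour: keep only the central 7x7 of the template features.
z_cropped = z_feat[:, :, 4:11, 4:11]
resp_cropped = xcorr_depthwise(x_feat, z_cropped)   # -> (1, 256, 25, 25)

# "Uncropped" variant: correlate with the full template feature map, so the
# high-level features near the template border are not discarded.
resp_uncropped = xcorr_depthwise(x_feat, z_feat)    # -> (1, 256, 17, 17)

print(resp_cropped.shape, resp_uncropped.shape)
```

With the full template, the similarity map is smaller but is computed from all of the template's high-level features, which, per the abstract, matters when the object moves far from its location in the previous frame.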

References

  1. J. Shi and C. Tomasi, "Good features to track," 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994.
  2. J. Xing, H. Ai and S. Lao, "Multiple Human Tracking Based on Multi-view Upper-Body Detection and Discriminative Learning," 2010 20th International Conference on Pattern Recognition, pp. 1698-1701, 2010.
  3. L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi and P. H. S. Torr, "Fully-Convolutional Siamese Networks for Object Tracking," European Conference on Computer Vision, pp. 850-865, Oct. 2016.
  4. B. Li, J. Yan, W. Wu, Z. Zhu and X. Hu, "High Performance Visual Tracking with Siamese Region Proposal Network," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8971-8980, 2018.
  5. D. Guo, J. Wang, Y. Cui, Z. Wang and S. Chen, "SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking," 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6268-6276, 2020.
  6. B. Li, W. Wu, Q. Wang, F. Zhang, J. Xing and J. Yan, "SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4277-4286, 2019.
  7. H. Fan et al., "LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5369-5378, 2019.
  8. M. Mueller, N. Smith and B. Ghanem, "A Benchmark and Simulator for UAV Tracking," European Conference on Computer Vision, pp. 445-461, Oct. 2016.
  9. C. Ma, J.-B. Huang, X. Yang and M.-H. Yang, "Robust Visual Tracking via Hierarchical Convolutional Features," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 11, pp. 2709-2723, 2018.
  10. J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi and P. H. S. Torr, "End-to-End Representation Learning for Correlation Filter Based Tracking," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5000-5008, 2017.
  11. T.-Y. Lin, M. Maire, S. Belongie et al., "Microsoft COCO: Common Objects in Context," European Conference on Computer Vision, Springer, Cham, 2014.
  12. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, ... and L. Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
  13. E. Real, J. Shlens, S. Mazzocchi, X. Pan and V. Vanhoucke, "YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7464-7473, 2017.
  14. L. Huang, X. Zhao and K. Huang, "GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 5, pp. 1562-1577, May 2021.
  15. M. Danelljan, G. Bhat, F. S. Khan and M. Felsberg, "ECO: Efficient Convolution Operators for Tracking," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6931-6939, 2017.
  16. Z. Zhu, Q. Wang, B. Li, W. Wu, J. Yan and W. Hu, "Distractor-Aware Siamese Networks for Visual Object Tracking," Proceedings of the European Conference on Computer Vision (ECCV), pp. 101-117, 2018.
  17. J. Yu, Y. Jiang, Z. Wang, Z. Cao and T. Huang, "UnitBox: An Advanced Object Detection Network," Proceedings of the 24th ACM International Conference on Multimedia, pp. 516-520, Oct. 2016.
  18. H. K. Galoogahi, A. Fagg and S. Lucey, "Learning Background-Aware Correlation Filters for Visual Tracking," 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1144-1152, 2017.
  19. H. Nam and B. Han, "Learning Multi-domain Convolutional Neural Networks for Visual Tracking," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4293-4302, 2016.
  20. G. Bhat, J. Johnander, M. Danelljan, F. S. Khan and M. Felsberg, "Unveiling the Power of Deep Tracking," Proceedings of the European Conference on Computer Vision (ECCV), pp. 483-498, 2018.
  21. F. Li, C. Tian, W. Zuo, L. Zhang and M.-H. Yang, "Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4904-4913, 2018.
  22. R. Tao, E. Gavves and A. W. M. Smeulders, "Siamese Instance Search for Tracking," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1420-1429, 2016.
  23. Q. Guo, W. Feng, C. Zhou, R. Huang, L. Wan and S. Wang, "Learning Dynamic Siamese Network for Visual Object Tracking," 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1781-1789, 2017.
  24. Y. Zhang, L. Wang, J. Qi, D. Wang, M. Feng and H. Lu, "Structured Siamese Network for Real-Time Visual Tracking," Proceedings of the European Conference on Computer Vision (ECCV), pp. 351-366, 2018.
  25. Y. Song et al., "VITAL: VIsual Tracking via Adversarial Learning," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8990-8999, 2018.
  26. L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik and P. H. S. Torr, "Staple: Complementary Learners for Real-Time Tracking," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1401-1409, 2016.
  27. Y. Li and J. Zhu, "A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration," European Conference on Computer Vision, pp. 254-265, Sep. 2014.
  28. S. Pu, Y. Song, C. Ma, H. Zhang and M.-H. Yang, "Deep Attentive Tracking via Reciprocative Learning," Advances in Neural Information Processing Systems, vol. 31, 2018.
  29. J. Zhang, S. Ma and S. Sclaroff, "MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization," European Conference on Computer Vision, pp. 188-203, Sep. 2014.

Published in

      ICVIP '22: Proceedings of the 2022 6th International Conference on Video and Image Processing
      December 2022
      189 pages
ISBN: 9781450397568
DOI: 10.1145/3579109

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

