DOI: 10.1145/3394171.3413743
research-article

Reinforced Similarity Learning: Siamese Relation Networks for Robust Object Tracking

Published: 12 October 2020

Abstract

Tracking algorithms based on Siamese networks have recently shown favorable performance. The latest work focuses on better feature embedding and target state estimation, which greatly improves accuracy. Nevertheless, the simple cross-correlation between the features of a fixed template and those of the search region limits robustness and discrimination capability. In this paper, we focus on learning a better similarity measure for robust tracking. We propose a novel relation network that can be integrated on top of existing trackers without any further training of the Siamese networks, and that achieves superior discriminative ability. During online inference, we use feedback from high-confidence tracking results to obtain an additional template and update it, which improves robustness and generalization. We implement two versions of the proposed approach, one with a SiamFC-based tracker and one with a SiamRPN-based tracker, to validate the strong compatibility of our algorithm. Extensive experimental results on several tracking benchmarks indicate that the proposed method effectively improves the performance and robustness of the underlying trackers without a large reduction in speed, and performs favorably against state-of-the-art trackers.
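To make the two mechanisms named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows (a) the plain cross-correlation between a fixed template and a search region that the abstract identifies as the limiting step, and (b) one possible confidence-gated update of an additional template. All function names, the threshold of 0.8, and the running-average blend are assumptions made for illustration. The abstract does not specify the update rule, and the relation network itself is not sketched here.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]


def cross_correlate(template: Grid, search: Grid) -> Grid:
    """Slide the template over the search region and return the response map.

    Each response value is the inner product between the template and the
    search window at that offset (the simple similarity used as a baseline).
    """
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            score = sum(
                template[i][j] * search[y + i][x + j]
                for i in range(th)
                for j in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def peak(response: Grid) -> Tuple[int, int, float]:
    """Return (row, col, score) of the strongest response."""
    best = (0, 0, response[0][0])
    for y, row in enumerate(response):
        for x, score in enumerate(row):
            if score > best[2]:
                best = (y, x, score)
    return best


def update_extra_template(
    extra: Optional[Grid],
    candidate: Grid,
    confidence: float,
    threshold: float = 0.8,  # assumed value, not taken from the paper
    rate: float = 0.5,       # assumed blending rate, not taken from the paper
) -> Optional[Grid]:
    """Hypothetical confidence-gated update of the additional template.

    Low-confidence results are ignored; high-confidence results either
    initialise the extra template or are blended into it.
    """
    if confidence < threshold:
        return extra
    if extra is None:
        return [row[:] for row in candidate]
    return [
        [(1 - rate) * e + rate * c for e, c in zip(e_row, c_row)]
        for e_row, c_row in zip(extra, candidate)
    ]


if __name__ == "__main__":
    template = [[1.0, 0.0], [0.0, 1.0]]
    search = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(peak(cross_correlate(template, search)))
```

In this sketch the fixed template is never modified. Only the additional template changes, and only when the caller reports a confidence at or above the threshold, which mirrors the abstract's use of feedback from high-confidence tracking results.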

Supplementary Material

MP4 File (3394171.3413743.mp4)
Siamese trackers have recently shown favorable performance. The latest work focuses on better feature embedding and scale estimation, which greatly improves accuracy. Nevertheless, the simple cross-correlation of the features limits robustness. In this paper, we focus on learning a better similarity measure for robust tracking. We propose a novel relation network that can be integrated on top of existing trackers and that achieves superior discriminative ability. During inference, we use feedback from tracking results to obtain an additional template and update it, which improves robustness and generalization. We implement two versions of the proposed approach, with SiamFC-based and SiamRPN-based trackers, to validate the compatibility of our algorithm. Extensive experimental results show that our method effectively improves the performance of the underlying trackers without a large reduction in speed, and performs favorably against state-of-the-art trackers.

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. relation network
  2. siamese networks
  3. similarity learning
  4. visual object tracking

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Natural Science Foundation of Zhejiang Province

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 22
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 03 Mar 2025

Citations

Cited By

  • (2025) KTnet: Hazy weather object detection based on knowledge transfer. IET Intelligent Transport Systems 19:1. DOI: 10.1049/itr2.12606. Online publication date: 19-Feb-2025
  • (2025) Partitioned token fusion and pruning strategy for transformer tracking. Image and Vision Computing 154 (105431). DOI: 10.1016/j.imavis.2025.105431. Online publication date: Feb-2025
  • (2024) DRRN: Differential rectification & refinement network for ischemic infarct segmentation. CAAI Transactions on Intelligence Technology. DOI: 10.1049/cit2.12350. Online publication date: 24-Jul-2024
  • (2024) An efficient multi-scale learning method for image super-resolution networks. Neural Networks 169 (120-133). DOI: 10.1016/j.neunet.2023.10.015. Online publication date: Jan-2024
  • (2024) Eliminating and mining strategies for open-world object proposal. Neurocomputing 599 (128026). DOI: 10.1016/j.neucom.2024.128026. Online publication date: Sep-2024
  • (2024) Perturbation-augmented Graph Convolutional Networks: A Graph Contrastive Learning architecture for effective node classification tasks. Engineering Applications of Artificial Intelligence 129 (107616). DOI: 10.1016/j.engappai.2023.107616. Online publication date: Mar-2024
  • (2024) Towards universal and sparse adversarial examples for visual object tracking. Applied Soft Computing 153 (111252). DOI: 10.1016/j.asoc.2024.111252. Online publication date: Mar-2024
  • (2024) Emotion quantification and classification using the neutrosophic approach to deep learning. Applied Soft Computing 148:C. DOI: 10.1016/j.asoc.2023.110896. Online publication date: 27-Feb-2024
  • (2024) VSEM-SAMMI: An Explainable Multimodal Learning Approach to Predict User-Generated Image Helpfulness and Product Sales. International Journal of Computational Intelligence Systems 17:1. DOI: 10.1007/s44196-024-00495-8. Online publication date: 18-Apr-2024
  • (2024) Forestry Ecosystem Protection from the Perspective of Eco-civilization Based on Self-Attention Using Hierarchical Dilated Convolutional Neural Network. International Journal of Computational Intelligence Systems 17:1. DOI: 10.1007/s44196-024-00452-5. Online publication date: 22-Apr-2024
