Short paper. DOI: 10.1145/3372278.3390729

Learning Fine-Grained Similarity Matching Networks for Visual Tracking

Published: 08 June 2020

Abstract

Siamese trackers have recently become increasingly popular in the visual tracking community. Despite their great success, robust tracking in various challenging scenarios remains difficult. In this paper, we propose a novel similarity matching network that effectively extracts fine-grained semantic features by adding a Classification branch and a Category-Aware module to the classical Siamese framework (CCASiam). More specifically, the supervision module fully utilizes class information to obtain a classification loss, while the whole network is trained with the tracking loss, so that the network can extract more discriminative features for each specific target. During online tracking, the classification branch is removed, and the category-aware module guides the selection of target-active features using a ridge regression network, which avoids unnecessary computation and over-fitting. Furthermore, we introduce different types of attention mechanisms to selectively emphasize important semantic information. Owing to its fine-grained and category-aware features, CCASiam performs high-performance tracking efficiently. Extensive experimental results on several tracking benchmarks show that the proposed tracker achieves state-of-the-art performance at real-time speed.
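
The abstract gives no implementation details, so the following is only a rough sketch of the general idea behind selecting target-active features with ridge regression and then matching them Siamese-style. It is not the CCASiam implementation. It assumes a closed-form ridge solve in place of the paper's ridge regression network, ranks channels by the magnitude of their regression weight, and uses plain SiamFC-style cross-correlation for matching. All function names, shapes, and the `lam` and `k` parameters are hypothetical.

```python
import numpy as np

def ridge_weights(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    n_channels = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_channels), X.T @ y)

def select_channels(features, label, k, lam=1.0):
    """Pick the k channels whose ridge weights have the largest magnitude.

    features: (C, H, W) template feature map (assumed layout).
    label:    (H, W) soft target map, e.g. a Gaussian centred on the target.
    Returns the selected channel indices in ascending order.
    """
    C, H, W = features.shape
    X = features.reshape(C, H * W).T   # one row per spatial position
    y = label.reshape(H * W)
    w = ridge_weights(X, y, lam)
    return np.sort(np.argsort(-np.abs(w))[:k])

def cross_correlate(template, search):
    """Slide template (C, h, w) over search (C, H, W), summing over channels."""
    _, h, w = template.shape
    _, H, W = search.shape
    response = np.empty((H - h + 1, W - w + 1))
    for i in range(response.shape[0]):
        for j in range(response.shape[1]):
            response[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return response
```

In this sketch, matching would run only on the selected channels, for example `cross_correlate(template[idx], search[idx])` with `idx = select_channels(template, label, k)`. Correlating over fewer channels is one way that feature selection could reduce online computation.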

    Published In

    ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval
    June 2020
    605 pages
    ISBN:9781450370875
    DOI:10.1145/3372278
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. fine-grained representation
    2. siamese networks
    3. visual tracking

    Qualifiers

    • Short-paper

    Funding Sources

    • National Natural Science Foundation of China

    Conference

    ICMR '20
    Acceptance Rates

    Overall Acceptance Rate 254 of 830 submissions, 31%

    Article Metrics

    • Downloads (Last 12 months): 2
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 03 Mar 2025

    Cited By

    • (2024) Perturbation-augmented Graph Convolutional Networks: A Graph Contrastive Learning architecture for effective node classification tasks. Engineering Applications of Artificial Intelligence 129 (107616). DOI: 10.1016/j.engappai.2023.107616. Online publication date: Mar-2024
    • (2022) FAFMOTS: A Fast and Anchor Free Method for Online Joint Multi-Object Tracking and Segmentation. 2022 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct) (465-470). DOI: 10.1109/ISMAR-Adjunct57072.2022.00098. Online publication date: Oct-2022
    • (2021) Object Tracking Based on Global Context Attention. International Journal of Cognitive Informatics and Natural Intelligence 15:4 (1-16). DOI: 10.4018/IJCINI.287595. Online publication date: 17-Sep-2021
    • (2020) HROM: Learning High-Resolution Representation and Object-Aware Masks for Visual Object Tracking. Sensors 20:17 (4807). DOI: 10.3390/s20174807. Online publication date: 26-Aug-2020
    • (2020) Reinforced Similarity Learning. Proceedings of the 28th ACM International Conference on Multimedia (294-303). DOI: 10.1145/3394171.3413743. Online publication date: 12-Oct-2020
