DOI: 10.1145/3604078.3604132
research-article

Siamese Network for Object Tracking with Diffusion Model

Published: 26 October 2023

Abstract

Recently, Siamese networks have drawn great attention in the object tracking community because of their balance of accuracy and speed. However, because target samples are scarce during training, Siamese networks adapt poorly to variations in target appearance in realistic tracking scenarios. A Siamese network for object tracking with a diffusion model is therefore proposed. Starting from the target template labeled with a bounding box, the network generates additional target samples with ADM-G and compares the feature maps of these samples with those of images from the search areas in the classic SiamRPN++ architecture. The generated samples are of high quality and diversity and are highly correlated with the object, which greatly enriches the set of object samples and enhances the quality of feature extraction. Experiments show that the resulting Diff-SiamRPN++ tracker outperforms the other trackers compared on different benchmarks.
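
The comparison step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that template and search features are compared by depthwise cross-correlation, as in SiamRPN++, and that the response maps from the original template and from the generated samples are simply averaged; the averaging rule, the function names, and the toy feature maps are all hypothetical. The "generated" features are scaled copies of the template standing in for features of ADM-G samples, since no diffusion model is run here.

```python
import numpy as np

def depthwise_xcorr(search, template):
    """Depthwise cross-correlation: each template channel slides over the
    matching search channel. search: (C, Hs, Ws), template: (C, Ht, Wt)."""
    c, hs, ws = search.shape
    ct, ht, wt = template.shape
    assert c == ct, "channel counts must match"
    out = np.zeros((c, hs - ht + 1, ws - wt + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            window = search[:, i:i + ht, j:j + wt]
            out[:, i, j] = (window * template).sum(axis=(1, 2))
    return out

def fused_response(search_feat, template_feats):
    """Average the response maps over all template features.
    Assumption: plain averaging is a hypothetical fusion rule."""
    maps = [depthwise_xcorr(search_feat, t) for t in template_feats]
    return np.mean(maps, axis=0)

# Toy search features: zeros, with the target pattern at row 2, column 3.
template = np.ones((2, 2, 2))
search_feat = np.zeros((2, 6, 6))
search_feat[:, 2:4, 3:5] = template

# Stand-ins for features of generated samples (hypothetical variations).
generated = [0.9 * template, 1.1 * template]

response = fused_response(search_feat, [template] + generated)
flat_peak = response.sum(axis=0).argmax()
peak = tuple(int(v) for v in np.unravel_index(flat_peak, response.shape[1:]))
print(response.shape, peak)
```

In this toy setup the summed response peaks where the target pattern was placed in the search features. A real tracker would take the features from a shared backbone and pass the response maps to region proposal heads rather than reading off an argmax.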


Cited By

  • (2024) Beyond traditional visual object tracking: a survey. International Journal of Machine Learning and Cybernetics 16:2 (1435-1460). DOI: 10.1007/s13042-024-02345-7. Online publication date: 26-Aug-2024

    Published In

    ICDIP '23: Proceedings of the 15th International Conference on Digital Image Processing
    May 2023
    711 pages
    ISBN:9798400708237
    DOI:10.1145/3604078

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Object tracking
    2. Siamese network
    3. diffusion model

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICDIP 2023
