Siamese Network for Object Tracking with Diffusion Model

Authors:

Jiacheng Zhang,

Yifeng Zhang

ICDIP '23: Proceedings of the 15th International Conference on Digital Image Processing

Article No.: 54, Pages 1 - 6

https://doi.org/10.1145/3604078.3604132

Published: 26 October 2023

Abstract

Recently, Siamese networks have drawn great attention in the object tracking community because they balance accuracy and speed. However, because target samples are scarce during training, Siamese networks adapt poorly to variations in target appearance in realistic tracking scenarios. To address this, a Siamese network for object tracking with a diffusion model is proposed. Starting from the target template labeled with a bounding box, the network generates additional target samples with ADM-G and compares the feature maps of these samples with those of images from the search areas within the classic SiamRPN++ architecture. The generated samples are of high quality and diversity and are highly correlated with the object, so they greatly enrich the set of object samples and improve the quality of feature extraction. Experiments show that the resulting Diff-SiamRPN++ tracker outperforms the other trackers evaluated on different benchmarks.
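
The comparison step mentioned in the abstract can be illustrated with a small sketch. SiamRPN++ matches template features against search-region features with a depthwise cross-correlation; the NumPy code below shows that operation on random arrays standing in for backbone features. How the paper combines the labeled template with the ADM-G samples is not stated in the abstract, so the `fuse_templates` helper (a plain mean over feature maps) is a hypothetical placeholder, as are all function names and array shapes here.

```python
import numpy as np


def depthwise_xcorr(search, template):
    """Slide each template channel over the matching search channel.

    search:   (C, Hs, Ws) feature map of the search area
    template: (C, Ht, Wt) feature map of the target template
    returns:  (C, Hs - Ht + 1, Ws - Wt + 1) response map
    """
    if search.shape[0] != template.shape[0]:
        raise ValueError("search and template must have the same channel count")
    ht, wt = template.shape[1:]
    # windows has shape (C, Ho, Wo, Ht, Wt): every template-sized patch per channel
    windows = np.lib.stride_tricks.sliding_window_view(search, (ht, wt), axis=(1, 2))
    # Per-channel inner product of each patch with the template channel
    return np.einsum("cijhw,chw->cij", windows, template)


def fuse_templates(template_feats):
    """Hypothetical aggregation (an assumption, not the paper's method):
    average the features of the labeled template and the generated samples."""
    return np.mean(np.stack(template_feats), axis=0)


# Random arrays stand in for backbone features; shapes are illustrative only.
rng = np.random.default_rng(0)
search_feat = rng.standard_normal((4, 9, 9))
template_feats = [rng.standard_normal((4, 3, 3)) for _ in range(5)]

fused = fuse_templates(template_feats)
response = depthwise_xcorr(search_feat, fused)
print(response.shape)  # (4, 7, 7)
```

Because the correlation is linear in the template, averaging the template features first gives the same response as averaging the per-sample response maps, which is why a mean is a convenient stand-in for whatever fusion the authors actually use.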

Cited By

Abdelaziz O, Shehata M, Mohamed M. (2024). Beyond traditional visual object tracking: a survey. International Journal of Machine Learning and Cybernetics 16:2 (1435-1460). Online publication date: 26-Aug-2024.
https://doi.org/10.1007/s13042-024-02345-7

Index Terms

Siamese Network for Object Tracking with Diffusion Model
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks

Published In

ICDIP '23: Proceedings of the 15th International Conference on Digital Image Processing

May 2023

711 pages

ISBN: 9798400708237

DOI: 10.1145/3604078

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

State Key Laboratory for Novel Software Technology, Nanjing University
Natural Science Foundation of China
Natural Science Foundation of Jiangsu Province

Conference

ICDIP 2023

ICDIP 2023: The 15th International Conference on Digital Image Processing

May 19 - 22, 2023

Nanjing, China
