SiamDAG: Siamese dynamic receptive field and global context modeling network for visual tracking

Multimedia Tools and Applications

Abstract

Trackers based on anchor-free strategies have achieved great success in recent years, but they have two limitations. First, the receptive field of each layer in their models is fixed, which costs flexibility. Second, they do not model global context effectively. To address both, this paper proposes SiamDAG. Its core component is the Global Context-Selective Kernel block, which dynamically adjusts its receptive field size based on multi-scale input information and models the global context effectively, giving the tracker a global understanding of the visual scene. In addition, an Intersection over Union (IoU) prediction branch is added to link the classification and regression tasks. We evaluated the tracker on the VOT2019, OTB100 and GOT-10k benchmark datasets, where it achieved good results. It also runs at up to 65 FPS, far above the real-time requirement.
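For readers unfamiliar with the quantity that an IoU prediction branch estimates, the following is a minimal sketch of the standard Intersection over Union between two axis-aligned boxes. It is not taken from the SiamDAG source code: the function name `box_iou`, the corner format `(x1, y1, x2, y2)`, and the sample boxes are assumptions made purely for illustration.

```python
from typing import Tuple

# Assumed box format (not from the paper): (x1, y1, x2, y2) with x1 <= x2, y1 <= y2.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes in corner format."""
    # Corners of the overlapping rectangle, if the boxes overlap at all.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes contribute no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


# Hypothetical boxes, chosen only to exercise the function.
predicted = (10.0, 10.0, 50.0, 50.0)
ground_truth = (30.0, 30.0, 70.0, 70.0)
iou = box_iou(predicted, ground_truth)
```

With these sample boxes the overlap is a 20 × 20 square and each box has area 1600, so the IoU is 400 / 2800 = 1/7. A branch trained to predict this value would typically use it as a regression target, though the abstract does not specify how SiamDAG does so.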

Data availability

All training and test datasets used in our experiments are public and can be downloaded from their official websites. The results of all published algorithms can also be obtained from the websites provided by their respective authors.

Code availability

The source code is available at https://github.com/Huangggjian/SiamDAG.

Funding

This research was funded by the National Natural Science Foundation of China (NSFC, 61972123) and the Zhejiang Provincial Key Lab of Equipment Electronics (2019E10009).

Author information

Contributions

Conceptualization, J.H. and Q.S.; Formal analysis, J.H. and Q.S.; Investigation, J.H., Q.S. and Z.L.; Methodology, Z.L. and C.Z.; Project administration, Q.S.; Resources, Q.S.; Supervision, B.Y.; Validation, J.H.; Writing—original draft, J.H. and Q.S.; Writing—review & editing, J.H. and C.Z.

Corresponding author

Correspondence to Hai-bing Yin.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Written informed consent for publication was obtained from all participants.

Conflict of interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sheng, Qh., Huang, J., Li, Z. et al. SiamDAG: Siamese dynamic receptive field and global context modeling network for visual tracking. Multimed Tools Appl 82, 681–701 (2023). https://doi.org/10.1007/s11042-022-12008-w

  • DOI: https://doi.org/10.1007/s11042-022-12008-w
