Skip to main content

Dual Siamese Channel Attention Networks for Visual Object Tracking

  • Conference paper
  • First Online:
  • 926 Accesses

Part of the book series: IFIP Advances in Information and Communication Technology ((IFIPAICT,volume 659))

Abstract

Siamese network based trackers have achieved remarkable performance on visual object tracking. The target position is determined by the similarity map produced via cross-correlation over features generated from template branch and search branch. The interaction between the template and search branches is essential for achieving high-performance object tracking task, which is neglected in previous works as features of the two branches are computed separately. In this paper, we propose Dual Siamese Channel Attentions Networks, referred as SiamDCA, which exploits the channel attentions to further improve tracking robustness. Firstly, a convolutional version of Squeeze and Excitation Networks (CSENet) is embedded in backbone to explicitly formulate interdependencies between channels to recalibrate channel-wise feature responses adaptively. Meanwhile, we propose a novel Global Channel Enhancement (GCE) module, which is capable of capturing attention weights of each channel in template branch, so as to normalize the channel characteristics in search branch. We experiment on benchmark OTB2015, VOT2016 and UAV123 where our algorithm demonstrates competitive performance versus other state-of-the-art trackers.

Supported by the National Natural Science Foundation of China (No. 61977052).

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   99.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   129.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD   129.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional Siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_56

    Chapter  Google Scholar 

  2. Cao, Y., Xu, J., Lin, S., Wei, F.: GCNet: non-local networks meet squeeze-excitation networks and beyond. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, October 2019

    Google Scholar 

  3. Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ATOM: accurate tracking by overlap maximization. In: CVPR, pp. 4660–4669, June 2019

    Google Scholar 

  4. Danelljan, M., Bhat, G., Shahbaz Khan, F., Felsberg, M.: ECO: efficient convolution operators for tracking. In: CVPR, pp. 6638–6646, July 2017

    Google Scholar 

  5. Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4310–4318, December 2015

    Google Scholar 

  6. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y

    Article  MathSciNet  Google Scholar 

  7. Hadfield, S., Bowden, R., Lebeda, K.: The visual object tracking VOT2016 challenge results. In: ECCV Workshops, vol. 9914, pp. 777–823, October 2016

    Google Scholar 

  8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778, June 2016

    Google Scholar 

  9. Hu, J., Shen, L., Albanie, S., Sun, G., Wu, E.: Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 99, 7132–7141 (2017)

    Google Scholar 

  10. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)

    Article  Google Scholar 

  11. Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: CVPR, pp. 4282–4291 (2019)

    Google Scholar 

  12. Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with Siamese region proposal network. In: CVPR, pp. 8971–8980, June 2018

    Google Scholar 

  13. Li, Y., Zhu, J.: A scale adaptive kernel correlation filter tracker with feature integration. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8926, pp. 254–265. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16181-5_18

    Chapter  Google Scholar 

  14. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P.: Microsoft COCO: common objects in context. In: ECCV, pp. 740–755 (2014)

    Google Scholar 

  15. Liu, L., Xing, J., Ai, H., Ruan, X.: Hand posture recognition using finger geometric feature. In: ICPR, pp. 565–568 (2013)

    Google Scholar 

  16. Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 445–461. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_27

    Chapter  Google Scholar 

  17. Real, E., Shlens, J., Mazzocchi, S., Pan, X., Vanhoucke, V.: YouTube-BoundingBoxes: a large high-precision human-annotated data set for object detection in video. In: CVPR, pp. 5296–5305, July 2017

    Google Scholar 

  18. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)

    Article  Google Scholar 

  19. Wang, Q., Zhang, L., Bertinetto, L., Hu, W., Torr, P.H.: Fast online object tracking and segmentation: a unifying approach. In: CVPR, pp. 1328–1338, June 2019

    Google Scholar 

  20. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: CVPR, pp. 7794–7803, June 2018

    Google Scholar 

  21. Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. TPAMI 37(9), 1834–1848 (2015)

    Article  Google Scholar 

  22. Xing, J., Ai, H., Lao, S.: Multiple human tracking based on multi-view upper-body detection and discriminative learning. In: ICPR, pp. 1698–1701 (2010)

    Google Scholar 

  23. Yu, Y., Xiong, Y., Huang, W., Scott, M.R.: Deformable Siamese attention networks for visual object tracking. In: CVPR, pp. 6728–6737, June 2020

    Google Scholar 

  24. Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware Siamese networks for visual object tracking. In: ECCV, pp. 103–119 (2018)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xiaolin Tian .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Gao, W., Tian, X., Zhang, Y., Jia, N., Yang, T., Jiao, L. (2022). Dual Siamese Channel Attention Networks for Visual Object Tracking. In: Shi, Z., Jin, Y., Zhang, X. (eds) Intelligence Science IV. ICIS 2022. IFIP Advances in Information and Communication Technology, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-031-14903-0_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-14903-0_28

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14902-3

  • Online ISBN: 978-3-031-14903-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics