Split-merge-excitation: a robust channel-wise feature attention mechanism applied to MDNet tracking

  • Published in: Multimedia Tools and Applications

Abstract

Object tracking is a fundamental problem in computer vision. Although it has been studied for decades, single object tracking is still not completely solved, because challenges in the real physical world, such as object deformation, complex backgrounds and imperfect imaging, make tracking difficult. To address these challenges, we design a robust feature extraction network. Specifically, we propose a novel channel-wise feature attention mechanism and integrate it into the pipeline of a well-known convolutional neural network based visual tracking algorithm, MDNet. The mechanism is crucial for representing the object robustly, and the more representative features improve tracking performance. In experiments, we evaluate the proposed tracker on the OTB100, VOT2018, VOT2020 and VOT-TIR datasets. Compared with the baseline algorithm, it obtains consistent improvements across benchmarks: an absolute increase in tracking success score of up to 0.6 on OTB100, and absolute EAO increases of up to 0.022, 0.007 and 0.008 on VOT2018, VOT2020 and VOT-TIR2015, respectively. The source code is publicly available.
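
To make the idea of channel-wise feature attention concrete, the following PyTorch sketch re-weights the channels of a convolutional feature map in the spirit of squeeze-and-excitation [17]. It is only an illustrative stand-in under assumed details, not the split-merge-excitation module proposed in the paper; the ChannelAttention class, the reduction ratio and the example feature-map shape are hypothetical.

```python
# Minimal sketch of channel-wise attention (SE-style [17]); the paper's
# split-merge-excitation block is not reproduced here.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Squeeze: global average pooling collapses each feature map to one scalar.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: a bottleneck MLP produces one weight per channel in (0, 1).
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)        # (B, C) per-channel descriptors
        w = self.fc(w).view(b, c, 1, 1)    # (B, C, 1, 1) attention weights
        return x * w                       # rescale channels of the input features


# Example: re-weight a hypothetical 512-channel conv feature map.
feat = torch.randn(2, 512, 3, 3)
out = ChannelAttention(512)(feat)          # same shape as feat
```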

References

  1. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PHS (2016) Fully-convolutional Siamese networks for object tracking. In: European conference on computer vision 2016 workshops. https://doi.org/10.1007/978-3-319-48881-3_56, pp 850–865

  2. Bhat G, Danelljan M, Van Gool L, Timofte R (2019) Learning discriminative model prediction for tracking. In: 2019 IEEE/CVF international conference on computer vision. https://doi.org/10.1109/ICCV.2019.00628, pp 6181–6190

  3. Cao Y, Xu J, Lin S, Wei F, Hu H (2019) GCNet: non-local networks meet squeeze-excitation networks and beyond. In: 2019 IEEE/CVF international conference on computer vision workshop (ICCVW). https://doi.org/10.1109/ICCVW.2019.00246, pp 1971–1980

  4. Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: ECCV 2020 - 16th European conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part I, pp 213–229. https://doi.org/10.1007/978-3-030-58452-8_13

  5. Danelljan M, Häger G, Khan FS, Felsberg M (2015) Learning spatially regularized correlation filters for visual tracking. In: 2015 IEEE international conference on computer vision (ICCV), pp 4310–4318. https://doi.org/10.1109/ICCV.2015.490

  6. Danelljan M, Bhat G, Khan FS, Felsberg M (2017) ECO: efficient convolution operators for tracking. In: 2017 IEEE conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2017.733, pp 6931–6939

  7. Danelljan M, Hager G, Khan FS, Felsberg M (2017) Discriminative scale space tracking. IEEE Trans Pattern Anal Mach Intell 39(8):1561–1575. https://doi.org/10.1109/TPAMI.2016.2609928

  8. Danelljan M, Bhat G, Khan FS, Felsberg M (2019) ATOM: accurate tracking by overlap maximization. In: 2019 IEEE/CVF conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2019.00479, pp 4655–4664

  9. Danelljan M, Van Gool L, Timofte R (2020) Probabilistic regression for visual tracking. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR42600.2020.00721. IEEE, pp 7181–7190

  10. Fan H, Bai H, Lin L, Yang F, Chu P, Deng G, Yu S, Harshit HM, Liu J, Xu Y, Liao C, Yuan L, Ling H (2021) LaSOT: a high-quality large-scale single object tracking benchmark. Int J Comput Vis 129(2):439–461. https://doi.org/10.1007/s11263-020-01387-y

  11. Felsberg M, Berg A, Hager G, Ahlberg J, Kristan M, Matas J, Pflugfelder R (2015) The thermal infrared visual object tracking VOT-TIR2015 challenge results. In: Proceedings of the IEEE international conference on computer vision workshops, pp 76–88. https://doi.org/10.1109/ICCVW.2015.86

  12. Galoogahi HK, Fagg A, Lucey S (2017) Learning background-aware correlation filters for visual tracking. In: Proceedings of the IEEE international conference on computer vision. https://doi.org/10.1109/ICCV.2017.129, pp 1144–1152

  13. Gao J, Zhang T, Xu C (2019) Graph convolutional tracking. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2019.00478. IEEE, pp 4644–4654

  14. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2016.90, pp 770–778

  15. Henriques J, Caseiro R, Martins P, Batista J (2014) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596. https://doi.org/10.1109/TPAMI.2014.2345390

  16. Hu H, Gu J, Zhang Z, Dai J, Wei Y (2018) Relation networks for object detection. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2018.00378. IEEE, pp 3588–3597

  17. Hu J, Shen L, Albanie S, Sun G, Wu E (2020) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell 42(8):2011–2023. https://doi.org/10.1109/TPAMI.2019.2913372

  18. Huang L, Zhao X, Huang K (2021) GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Trans Pattern Anal Mach Intell 43(5):1562–1577. https://doi.org/10.1109/TPAMI.2019.2957464

  19. Jiang F, Kong B, Li J, Dashtipour K, Gogate M (2021) Robust visual saliency optimization based on bidirectional Markov chains. Cogn Comput 13(1):69. https://doi.org/10.1007/s12559-020-09724-6

  20. Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Zajc LC et al (2018) The sixth visual object tracking VOT2018 challenge results, vol 11129 LNCS. Springer Verlag. https://doi.org/10.1007/978-3-030-11009-3_1

  21. Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Kamarainen J-K, Zajc LC et al (2020) The eighth visual object tracking VOT2020 challenge results. In: European conference on computer vision, workshops ECCV 2020. Lecture notes in computer science. https://doi.org/10.1007/978-3-030-68238-5_39, vol 12539. Springer, Cham

  22. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386

  23. Li B, Yan J, Wu W, Zhu Z, Hu X (2018) High performance visual tracking with siamese region proposal network. In: 2018 IEEE/CVF conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2018.00935, pp 8971–8980

  24. Li F, Tian C, Zuo W, Zhang L, Yang M-H (2018) Learning spatial-temporal regularized correlation filters for visual tracking. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2018.00515. IEEE, pp 4904–4913

  25. Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J (2019) SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: 2019 IEEE/CVF conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2019.00441, pp 4277–4286

  26. Li X, Wang W, Hu X, Yang J (2019) Selective kernel networks. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2019.00060. IEEE, pp 510–519

  27. Li X, Sun W, Wu T (2020) Attentive normalization. ECCV 2020. Lecture notes in computer science, vol 12362. Springer, Cham. https://doi.org/10.1007/978-3-030-58520-4_5

  28. Müller M, Bibi A, Giancola S, Alsubaihi S, Ghanem B (2018) TrackingNet: a large-scale dataset and benchmark for object tracking in the wild. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). https://doi.org/10.1007/978-3-030-01246-5_19, vol 11205 LNCS, pp 310–327

  29. Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: 2016 IEEE conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2016.465, pp 4293–4302

  30. Park E, Berg AC (2018) Meta-tracker: fast and robust online adaptation for visual object trackers. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics): vol 11207 LNCS, pp 587–604. https://doi.org/10.1007/978-3-030-01219-9_35

  31. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y

  32. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, conference track proceedings. arXiv:1409.1556

  33. Smeulders AWM, Chu DM, Cucchiara R, Calderara S, Dehghan A, Shah M (2014) Visual tracking: an experimental survey. IEEE Trans Pattern Anal Mach Intell 36(7):1442–1468. https://doi.org/10.1109/TPAMI.2013.230

  34. Valmadre J, Bertinetto L, Henriques J, Vedaldi A, Torr PHS (2017) End-to-end representation learning for correlation filter based tracking. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 5000–5008. https://doi.org/10.1109/CVPR.2017.531

  35. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN et al (2017) Attention is all you need. In: Advances in neural information processing systems 30, pp 5999–6009. http://papers.nips.cc/paper/7181-attention-is-all-you-need

  36. Wang Q, Teng Z, Xing J, Gao J, Hu W, Maybank S (2018) Learning attentions: residual attentional Siamese network for high performance online visual tracking. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2018.00510. IEEE, pp 4854–4863

  37. Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: 2018 IEEE/CVF conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2018.00813, pp 7794–7803

  38. Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q (2020) ECA-Net: efficient channel attention for deep convolutional neural networks. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR42600.2020.01155. IEEE, pp 11531–11539

  39. Woo S, Park J, Lee J-Y, Kweon IS (2018) CBAM: convolutional block attention module. ECCV 2018. Lecture Notes in computer science, vol 11211. Springer, Cham. https://doi.org/10.1007/978-3-030-01234-2_1

  40. Wu Y, Lim J, Yang MH (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848. https://doi.org/10.1109/TPAMI.2014.2388226

  41. Xu Y, Zhou X, Chen S, Li F (2019) Deep learning for multiple object tracking: a survey. IET Comput Vis 13(4):355–368. https://doi.org/10.1049/iet-cvi.2018.5598

  42. Zhang Z, Peng H, Fu J, Li B, Hu W (2020) Ocean: object-aware anchor-free tracking. ECCV 2020. Lecture notes in computer science, vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_46

  43. Zhao H, Jia J, Koltun V (2020) Exploring self-attention for image recognition. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR42600.2020.01009. IEEE, pp 10073–10082

  44. Zhou X, Xie L, Zhang P, Zhang Y (2014) An ensemble of deep neural networks for object tracking. In: 2014 IEEE International conference on image processing (ICIP). https://doi.org/10.1109/ICIP.2014.7025169. IEEE, pp 843–847

  45. Zhu Z, Wang Q, Li B, Wu W, Yan J, Hu W (2018) Distractor-aware Siamese networks for visual object tracking. In: ECCV 2018 - 15th European conference, Munich, Germany, September 8-14, 2018, Proceedings, Part IX. https://doi.org/10.1007/978-3-030-01240-3_7, pp 103–119

  46. Zhu Z, Wu W, Zou W, Yan J (2018) End-to-end flow correlation tracking with spatial-temporal attention. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2018.00064. IEEE, pp 548–557

Author information

Corresponding author

Correspondence to Guizhong Liu.

Cite this article

Wu, H., Liu, G. Split-merge-excitation: a robust channel-wise feature attention mechanism applied to MDNet tracking. Multimed Tools Appl 81, 40737–40754 (2022). https://doi.org/10.1007/s11042-022-12752-z
