
SiamMN: Siamese modulation network for visual object tracking

Multimedia Tools and Applications

Abstract

Visual object trackers based on Siamese networks often struggle to distinguish the tracking target from objects of the same semantic class or with a similar appearance, because they lack a strategy for discriminating such confusing objects. We propose a visual object tracking method based on a Siamese modulation network. It takes the given bounding box in the target frame and the current frame as input and fuses their multi-layer convolutional features to obtain richer appearance information about both. A feature modulator generates a feature modulation vector from the given bounding box to enhance the visual appearance information of the target instance in the multi-layer features of the current frame, so that the target instance obtains a higher score in the response map of the region proposal network, thereby achieving target instance-specific tracking. Experiments on two public benchmark datasets, OTB2015 and VOT2018, show that the proposed tracker performs competitively with other state-of-the-art trackers.
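The idea of modulating current-frame features with a vector derived from the target can be illustrated with a toy sketch. This is not the authors' network: the abstract does not specify the modulator's architecture, so a simple hand-written heuristic (normalised mean channel activation of the template) stands in for the learned feature modulator, and a channel sum stands in for the region proposal network's response map. All function names and numbers below are hypothetical.

```python
# Illustrative sketch only: channel-wise feature modulation on toy data.
# Every name and value here is an assumption, not taken from the paper.

def modulation_vector(template_features):
    """One weight per channel: mean template activation in that channel,
    normalised so the weights average to 1 (stand-in for a learned modulator)."""
    means = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
             for ch in template_features]
    avg = sum(means) / len(means)
    return [m / avg for m in means]

def modulate(search_features, weights):
    """Scale every channel of the current-frame features by its weight."""
    return [[[v * w for v in row] for row in ch]
            for ch, w in zip(search_features, weights)]

def response_map(features):
    """Toy response map: sum of activations over channels at each location."""
    h, w = len(features[0]), len(features[0][0])
    return [[sum(ch[i][j] for ch in features) for j in range(w)]
            for i in range(h)]

# Template (target bounding box): 2 channels, strong in channel 0.
template = [[[3.0, 3.0], [3.0, 3.0]],
            [[1.0, 1.0], [1.0, 1.0]]]

# Current frame: 2 channels on a 1x2 grid.
# Location 0 is the target, location 1 is a similar-looking distractor.
search = [[[2.0, 1.0]],
          [[1.0, 2.0]]]

weights = modulation_vector(template)          # [1.5, 0.5]
before = response_map(search)                  # [[3.0, 3.0]]: target and distractor tie
after = response_map(modulate(search, weights))  # [[3.5, 2.5]]: target now scores higher
```

In this toy setting the unmodulated response cannot separate the target from the distractor, while the modulated response favours the location whose channel pattern matches the template, which is the qualitative effect the abstract describes.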

Author information

Corresponding author

Correspondence to Li-hua Fu.

About this article

Cite this article

Fu, Lh., Ding, Y., Du, Yb. et al. SiamMN: Siamese modulation network for visual object tracking. Multimed Tools Appl 79, 32623–32641 (2020). https://doi.org/10.1007/s11042-020-09546-6

  • DOI: https://doi.org/10.1007/s11042-020-09546-6
