Abstract
Visual object tracking methods based on Siamese networks often have difficulty distinguishing the tracking target from objects that share its semantics or have a similar appearance, because they lack a strategy for discriminating such confusing objects. We propose a visual object tracking method based on a Siamese modulation network. It takes the given bounding box in the target frame and the current frame as input, and fuses their multi-layer convolutional features to capture more appearance information about both the target in the bounding box and the current frame. A feature modulator generates a feature modulation vector from the given bounding box and uses it to enhance the visual appearance information of the target instance in the multi-layer features of the current frame. As a result, the target instance obtains a higher score in the response map of the region proposal network, which realizes instance-specific tracking. Experiments on two public benchmark datasets, OTB2015 and VOT2018, show that the proposed tracker performs competitively with other state-of-the-art trackers.
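The abstract describes the modulation pipeline only at a high level, so the NumPy sketch below illustrates one plausible reading of it. It is not the authors' implementation. The pooling-plus-sigmoid modulator, the softmax layer fusion, and the function names modulation_vector, modulate, fuse_layers, cross_correlate and track_step are all hypothetical. A plain SiamFC-style cross-correlation stands in for the region proposal network, which is deliberately left out.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def modulation_vector(template_feat, w_mat, b_vec):
    """Hypothetical modulator: average-pool bounding-box features (C, h, w)
    to a C-dim descriptor and map it to one gain per channel in (0, 2).
    With zero weights and bias every gain is exactly 1 (identity)."""
    pooled = template_feat.mean(axis=(1, 2))         # (C,)
    return 2.0 * sigmoid(w_mat @ pooled + b_vec)     # (C,)


def modulate(search_feat, gains):
    """Scale each channel of current-frame features (C, H, W) by its gain."""
    return search_feat * gains[:, None, None]


def fuse_layers(feats, layer_logits):
    """Softmax-weighted sum of same-shaped feature maps from several layers
    (an assumed fusion rule, chosen for simplicity)."""
    w = np.exp(layer_logits - np.max(layer_logits))
    w = w / w.sum()
    return sum(wi * f for wi, f in zip(w, feats))


def cross_correlate(search_feat, kernel):
    """Valid-mode cross-correlation of a (C, kh, kw) kernel over (C, H, W)
    features, summed over channels -> response map (H-kh+1, W-kw+1)."""
    _, H, W = search_feat.shape
    _, kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search_feat[:, i:i + kh, j:j + kw] * kernel)
    return out


def track_step(template_layers, search_layers, mod_params, layer_logits):
    """One hypothetical step: modulate each current-frame layer with gains
    derived from the same layer of the bounding-box features, fuse the
    layers, then correlate with the fused template to get a response map."""
    modulated = [
        modulate(s, modulation_vector(t, w_mat, b_vec))
        for t, s, (w_mat, b_vec) in zip(template_layers, search_layers, mod_params)
    ]
    kernel = fuse_layers(template_layers, layer_logits)
    search = fuse_layers(modulated, layer_logits)
    response = cross_correlate(search, kernel)
    peak = np.unravel_index(np.argmax(response), response.shape)
    return response, peak


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, n_layers = 8, 3
    template_layers = [rng.random((C, 4, 4)) for _ in range(n_layers)]
    search_layers = [rng.random((C, 12, 12)) for _ in range(n_layers)]
    mod_params = [(rng.normal(size=(C, C)), np.zeros(C)) for _ in range(n_layers)]
    response, peak = track_step(template_layers, search_layers, mod_params,
                                np.zeros(n_layers))
    print(response.shape, peak)  # response shape is (9, 9)
```

Keeping the gains in (0, 2), with exactly 1 at zero parameters, is a design choice for the sketch: an untrained modulator leaves the current-frame features unchanged, and learning can then amplify channels that respond to the target instance and suppress channels that respond to look-alike distractors.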
Cite this article
Fu, Lh., Ding, Y., Du, Yb. et al. SiamMN: Siamese modulation network for visual object tracking. Multimed Tools Appl 79, 32623–32641 (2020). https://doi.org/10.1007/s11042-020-09546-6
DOI: https://doi.org/10.1007/s11042-020-09546-6