Abstract
In recent years, owing to their favorable balance between accuracy and efficiency, Fully-Convolutional Siamese networks (SiamFC) have been widely used in visual tracking. Although SiamFC has achieved great success, its discrimination ability remains limited, especially in scenes containing similar distractors. The main reason is that, during training, SiamFC focuses more on fitting the whole dataset than on learning to discriminate between similar objects. To address this issue, we propose Ensemble Siamese networks (ESiamFC), which introduce ensemble learning into SiamFC. Specifically, we first map the videos of the training dataset ILSVRC2015 into an embedding space. Second, we cluster the video features with balanced k-means. Third, on each cluster we apply transfer learning to SiamFC, obtaining k base trackers with their own preferences. Finally, to leverage the diversity of the base trackers, we propose a Cluster Weight fusion module that automatically assigns fusion weights to the base trackers according to the semantic information of the tracked object. Extensive experiments on multiple benchmarks demonstrate that our tracker outperforms SiamFC in precision, with relative gains of 7.1%, 8.6%, and 6.7% on TColor-128, DTB70, and LaSOT, respectively.
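To make the fusion step concrete, below is a minimal NumPy sketch of the cluster-weighted fusion idea described above: the target's semantic feature is compared against the k cluster centroids used to train the base trackers, the similarities are normalized into fusion weights, and the k response maps are combined by a weighted sum. The function names, the cosine-similarity measure, and the softmax temperature are illustrative assumptions, not the authors' exact Cluster Weight module.

```python
# Illustrative sketch of cluster-weighted response fusion (not the authors' code).
import numpy as np

def cluster_weights(target_feat, centroids, temperature=1.0):
    """Softmax over cosine similarities between the target's semantic feature
    and the k cluster centroids used to train the base trackers."""
    t = target_feat / (np.linalg.norm(target_feat) + 1e-12)
    c = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + 1e-12)
    sims = c @ t                          # (k,) cosine similarities
    e = np.exp(sims / temperature)
    return e / e.sum()                    # (k,) fusion weights

def fuse_responses(responses, weights):
    """Weighted sum of the k base trackers' response maps; return the fused map
    and the peak location used as the predicted target position."""
    fused = np.tensordot(weights, np.stack(responses), axes=1)  # (H, W)
    return fused, np.unravel_index(fused.argmax(), fused.shape)

# Toy usage: 3 base trackers, 17x17 SiamFC-style response maps, 256-d features.
rng = np.random.default_rng(0)
responses = [rng.random((17, 17)) for _ in range(3)]
w = cluster_weights(rng.random(256), rng.random((3, 256)))
fused_map, peak = fuse_responses(responses, w)
```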










Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant No. 61972307).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Huang, H., Liu, G., Zhang, Y. et al. Ensemble siamese networks for object tracking. Neural Comput & Applic 34, 8173–8191 (2022). https://doi.org/10.1007/s00521-022-06911-4