Abstract
Existing Siamese-based trackers divide visual tracking into two stages: feature extraction (backbone subnetwork) and prediction (head subnetwork). However, they rely mainly on task-level supervision (classification and regression) and barely consider feature-level supervision during training, which can lead to deficient knowledge interaction among target features and to background interference during online tracking. To address these issues, this paper proposes an educational pattern-guided self-knowledge distillation method that guides Siamese-based trackers to learn feature knowledge by themselves and can serve as a generic training protocol to improve any Siamese-based tracker. Our key insight is to use two educational self-distillation patterns, focal self-distillation and discriminative self-distillation, to give the tracker a self-learning ability. The focal self-distillation pattern educates the tracking network to focus on valuable pixels and channels by decoupling the spatial learning and channel learning of target features. The discriminative self-distillation pattern aims to maximize the discrimination between foreground and background features, so that the tracker is not affected by background pixels. As one of the first attempts to introduce self-knowledge distillation into the visual tracking field, our method is effective and efficient and generalizes well, which might be instructive for other research. Code and data are publicly available.
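The abstract does not give the loss functions, so the following is only a minimal sketch of how the two patterns could be expressed, not the authors' implementation. It assumes a feature map stored as a list of channels, each a flat list of pixel values; that the "teacher" features come from the tracker itself (as in self-distillation); and that a foreground mask is available from the ground-truth box. All function names and the attention and cosine formulations are illustrative assumptions. A practical version would use a tensor library rather than plain lists.

```python
import math

def softmax(values, temperature=1.0):
    """Numerically stable softmax over a flat list."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_attention(feat, temperature=1.0):
    """One weight per pixel: softmax of the channel-averaged |activation|."""
    channels, pixels = len(feat), len(feat[0])
    energy = [sum(abs(feat[c][p]) for c in range(channels)) / channels
              for p in range(pixels)]
    return softmax(energy, temperature)

def channel_attention(feat, temperature=1.0):
    """One weight per channel: softmax of the pixel-averaged |activation|."""
    energy = [sum(abs(v) for v in channel) / len(channel) for channel in feat]
    return softmax(energy, temperature)

def focal_loss(student, teacher, temperature=1.0):
    """Assumed focal term: squared error weighted by the teacher's
    spatial (per-pixel) and channel attention, computed separately."""
    s_att = spatial_attention(teacher, temperature)
    c_att = channel_attention(teacher, temperature)
    return sum(c_att[c] * s_att[p] * (student[c][p] - teacher[c][p]) ** 2
               for c in range(len(teacher)) for p in range(len(teacher[0])))

def masked_mean(feat, mask):
    """Average feature vector over pixels where mask is 1 (mask must be non-empty)."""
    count = sum(mask)
    return [sum(v for v, m in zip(channel, mask) if m) / count for channel in feat]

def discriminative_loss(feat, fg_mask):
    """Assumed discriminative term: cosine similarity between the mean
    foreground and mean background feature vectors. Lower values mean
    the two regions are easier to tell apart."""
    bg_mask = [1 - m for m in fg_mask]
    fg = masked_mean(feat, fg_mask)
    bg = masked_mean(feat, bg_mask)
    dot = sum(a * b for a, b in zip(fg, bg))
    norm = math.sqrt(sum(a * a for a in fg)) * math.sqrt(sum(b * b for b in bg))
    return dot / (norm + 1e-12)

# Toy features: 2 channels, 4 pixels; pixel 0 is the (hypothetical) target.
teacher = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0]]
student = [[0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0]]
fg_mask = [1, 0, 0, 0]
total = focal_loss(student, teacher) + discriminative_loss(student, fg_mask)
```

In this sketch the two terms would be added to the usual classification and regression losses during training; how they are weighted is left open because the abstract does not say.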
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, Q., Zhang, X. (2024). Educational Pattern Guided Self-knowledge Distillation for Siamese Visual Tracking. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1968. Springer, Singapore. https://doi.org/10.1007/978-981-99-8181-6_3
DOI: https://doi.org/10.1007/978-981-99-8181-6_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8180-9
Online ISBN: 978-981-99-8181-6
eBook Packages: Computer Science, Computer Science (R0)