Abstract
Bilinear features arise in fine-grained visual recognition. They are advantageous to encode detailed representations and attributes to differentiate visually similar objects. The apparent similarity is challenging in visual tracking where background distractors interfere siamese trackers to localize the target object. Especially when distractors and the target belong to the same object category. To increase the discrimination between similar appearance objects, we propose an efficient bilinear encoding method for siamese tracking. The proposed method consists of a self-bilinear encoder and an cross-bilinear encoder. The bilinear features generated via the self-bilinear encoder and the cross-bilinear encoder represent target variations itself and target distractor difference, respectively. To this end, the proposed bilinear encoders advance siamese trackers to capture target appearance variations while differentiating the target and background distractors. Experiments on the benchmark datasets show the effectiveness of bilinear features. Our tracker performs favorably against state-of-the-art approaches.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional Siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) Computer Vision – ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8-10 and 15-16, 2016, Proceedings, Part II, pp. 850–865. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_56
Bhat, G., Danelljan, M., Gool, L.V., Timofte, R.: Learning discriminative model prediction for tracking. In: ICCV, pp. 6182–6191 (October 2019)
Chen, K., Tao, W.: Convolutional regression for visual tracking. TIP 27(7), 3611–3620 (2018)
Chen, Z., Zhong, B., Li, G., Zhang, S., Ji, R.: Siamese box adaptive network for visual tracking. In: CVPR, pp. 6668–6677 (2020)
Cui, Y., Zhou, F., Wang, J., Liu, X., Lin, Y., Belongie, S.: Kernel pooling for convolutional neural networks. In: CVPR (July 2017)
Dai, K., Wang, D., Lu, H., Sun, H., Li, J.: Visual tracking via adaptive spatially-regularized correlation filters. In: CVPR, June 2019 (2019)
Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ECO: efficient convolution operators for tracking. In: CVPR (2017)
Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ATOM: accurate tracking by overlap maximization. In: CVPR, June 2019 (2019)
Danelljan, M., Gool, L.V., Timofte, R.: Probabilistic regression for visual tracking. In: CVPR, pp. 7183–7192 (2020)
Danelljan, M., Robinson, A., Shahbaz Khan, F., Felsberg, M.: Beyond correlation filters: learning continuous convolution operators for visual tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 472–488. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_29
Dong, X., Shen, J.: Triplet loss in Siamese network for object tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 472–488. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8_28
Fan, H., et al.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: CVPR, pp. 5374–5383 (2019)
Fan, H., Ling, H.: Siamese cascaded region proposal networks for real-time visual tracking. In: CVPR, June 2019 (2019)
Gao, J., Zhang, T., Xu, C.: Graph convolutional tracking. In: CVPR, June 2019 (2019)
Gao, Y., Beijbom, O., Zhang, N., Darrell, T.: Compact bilinear pooling. In: CVPR, June 2016 (2016)
Guo, D., Wang, J., Cui, Y., Wang, Z., Chen, S.: SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In: CVPR, June 2020, pp. 6269–6277 (2020)
Han, B., Sim, J., Adam, H.: BranchOut: regularization for online ensemble tracking with convolutional neural networks. In: CVPR (2017)
He, K., Zhang, X., Ren, S., Jian, S.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
He, Z., Fan, Y., Zhuang, J., Dong, Y., Bai, H.: Correlation filters with weighted convolution responses. In ICCV, October 2017 (2017)
Huang, L., Zhao, X., Huang, K.: GlobalTrack: a simple and strong baseline for long-term tracking. In: AAAI, vol. 34, pp. 11037–11044 (2020)
Huang, L., Zhao, X., Huang, K.: GOT-10k: a large high-diversity benchmark for generic object tracking in the wild. TPAMI 43(5), 1562–1577 (2021)
Kristan, M., et al.: The visual object tracking vot2017 challenge results. In: ICCV (2017)
Kristan, M., et al.: The seventh visual object tracking VOT2019 challenge results. In ICCV, October 2019 (2019)
Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: CVPR, June 2019 (2019)
Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with Siamese region proposal network. In: CVPR, June 2018 (2018)
Li, P., Chen, B., Ouyang, W., Wang, D., Yang, X., Lu, H.: GradNET: gradient-guided network for visual object tracking. In: ICCV, pp. 6162–6171 (2019)
Li, P., Xie, J., Wang, Q., Gao, Z.: Towards faster training of global covariance pooling networks by iterative matrix square root normalization. In: CVPR, June 2018 (2018)
Li, X., Ma, C., Wu, B., He, Z., Yang, M.-H.: Target-aware deep tracking. In: CVPR, pp. 1369–1378 (2019)
Li, Y., Wang, N., Liu, J., Hou, X.: Factorized bilinear models for image recognition. In: ICCV, pp. 2079–2087 (2017)
Li, Y., Zhu, J., Hoi, S.C.: Reliable patch trackers: robust visual tracking by exploiting reliable patches. In: CVPR, June 2015 (2015)
Lin, T.-Y.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Lin, T.-Y., Maji, S., Koniusz, P.: Second-order democratic aggregation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 639–656. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_38
Lin, T.-Y., RoyChowdhury, A., Maji, S.: Bilinear CNN models for fine-grained visual recognition. In: ICCV, December 2015 (2015)
Lukezic, A., Matas, J., Kristan, M.: D3S - a discriminative single shot segmentation tracker. In: CVPR, pp. 7133–7142 (2020)
Ma, C., Huang, J.-B., Yang, X., Yang, M.-H.: Hierarchical convolutional features for visual tracking. In: ICCV (2015)
Ma, Z., Wang, L., Zhang, H., Lu, W., Yin, J.: RPT: learning point set representation for Siamese visual tracking. In: Bartoli, A., Fusiello, A. (eds.) ECCV 2020. LNCS, vol. 12539, pp. 653–665. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-68238-5_43
Müller, M., Bibi, A., Giancola, S., Alsubaihi, S., Ghanem, B.: TrackingNet: a large-scale dataset and benchmark for object tracking in the wild. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 310–327. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_19
Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: CVPR (2016)
Pu, S., Song, Y., Ma, C., Zhang, H., Yang, M.-H.: Deep attentive tracking via reciprocative learning. In: NeurIPS (2018)
Smeulders, A.W., Chu, D.M., Cucchiara, R., Calderara, S., Dehghan, A., Shah, M.: Visual tracking: an experimental survey. TPAMI 36, 1442–1468 (2014)
Song, Y., et al.: VITAL: visual tracking via adversarial learning. In: CVPR, pp. 8990–8999 (2018)
Voigtlaender, P., Luiten, J., Torr, P.H., Leibe, B.: Siam R-CNN: visual tracking by re-detection. In: CVPR, pp. 6578–6588 (2020)
Wang, G., Luo, G., Xiong, Z., Zeng, W.: SPM-tracker: series-parallel matching for real-time visual object tracking. In: CVPR, pp. 3643–3652, June 2019 (2019)
Wang, Q., Teng, Z., Xing, J., Gao, J., Hu, W., Maybank, S.: Learning attentions: residual attentional Siamese network for high performance online visual tracking. In: CVPR (2018)
Wang, Q., Zhang, L., Bertinetto, L., Hu, W., Torr, P.H.: Fast online object tracking and segmentation: a unifying approach. In: CVPR, June 2019 (2019)
Wei, X., Zhang, Y., Gong, Y., Zhang, J., Zheng, N.: Grassmann pooling as compact homogeneous bilinear pooling for fine-grained visual classification. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 365–380. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_22
Wu, Y., Lim, J., Yang, M.-H.: Object tracking benchmark. TPAMI 37, 1834–1848 (2015)
Xu, T., Feng, Z.-H., Wu, X.-J., Kittler, J.: Joint group feature selection and discriminative filter learning for robust visual object tracking. In: ICCV, October 2019 (2019)
Xu, Y., Wang, Z., Li, Z., Yuan, Y., Yu, G.: SiamFC++: towards robust and accurate visual tracking with target estimation guidelines. In: AAAI, vol. 34, pp. 12549–12556 (2020)
Yang, T., Chan, A.B.: Learning dynamic memory networks for object tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 153–169. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_10
Yang, T., Xu, P., Hu, R., Chai, H., Chan, A.B.: ROAM: recurrently optimizing tracking model. In: CVPR, pp. 6718–6727, June 2020 (2020)
Yazdi, M., Bouwmans, T.: New trends on moving object detection in video images captured by a moving camera: a survey. Comput. Sci. Rev. 28, 157–177 (2018)
Yu, C., Zhao, X., Zheng, Q., Zhang, P., You, X.: Hierarchical bilinear pooling for fine-grained visual recognition. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11220, pp. 595–610. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01270-0_35
Yue, K., Sun, M., Yuan, Y., Zhou, F., Ding, E., Xu, F.: Compact generalized non-local network. In: NeurIPS, November 2018 (2018)
Zhang, T., Xu, C., Yang, M.-H.: Multi-task correlation particle filter for robust object tracking. In: CVPR (2017)
Zhang, Y., Wang, L., Qi, J., Wang, D., Feng, M., Lu, H.: Structured Siamese network for real-time visual tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 355–370. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_22
Zhang, Z., Peng, H.:L Deeper and wider Siamese networks for real-time visual tracking. In: CVPR, June 2019 (2019)
Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware Siamese networks for visual object tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 103–119. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_7
Zhu, Z., Wu, W., Zou, W., Yan, J.: End-to-end flow correlation tracking with spatial-temporal attention. In: CVPR (2018)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Pi, Z., Gao, C., Sang, N. (2022). Siamese Tracking with Bilinear Features. In: Wallraven, C., Liu, Q., Nagahara, H. (eds) Pattern Recognition. ACPR 2021. Lecture Notes in Computer Science, vol 13189. Springer, Cham. https://doi.org/10.1007/978-3-031-02444-3_32
Download citation
DOI: https://doi.org/10.1007/978-3-031-02444-3_32
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-02443-6
Online ISBN: 978-3-031-02444-3
eBook Packages: Computer ScienceComputer Science (R0)