Siamese Tracking with Bilinear Features

Pi, Zhixiong; Gao, Changxin; Sang, Nong

doi:10.1007/978-3-031-02444-3_32

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13189)

Included in the following conference series:

Asian Conference on Pattern Recognition

Abstract

Bilinear features arise in fine-grained visual recognition, where they encode the detailed representations and attributes needed to differentiate visually similar objects. Apparent similarity is also a challenge in visual tracking: background distractors interfere with the ability of Siamese trackers to localize the target object, especially when the distractors and the target belong to the same object category. To increase the discrimination between objects of similar appearance, we propose an efficient bilinear encoding method for Siamese tracking. The proposed method consists of a self-bilinear encoder and a cross-bilinear encoder. The bilinear features generated by the self-bilinear encoder represent the appearance variations of the target itself, and those generated by the cross-bilinear encoder represent the differences between the target and distractors. In this way, the proposed bilinear encoders enable Siamese trackers to capture target appearance variations while differentiating the target from background distractors. Experiments on benchmark datasets show the effectiveness of bilinear features. Our tracker performs favorably against state-of-the-art approaches.
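
The abstract does not spell out the encoders, so the following is only a minimal sketch of generic bilinear pooling (the averaged outer product of local descriptors, followed by signed square root and L2 normalization) as commonly used in fine-grained recognition. It is not the authors' implementation. The function names, the toy descriptors, and the choice to pair target and search descriptors location by location in the cross case are all assumptions made for illustration.

```python
import math


def bilinear_pool(feats_a, feats_b):
    """Average the outer products of paired local descriptors.

    feats_a: list of N descriptors, each a list of Ca floats
    feats_b: list of N descriptors, each a list of Cb floats
    Returns a flattened vector of length Ca * Cb.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both feature maps need the same number of locations")
    n = len(feats_a)
    ca, cb = len(feats_a[0]), len(feats_b[0])
    pooled = [0.0] * (ca * cb)
    for xa, xb in zip(feats_a, feats_b):
        for i in range(ca):
            for j in range(cb):
                pooled[i * cb + j] += xa[i] * xb[j] / n
    return pooled


def normalize(vec, eps=1e-12):
    """Signed square root followed by L2 normalization."""
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in vec]
    norm = math.sqrt(sum(v * v for v in rooted))
    return [v / (norm + eps) for v in rooted]


# Hypothetical toy descriptors: 3 spatial locations, 2 channels each.
target = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
search = [[0.9, 0.1], [0.4, 0.6], [0.1, 0.9]]

# Self-bilinear (assumed form): the target features interact with themselves.
self_feat = normalize(bilinear_pool(target, target))

# Cross-bilinear (assumed form): target features interact with search features.
cross_feat = normalize(bilinear_pool(target, search))
```

In this sketch the self-bilinear vector is a flattened symmetric matrix of channel co-activations, while the cross-bilinear vector mixes channels from two different inputs. How the paper's encoders actually combine, compress, or weight these interactions is not stated in the abstract.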

Author information

Authors and Affiliations

Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Zhixiong Pi, Changxin Gao & Nong Sang

Corresponding author

Correspondence to Zhixiong Pi.

Editor information

Editors and Affiliations

Korea University, Seoul, Korea (Republic of)
Christian Wallraven
Nanjing University, Nanjing, China
Qingshan Liu
Osaka University, Osaka, Japan
Hajime Nagahara

About this paper

Cite this paper

Pi, Z., Gao, C., Sang, N. (2022). Siamese Tracking with Bilinear Features. In: Wallraven, C., Liu, Q., Nagahara, H. (eds) Pattern Recognition. ACPR 2021. Lecture Notes in Computer Science, vol 13189. Springer, Cham. https://doi.org/10.1007/978-3-031-02444-3_32

DOI: https://doi.org/10.1007/978-3-031-02444-3_32
Published: 10 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-02443-6
Online ISBN: 978-3-031-02444-3
eBook Packages: Computer Science, Computer Science (R0)
