SiamRCSC: Robust siamese network with channel and spatial constraints for visual object tracking

Zheng, Yu; Liu, Yong; Che, Xun

doi:10.1007/s00530-024-01524-4

Regular Paper
Published: 23 October 2024

Volume 30, article number 323, (2024)

Multimedia Systems

Abstract

Siamese-based tracking frameworks locate and classify the target object by evaluating the similarity between feature maps from the template and search branches. Although state-of-the-art (SOTA) trackers achieve promising performance, their robustness and accuracy decline significantly in complex scenes, such as those with appearance deformation and background interference. To address these weaknesses, many recent works have proposed sophisticated template-updating mechanisms and refinement modules, but these methods do not consider higher-order correlations between features, which play an important role in obtaining accurate localization and classification information. We therefore propose a high-order spatial constraint (HOSC) module, which captures high-order correlations between features by recursively executing interactions in the feature maps. Additionally, to learn better and more discriminative feature representations, we propose an efficient channel attention with mutual compensation (CAMC) module. In CAMC, channel attention from the template branch strengthens the channel constraint of the search branch, which improves the learning of discriminative features, and the template branch can in turn benefit from encoding more contextual information from the search image. Extensive experiments on the OTB100, VOT2018, LaSOT and GOT-10K datasets show that the proposed method achieves competitive performance compared with SOTA CNN-based trackers.
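As a rough illustration of the two ideas the abstract names (similarity evaluation between template and search feature maps, and template-derived channel attention applied to the search branch), the following NumPy sketch may help. It is not the authors' implementation: the function names, the SE-style sigmoid gate, and the plain sliding-window cross-correlation are all assumptions chosen for brevity, and it shows only the template-to-search direction, not the mutual compensation or the HOSC module.

```python
import numpy as np


def channel_attention(feat):
    """Hypothetical SE-style gate: global average pool, then sigmoid.

    feat has shape (C, H, W); the result has shape (C,) with values in (0, 1).
    """
    pooled = feat.mean(axis=(1, 2))
    return 1.0 / (1.0 + np.exp(-pooled))


def cross_correlation(template, search):
    """Slide the template (C, h, w) over the search map (C, H, W).

    Returns a response map of shape (H - h + 1, W - w + 1); larger values
    mean the search window is more similar to the template.
    """
    _, h, w = template.shape
    _, H, W = search.shape
    response = np.empty((H - h + 1, W - w + 1))
    for i in range(response.shape[0]):
        for j in range(response.shape[1]):
            window = search[:, i:i + h, j:j + w]
            response[i, j] = np.sum(template * window)
    return response


def locate_target(template, search):
    """Reweight search channels with template attention, then find the peak."""
    weights = channel_attention(template)            # (C,)
    search_mod = search * weights[:, None, None]     # broadcast over H and W
    response = cross_correlation(template, search_mod)
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return int(row), int(col)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.normal(size=(4, 3, 3))
    search = np.zeros((4, 10, 10))
    search[:, 3:6, 5:8] = template   # plant the target at row 3, column 5
    print(locate_target(template, search))  # prints (3, 5)
```

In this toy setting the search map is empty except for the planted template, so the response peak falls at the planted position. Real trackers operate on learned backbone features, where the channel weights matter because they suppress channels that respond to background clutter.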

Data availability

The datasets used or analyzed during the current study are available from the corresponding author upon reasonable request.

Acknowledgements

We want to thank Xun Che for help with experiments and equipment, and Yong Liu for guidance on method design and implementation.

Funding

This work was financially supported in part by Jiangsu Provincial Key Research and Development Program (Grant No. BE2017301), Jiangsu Provincial Key Research and Development Program (Grant No. BE2022363), Project of Jiangsu Modern Agricultural Machinery Equipment Technology Demonstration and Promotion (Grant No. NJ2022-03), National Natural Science Foundation of China (Grant No. 61473155), and Six Talent Peaks Project in Jiangsu Province of China (No. GDZB-039).

Author information

Authors and Affiliations

The School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolinwei 200, Nanjing, 210094, JiangSu, China
Yu Zheng, Yong Liu & Xun Che

Contributions

Y.Z. completed manuscript writing, framework design, and evaluation. X.C. performed experiment configuration, model training, and manuscript quality improvement. Y.L. controlled the direction and design of this study. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yong Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Communicated by Junyu Gao.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 6833 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zheng, Y., Liu, Y. & Che, X. SiamRCSC: Robust siamese network with channel and spatial constraints for visual object tracking. Multimedia Systems 30, 323 (2024). https://doi.org/10.1007/s00530-024-01524-4

Received: 15 May 2024
Accepted: 01 October 2024
Published: 23 October 2024
DOI: https://doi.org/10.1007/s00530-024-01524-4
