Abstract
Siamese-based trackers locate and classify the target object by evaluating the similarity between the feature maps of the template and search branches. Although state-of-the-art (SOTA) trackers achieve promising performance, their robustness and accuracy decline significantly in complex scenes, for example under appearance deformation and background interference. To mitigate these weaknesses, many recent works propose sophisticated template-updating mechanisms and refinement modules, but these methods do not consider higher-order correlations between features, which play an important role in obtaining accurate localization and classification information. To address this, we propose a high-order spatial constraint (HOSC) module, which captures high-order correlations between features by recursively executing interactions in the feature maps. Additionally, to learn more discriminative feature representations, we propose a more efficient and effective channel attention with mutual compensation (CAMC) module: channel attention from the template branch enhances the channel constraint of the search branch, which improves the learning of discriminative features, while the template branch benefits from encoding more contextual information from the search image. Finally, extensive experiments on the OTB100, VOT2018, LaSOT, and GOT-10K datasets show that the proposed method achieves competitive performance compared with SOTA CNN-based trackers.
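As an illustration of the similarity evaluation and the template-to-search channel attention described above, the following minimal sketch may help. It is not the CAMC or HOSC module of this work: it assumes a plain sliding-window cross-correlation and a simplified squeeze-and-excitation-style gate computed from the template only, and all function names and toy feature maps are hypothetical.

```python
import math

def channel_gates(template):
    """Assumed gate: global average pooling per template channel, then a sigmoid."""
    gates = []
    for channel in template:
        mean = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return gates

def reweight(features, gates):
    """Scale every channel of a feature map by its gate."""
    return [[[g * v for v in row] for row in channel]
            for channel, g in zip(features, gates)]

def cross_correlate(template, search):
    """Slide the template over the search map, summing products over all channels."""
    th, tw = len(template[0]), len(template[0][0])
    sh, sw = len(search[0]), len(search[0][0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            score = 0.0
            for t_ch, s_ch in zip(template, search):
                for dy in range(th):
                    for dx in range(tw):
                        score += t_ch[dy][dx] * s_ch[y + dy][x + dx]
            row.append(score)
        response.append(row)
    return response

def argmax2d(response):
    """Return the (row, column) of the largest response value."""
    _, y, x = max((v, y, x) for y, r in enumerate(response) for x, v in enumerate(r))
    return y, x

# Toy feature maps (hypothetical): 2 channels, 2x2 template, 4x4 search region.
template = [[[2.0, 2.0], [2.0, 2.0]],
            [[-2.0, -2.0], [-2.0, -2.0]]]
search = [[[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]],   # target pattern
          [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]]   # distractor

gates = channel_gates(template)                              # template-driven channel attention
response = cross_correlate(template, reweight(search, gates))
peak = argmax2d(response)
print(peak)  # (1, 2): top-left corner of the best-matching window
```

In this toy setting the gate keeps the channel that carries the target pattern and shrinks the distractor channel before matching, so the response peaks at the target location.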
Data availability
The datasets used or analyzed during the current study are available from the corresponding author upon reasonable request.
Acknowledgements
We want to thank Xun Che for help with experiments and equipment, and Yong Liu for guidance on method design and implementation.
Funding
This work was financially supported in part by the Jiangsu Provincial Key Research and Development Program (Grant No. BE2017301), the Jiangsu Provincial Key Research and Development Program (Grant No. BE2022363), the Project of Jiangsu Modern Agricultural Machinery Equipment Technology Demonstration and Promotion (Grant No. NJ2022-03), the National Natural Science Foundation of China (Grant No. 61473155), and the Six Talent Peaks Project in Jiangsu Province of China (No. GDZB-039).
Author information
Contributions
Y.Z. completed manuscript writing, framework design, and evaluation. X.C. performed experiment configuration, model training, and manuscript quality improvement. Y.L. controlled the direction and design of this study. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Communicated by Junyu Gao.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zheng, Y., Liu, Y. & Che, X. SiamRCSC: Robust siamese network with channel and spatial constraints for visual object tracking. Multimedia Systems 30, 323 (2024). https://doi.org/10.1007/s00530-024-01524-4
DOI: https://doi.org/10.1007/s00530-024-01524-4