Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking

Published in: International Journal of Computer Vision (2021)

Abstract

The development of a real-time and robust RGB-T tracker is extremely challenging because the tracked object may suffer from both shared and modality-specific challenges in the RGB and thermal (T) modalities. In this work, we observe that implicit attribute information can boost model discriminability, and we propose a novel attribute-driven representation network to improve RGB-T tracking performance. First, according to the appearance changes in RGB-T tracking scenarios, we divide the major shared and modality-specific challenges into four typical attributes: extreme illumination, occlusion, motion blur, and thermal crossover. Second, we design an attribute-driven residual branch for each heterogeneous attribute to mine its attribute-specific properties and thereby build a powerful residual representation for object modeling. Furthermore, we aggregate these representations at the channel and pixel levels using the proposed attribute ensemble network (AENet), which adaptively fits the attribute-agnostic tracking process. AENet effectively responds to appearance changes while suppressing distractors. Finally, we conduct extensive experiments on three RGB-T tracking benchmarks, comparing the proposed tracker with other state-of-the-art methods. The results show that our tracker achieves highly competitive performance at a real-time tracking speed. Code will be available at https://github.com/zhang-pengyu/ADRNet.
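To make the architecture described above concrete, below is a minimal PyTorch sketch of attribute-specific residual branches combined by channel- and pixel-level gating. The module names, layer sizes, and gating choices (SE-style channel weighting, a 1x1-convolution pixel mask) are illustrative assumptions rather than the paper's actual design; the authors' implementation is at the ADRNet repository linked above.

    import torch
    import torch.nn as nn

    class AttributeBranch(nn.Module):
        """Residual branch for one attribute (e.g., occlusion or motion blur)."""
        def __init__(self, channels):
            super().__init__()
            self.residual = nn.Sequential(
                nn.Conv2d(channels, channels // 4, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, channels, kernel_size=3, padding=1),
            )

        def forward(self, x):
            return self.residual(x)

    class AENet(nn.Module):
        """Aggregates attribute residuals with channel- and pixel-level gates."""
        def __init__(self, channels, num_attributes=4):
            super().__init__()
            self.branches = nn.ModuleList(
                [AttributeBranch(channels) for _ in range(num_attributes)]
            )
            # Channel-level gate (SE-style squeeze-and-excitation).
            self.channel_gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // 8, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 8, channels, kernel_size=1),
                nn.Sigmoid(),
            )
            # Pixel-level gate: one spatial weight per location.
            self.pixel_gate = nn.Sequential(
                nn.Conv2d(channels, 1, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, shared):
            # Sum the attribute-specific residuals, then reweight them per
            # channel and per pixel before adding them back to the shared feature.
            residual = sum(branch(shared) for branch in self.branches)
            gated = residual * self.channel_gate(residual) * self.pixel_gate(residual)
            return shared + gated  # attribute-aware representation

    # Example: refine a 512-channel fused RGB-T feature map.
    feat = torch.randn(1, 512, 25, 25)
    print(AENet(channels=512)(feat).shape)  # torch.Size([1, 512, 25, 25])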

Notes

  1. We exclude DAPNet (Zhu et al. 2019) from this comparison because its speed is not reported in the original paper.

References

  • Ak, K.E., Kassim, A.A., Lim, J.H., & Tham, J.Y., (2018) Learning attribute representations with localization for flexible fashion search. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7708–7717

  • Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., & Torr, P.H.S., (2016) Fully-convolutional siamese networks for object tracking. In: European Conference on Computer Vision Workshop, pp. 850–865

  • Bhat, G., Danelljan, M., Gool, L.V., & Timofte, R., (2019) Learning discriminative model prediction for tracking. In: IEEE International Conference on Computer Vision, pp. 6182–6191

  • Bolme, D.S., Beveridge, J.R., Draper, B.A., & Lui, Y.M., (2010) Visual object tracking using adaptive correlation filters. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2544–2550

  • Camplani, M., Hannuna, S., Mirmehdi, M., Damen, D., Paiement, A., Tao, L., & Burghardt, T., (2015) Real-time RGB-D tracking with depth scaling kernelised correlation filters and occlusion handling. In: British Machine Vision Conference, pp. 1–11

  • Chen, Z., Zhong, B., Li, G., Zhang, S., & Ji, R., (2020) Siamese box adaptive network for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6668–6677

  • Danelljan, M., Bhat, G., Khan, F.S., & Felsberg, M., (2017) ECO: Efficient convolution operators for tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6638–6646

  • Danelljan, M., Bhat, G., Khan, F.S., & Felsberg, M., (2019) ATOM: Accurate tracking by overlap maximization. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4660–4669

  • Danelljan, M., Gool, L.V., & Timofte, R., (2020) Probabilistic regression for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7183–7192

  • Danelljan, M., Hager, G., Khan, F. S., & Felsberg, M. (2017). Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(8), 1561–1575.

  • Ding, P., & Song, Y., (2015) Robust object tracking using color and depth images with a depth based occlusion handling and recovery. In: International Conference on Fuzzy Systems and Knowledge Discovery, pp. 930–935

  • Feng, Q., Ablavsky, V., Bai, Q., & Sclaroff, S., (2019) Robust visual object tracking with natural language region proposal network. CoRR abs/1912.02048

  • Feng, Q., Ablavsky, V., Bai, Q., Li, G., & Sclaroff, S., (2020) Real-time visual object tracking with natural language description. In: IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 700–709

  • Gao, Y., Li, C., Zhu, Y., Tang, J., He, T., & Wang, F., (2019) Deep adaptive fusion network for high performance RGBT tracking. In: IEEE International Conference on Computer Vision Workshop, pp. 1–9

  • Hu, J., Shen, L., Albanie, S., Sun, G., & Wu, E., (2018) Squeeze-and-excitation networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141

  • Jiang, B., Luo, R., Mao, J., Xiao, T., & Jiang, Y., (2018) Acquisition of localization confidence for accurate object detection. In: European Conference on Computer Vision, pp. 784–799

  • Jung, I., Son, J., Baek, M., & Han, B., (2018) Real-time MDNet. In: European Conference on Computer Vision, pp. 83–98

  • Kart, U., Kamarainen, J.K., & Matas, J., (2018) How to make an RGBD tracker? In: European Conference on Computer Vision Workshop, pp. 1–15

  • Kart, U., Kamarainen, J.K., Matas, J., Fan, L., & Cricri, F., (2018) Depth masked discriminative correlation filter. In: International Conference on Pattern Recognition, pp. 2112–2117

  • Kart, U., Lukezic, A., Kristan, M., Kamarainen, J.K., & Matas, J., (2019) Object tracking by reconstruction with view-specific discriminative correlation filters. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1339–1348

  • Kim, D. Y., & Jeon, M. (2014). Data fusion of radar and image measurements for multi-object tracking via Kalman filtering. Information Sciences, 278, 641–652.

  • Kristan, M., Matas, J., Leonardis, A., Felsberg, M., et al. (2019) The seventh visual object tracking VOT2019 challenge results. In: IEEE International Conference on Computer Vision Workshop, pp. 1–36

  • Lan, X., Ye, M., Zhang, S., & Yuen, P.C., (2018) Robust collaborative discriminative learning for RGB-infrared tracking. In: AAAI Conference on Artificial Intelligence, pp. 1–8

  • Lan, X., Ye, M., Zhang, S., Zhou, H., & Yuen, P.C., (2018) Modality-correlation-aware sparse representation for RGB-infrared object tracking. Pattern Recognition Letters

  • Lan, X., Ye, M., Shao, R., & Zhong, B. (2019). Online non-negative multi-modality feature template learning for RGB-assisted infrared tracking. IEEE Access, 7, 67761–67771.

  • Lan, X., Ye, M., Shao, R., Zhong, B., Yuen, P. C., & Zhou, H. (2019). Learning modality-consistency feature templates: A robust RGB-Infrared tracking system. IEEE Transactions on Industrial Electronics, 66(12), 9887–9897.

  • Li, Y., & Zhu, J., (2014) A scale adaptive kernel correlation filter tracker with feature integration. In: European Conference on Computer Vision, pp. 254–265

  • Li, C., Hu, S., Gao, S., & Tang, J., (2016) Real-time grayscale-thermal tracking via laplacian sparse representation. In: International Conference on Multimedia Modeling, pp. 54–65

  • Li, C., Liang, X., Lu, Y., Zhao, N., & Tang, J., (2019) RGB-T object tracking: Benchmark and baseline. Pattern Recognition, 96, 106977

  • Li, C., Liu, L., Lu, A., Ji, Q., & Tang, J., (2020) Challenge-aware RGBT tracking. In: European Conference on Computer Vision, pp. 222–237

  • Li, C., Lu, A., Zheng, A., Tu, Z., & Tang, J., (2019) Multi-adapter RGBT tracking. In: IEEE International Conference on Computer Vision Workshop, pp. 2262–2270

  • Li, Z., Tao, R., Gavves, E., Snoek, C.G., & Smeulders, A.W., (2017) Tracking by natural language specification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6495–6503

  • Li, B., Yan, J., Wu, W., Zhu, Z., & Hu, X., (2018) High performance visual tracking with siamese region proposal network. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 8971–8980

  • Li, C., Zhao, N., Lu, Y., Zhu, C., & Tang, J., (2017) Weighted sparse representation regularized graph learning for RGB-T object tracking. In: ACM International Conference on Multimedia, pp. 1856–1864

  • Li, C., Zhu, C., Huang, Y., Tang, J., & Wang, L., (2018) Cross-modal ranking with soft consistency and noisy labels for robust RGB-T tracking. In: European Conference on Computer Vision, pp. 808–823

  • Li, C., Cheng, H., Hu, S., Liu, X., Tang, J., & Lin, L. (2016). Learning collaborative sparse representation for grayscale-thermal tracking. IEEE Transactions on Image Processing, 25(12), 5743–5756.

  • Liu, H., & Sun, F. (2012). Fusion tracking in color and infrared images using joint sparse representation. Science China Information Sciences, 55(3), 590–599.

  • Li, C., Wu, X., Zhao, N., Cao, X., & Tang, J. (2018). Fusing two-stream convolutional neural networks for RGB-T object tracking. Neurocomputing, 281, 78–85.

  • Luo, C., Sun, B., Yang, K., Lu, T., & Yeh, W. C. (2019). Thermal infrared and visible sequences fusion tracking based on a hybrid tracking framework with adaptive weighting scheme. Infrared Physics & Technology, 99, 265–276.

  • Lu, H., & Wang, D. (2019). Online Visual Tracking. Berlin: Springer.

  • Megherbi, N., Ambellouis, S., Colot, O., & Cabestaing, F., (2005) Joint audio-video people tracking using belief theory. In: IEEE Conference on Advanced Video and Signal based Surveillance, pp. 135–140

  • Nam, H., & Han, B., (2016) Learning multi-domain convolutional neural networks for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4293–4302

  • Ning, J., Yang, J., Jiang, S., Zhang, L., & Yang, M.H., (2016) Object tracking via dual linear structured SVM and explicit feature map. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4266–4274

  • Qi, Y., Zhang, S., Zhang, W., Su, L., Huang, Q., & Yang, M.H., (2019) Learning attribute-specific representations for visual tracking. In: AAAI Conference on Artificial Intelligence, pp. 8835–8842

  • Ronneberger, O., Fischer, P., & Brox, T.,(2015) U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241

  • Hong, S., You, T., Kwak, S., & Han, B., (2015) Online tracking by learning discriminative saliency map with convolutional neural network. In: International Conference on Machine Learning, pp. 597–606

  • Simonyan, K., & Zisserman, A., (2015) Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations, pp. 1–14

  • Song, X., Zhao, H., Cui, J., Shao, X., Shibasaki, R., & Zha, H. (2013). An online system for multiple interacting targets tracking: Fusion of laser and vision, tracking and learning. ACM Transactions on Intelligent Systems and Technology, 4(1), 1–21.

  • Voigtlaender, P., Luiten, J., Torr, P.H., & Leibe, B., (2020) Siam R-CNN: Visual tracking by re-detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6578–6588

  • Wang, N., & Yeung, D.Y., (2013) Learning a deep compact image representation for visual tracking. In: Advances in Neural Information Processing Systems, pp. 1–9

  • Wang, C., Xu, C., Cui, Z., Zhou, L., Zhang, T., Zhang, X., & Yang, J., (2020) Cross-modal pattern-propagation for RGB-T tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7064–7073

  • Wang, Z., Xu, J., Liu, L., Zhu, F., & Shao, L., (2019) RANet: Ranking attention network for fast video object segmentation. In: IEEE International Conference on Computer Vision, pp. 3978–3987

  • Wang, D., Lu, H., Xiao, Z., & Yang, M. H. (2015). Inverse sparse tracker with a locally weighted distance metric. IEEE Transactions on Image Processing, 24(9), 2446–2457.

  • Wang, W., Yan, Y., Winkler, S., & Sebe, N. (2016). Category specific dictionary learning for attribute specific feature selection. IEEE Transactions on Image Processing, 25(3), 1465–1478.

  • Woo, S., Park, J., Lee, J.Y., & Kweon, I.S., (2018) CBAM: Convolutional block attention module. In: European Conference on Computer Vision, pp. 3–19

  • Wu, Y., Blasch, E., Chen, G., Bai, L., & Ling, H., (2011) Multiple source data fusion via sparse representation for robust visual tracking. In: International Conference on Information Fusion, pp. 1–8

  • Xu, Y., Wang, Z., Li, Z., Yuan, Y., & Yu, G., (2020) SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines. In: AAAI Conference on Artificial Intelligence, pp. 12549–12556

  • Yang, Z., Kumar, T., Chen, T., Su, J., & Luo, J. (2020). Grounding-tracking-integration. IEEE Transactions on Circuits and Systems for Video Technology.

  • Yang, R., Zhu, Y., Wang, X., Li, C., & Tang, J., (2019) Learning target-oriented dual attention for robust RGB-T tracking. In: IEEE International Conference on Image Processing, pp. 1–8

  • Yu, Y., Xiong, Y., Huang, W., & Scott, M.R., (2020) Deformable siamese attention networks for visual object tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6728–6737

  • Zhai, S., Shao, P., Liang, X., & Wang, X. (2019). Fast RGB-T tracking via cross-modal correlation filters. Neurocomputing, 334, 172–181.

  • Zhang, T., Ghanem, B., Liu, S., & Ahuja, N., (2012) Low-rank sparse learning for robust visual tracking. In: European Conference on Computer Vision, pp. 470–484

  • Zhang, X., Zhang, X., Du, X., Zhou, X., & Yin, J., (2018) Learning multi-domain convolutional network for RGB-T visual tracking. In: International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, pp. 1–6

  • Zhang, H., Zhang, L., Zhuo, L., & Zhang, J. (2020). Object tracking in RGB-T videos using modal-aware attention network and competitive learning. Sensors, 20(2).

  • Zhang, P., Zhao, J., Wang, D., Lu, H., & Yang, X. (2021). Jointly modeling motion and appearance cues for robust RGB-T tracking. IEEE Transactions on Image Processing, 30, 3335–3347.

  • Zhang, Z., & Peng, H., (2019) Deeper and wider siamese networks for real-time visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4591–4600

  • Zhu, Y., Li, C., Lu, Y., Lin, L., Luo, B., & Tang, J., (2018) FANet: Quality-aware feature aggregation network for RGB-T tracking. CoRR abs/1811.09855

  • Zhu, Y., Li, C., Luo, B., Tang, J., & Wang, X., (2019) Dense feature aggregation and pruning for RGBT tracking. In: ACM International Conference on Multimedia, pp. 465–472

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62022021, Grant 61806037, Grant 61872056, and Grant 61725202; in part by the Science and Technology Innovation Foundation of Dalian under Grant 2020JJ26GX036; and in part by the Fundamental Research Funds for the Central Universities under Grant DUT21LAB127.

Author information

Corresponding author

Correspondence to Dong Wang.

Additional information

Communicated by Dong Xu.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 30142 KB)

Cite this article

Zhang, P., Wang, D., Lu, H. et al. Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking. Int J Comput Vis 129, 2714–2729 (2021). https://doi.org/10.1007/s11263-021-01495-3

