Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking

Published in: International Journal of Computer Vision (2021)

Abstract

The development of a real-time and robust RGB-T tracker is extremely challenging because the tracked object may suffer from both shared and modality-specific challenges in the RGB and thermal (T) modalities. In this work, we observe that implicit attribute information can boost model discriminability, and we propose a novel attribute-driven representation network to improve RGB-T tracking performance. First, according to the appearance changes in RGB-T tracking scenarios, we divide the major shared and modality-specific challenges into four typical attributes: extreme illumination, occlusion, motion blur, and thermal crossover. Second, we design an attribute-driven residual branch for each heterogeneous attribute to mine its attribute-specific properties and thereby build a powerful residual representation for object modeling. Furthermore, we aggregate these representations at the channel and pixel levels using the proposed attribute ensemble network (AENet), which adaptively fits the attribute-agnostic tracking process. AENet effectively responds to appearance changes while suppressing distractors. Finally, we conduct extensive experiments on three RGB-T tracking benchmarks, comparing the proposed tracker with other state-of-the-art methods. The results show that our tracker achieves highly competitive performance at a real-time tracking speed. Code will be available at https://github.com/zhang-pengyu/ADRNet.
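To make the architecture described above concrete, below is a minimal PyTorch sketch of attribute-specific residual branches combined by channel- and pixel-level gating. The module names, layer sizes, and gating choices (SE-style channel weighting, a 1x1-convolution pixel mask) are illustrative assumptions rather than the paper's actual design; the authors' implementation is at the ADRNet repository linked above.

    import torch
    import torch.nn as nn

    class AttributeBranch(nn.Module):
        """Residual branch for one attribute (e.g., occlusion or motion blur)."""
        def __init__(self, channels):
            super().__init__()
            self.residual = nn.Sequential(
                nn.Conv2d(channels, channels // 4, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, channels, kernel_size=3, padding=1),
            )

        def forward(self, x):
            return self.residual(x)

    class AENet(nn.Module):
        """Aggregates attribute residuals with channel- and pixel-level gates."""
        def __init__(self, channels, num_attributes=4):
            super().__init__()
            self.branches = nn.ModuleList(
                [AttributeBranch(channels) for _ in range(num_attributes)]
            )
            # Channel-level gate (SE-style squeeze-and-excitation).
            self.channel_gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // 8, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 8, channels, kernel_size=1),
                nn.Sigmoid(),
            )
            # Pixel-level gate: one spatial weight per location.
            self.pixel_gate = nn.Sequential(
                nn.Conv2d(channels, 1, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, shared):
            # Sum the attribute-specific residuals, then reweight them per
            # channel and per pixel before adding them back to the shared feature.
            residual = sum(branch(shared) for branch in self.branches)
            gated = residual * self.channel_gate(residual) * self.pixel_gate(residual)
            return shared + gated  # attribute-aware representation

    # Example: refine a 512-channel fused RGB-T feature map.
    feat = torch.randn(1, 512, 25, 25)
    print(AENet(channels=512)(feat).shape)  # torch.Size([1, 512, 25, 25])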

Notes

  1. We exclude DAPNet (Zhu et al. 2019) from this comparison because its speed is not reported in the original paper.

References

  • Ak, K.E., Kassim, A.A., Lim, J.H., & Tham, J.Y., (2018) Learning attribute representations with localization for flexible fashion search. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7708–7717

  • Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., & Torr, P.H.S., (2016) Fully-convolutional siamese networks for object tracking. In: European Conference on Computer Vision Workshop, pp. 850–865

  • Bhat, G., Danelljan, M., Gool, L.V., & Timofte, R., (2019) Learning discriminative model prediction for tracking. In: IEEE International Conference on Computer Vision, pp. 6182–6191

  • Bolme, D.S., Beveridge, J.R., Draper, B.A., & Lui, Y.M., (2010) Visual object tracking using adaptive correlation filters. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2544–2550

  • Camplani, M., Hannuna, S., Mirmehdi, M., Damen, D., Paiement, A., Tao, L., & Burghardt, T., (2015) Real-time RGB-D tracking with depth scaling kernelised correlation filters and occlusion handling. In: British Machine Vision Conference, pp. 1–11

  • Chen, Z., Zhong, B., Li, G., Zhang, S., & Ji, R., (2020) Siamese box adaptive network for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6668–6677

  • Danelljan, M., Bhat, G., Khan, F.S., & Felsberg, M., (2017) ECO: Efficient convolution operators for tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6638–6646

  • Danelljan, M., Bhat, G., Khan, F.S., & Felsberg, M., (2019) ATOM: Accurate tracking by overlap maximization. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4660–4669

  • Danelljan, M., Gool, L.V., & Timofte, R., (2020) Probabilistic regression for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7183–7192

  • Danelljan, M., Hager, G., Khan, F. S., & Felsberg, M. (2017). Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(8), 1561–1575.

  • Ding, P., & Song, Y., (2015) Robust object tracking using color and depth images with a depth based occlusion handling and recovery. In: International Conference on Fuzzy Systems and Knowledge Discovery, pp. 930–935

  • Feng, Q., Ablavsky, V., Bai, Q., & Sclaroff, S., (2019) Robust visual object tracking with natural language region proposal network. CoRR abs/1912.02048

  • Feng, Q., Ablavsky, V., Bai, Q., Li, G., & Sclaroff, S., (2020) Real-time visual object tracking with natural language description. In: IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 700–709

  • Gao, Y., Li, C., Zhu, Y., Tang, J., He, T., & Wang, F., (2019) Deep adaptive fusion network for high performance RGBT tracking. In: IEEE International Conference on Computer Vision Workshop, pp. 1–9

  • Hu, J., Shen, L., Albanie, S., Sun, G., & Wu, E., (2018) Squeeze-and-excitation networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141

  • Jiang, B., Luo, R., Mao, J., Xiao, T., & Jiang, Y., (2018) Acquisition of localization confidence for accurate object detection. In: European Conference on Computer Vision, pp. 784–799

  • Jung, I., Son, J., Baek, M., & Han, B., (2018) Real-time MDNet. In: European Conference on Computer Vision, pp. 83–98

  • Kart, U., Kamarainen, J.K., & Matas, J., (2018) How to make an RGBD tracker? In: European Conference on Computer Vision Workshop, pp. 1–15

  • Kart, U., Kamarainen, J.K., Matas, J., Fan, L., & Cricri, F., (2018) Depth masked discriminative correlation filter. In: International Conference on Pattern Recognition, pp. 2112–2117

  • Kart, U., Lukezic, A., Kristan, M., Kamarainen, J.K., & Matas, J., (2019) Object tracking by reconstruction with view-specific discriminative correlation filters. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1339–1348

  • Kim, D. Y., & Jeon, M. (2014). Data fusion of radar and image measurements for multi-object tracking via Kalman filtering. Information Sciences, 278, 641–652.

  • Kristan, M., Matas, J., Leonardis, A., Felsberg, M., et al. (2019) The seventh visual object tracking VOT2019 challenge results. In: IEEE International Conference on Computer Vision Workshop, pp. 1–36

  • Lan, X., Ye, M., Zhang, S., & Yuen, P.C., (2018) Robust collaborative discriminative learning for RGB-infrared tracking. In: AAAI Conference on Artificial Intelligence, pp. 1–8

  • Lan, X., Ye, M., Zhang, S., Zhou, H., & Yuen, P.C., (2018) Modality-correlation-aware sparse representation for RGB-infrared object tracking. Pattern Recognition Letters

  • Lan, X., Ye, M., Shao, R., & Zhong, B. (2019). Online non-negative multi-modality feature template learning for RGB-assisted infrared tracking. IEEE Access, 7, 67761–67771.

  • Lan, X., Ye, M., Shao, R., Zhong, B., Yuen, P. C., & Zhou, H. (2019). Learning modality-consistency feature templates: A robust RGB-Infrared tracking system. IEEE Transactions on Industrial Electronics, 66(12), 9887–9897.

  • Li, Y., & Zhu, J., (2014) A scale adaptive kernel correlation filter tracker with feature integration. In: European Conference on Computer Vision, pp. 254–265

  • Li, C., Hu, S., Gao, S., & Tang, J., (2016) Real-time grayscale-thermal tracking via laplacian sparse representation. In: International Conference on Multimedia Modeling, pp. 54–65

  • Li, C., Liang, X., Lu, Y., Zhao, N., & Tang, J., (2019) RGB-T object tracking: Benchmark and baseline. Pattern Recognition, 96, 106977

  • Li, C., Liu, L., Lu, A., Ji, Q., & Tang, J., (2020) Challenge-aware RGBT tracking. In: European Conference on Computer Vision, pp. 222–237

  • Li, C., Lu, A., Zheng, A., Tu, Z., & Tang, J., (2019) Multi-adapter RGBT tracking. In: IEEE International Conference on Computer Vision Workshop, pp. 2262–2270

  • Li, Z., Tao, R., Gavves, E., Snoek, C.G., & Smeulders, A.W., (2017) Tracking by natural language specification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6495–6503

  • Li, B., Yan, J., Wu, W., Zhu, Z., & Hu, X., (2018) High performance visual tracking with siamese region proposal network. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 8971–8980

  • Li, C., Zhao, N., Lu, Y., Zhu, C., & Tang, J., (2017) Weighted sparse representation regularized graph learning for RGB-T object tracking. In: ACM International Conference on Multimedia, pp. 1856–1864

  • Li, C., Zhu, C., Huang, Y., Tang, J., & Wang, L., (2018) Cross-modal ranking with soft consistency and noisy labels for robust RGB-T tracking. In: European Conference on Computer Vision, pp. 808–823

  • Li, C., Cheng, H., Hu, S., Liu, X., Tang, J., & Lin, L. (2016). Learning collaborative sparse representation for grayscale-thermal tracking. IEEE Transactions on Image Processing, 25(12), 5743–5756.

  • Liu, H., & Sun, F. (2012). Fusion tracking in color and infrared images using joint sparse representation. Science China Information Sciences, 55(3), 590–599.

  • Li, C., Wu, X., Zhao, N., Cao, X., & Tang, J. (2018). Fusing two-stream convolutional neural networks for RGB-T object tracking. Neurocomputing, 281, 78–85.

  • Luo, C., Sun, B., Yang, K., Lu, T., & Yeh, W. C. (2019). Thermal infrared and visible sequences fusion tracking based on a hybrid tracking framework with adaptive weighting scheme. Infrared Physics & Technology, 99, 265–276.

  • Lu, H., & Wang, D. (2019). Online Visual Tracking. Berlin: Springer.

  • Megherbi, N., Ambellouis, S., Colot, O., & Cabestaing, F., (2005) Joint audio-video people tracking using belief theory. In: IEEE Conference on Advanced Video and Signal based Surveillance, pp. 135–140

  • Nam, H., & Han, B., (2016) Learning multi-domain convolutional neural networks for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4293–4302

  • Ning, J., Yang, J., Jiang, S., Zhang, L., & Yang, M.H., (2016) Object tracking via dual linear structured SVM and explicit feature map. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4266–4274

  • Qi, Y., Zhang, S., Zhang, W., Su, L., Huang, Q., & Yang, M.H., (2019) Learning attribute-specific representations for visual tracking. In: AAAI Conference on Artificial Intelligence, pp. 8835–8842

  • Ronneberger, O., Fischer, P., & Brox, T.,(2015) U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241

  • Hong, S., You, T., Kwak, S., & Han, B., (2015) Online tracking by learning discriminative saliency map with convolutional neural network. In: International Conference on Machine Learning, pp. 597–606

  • Simonyan, K., & Zisserman, A., (2015) Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations, pp. 1–14

  • Song, X., Zhao, H., Cui, J., Shao, X., Shibasaki, R., & Zha, H. (2013). An online system for multiple interacting targets tracking: Fusion of laser and vision, tracking and learning. ACM Transactions on Intelligent Systems and Technology, 4(1), 1–21.

  • Voigtlaender, P., Luiten, J., Torr, P.H., & Leibe, B., (2020) Siam R-CNN: Visual tracking by re-detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6578–6588

  • Wang, N., & Yeung, D.Y., (2013) Learning a deep compact image representation for visual tracking. In: Advances in Neural Information Processing Systems, pp. 1–9

  • Wang, C., Xu, C., Cui, Z., Zhou, L., Zhang, T., Zhang, X., & Yang, J., (2020) Cross-modal pattern-propagation for RGB-T tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7064–7073

  • Wang, Z., Xu, J., Liu, L., Zhu, F., & Shao, L., (2019) RANet: Ranking attention network for fast video object segmentation. In: IEEE International Conference on Computer Vision, pp. 3978–3987

  • Wang, D., Lu, H., Xiao, Z., & Yang, M. H. (2015). Inverse sparse tracker with a locally weighted distance metric. IEEE Transactions on Image Processing, 24(9), 2446–2457.

  • Wang, W., Yan, Y., Winkler, S., & Sebe, N. (2016). Category specific dictionary learning for attribute specific feature selection. IEEE Transactions on Image Processing, 25(3), 1465–1478.

  • Woo, S., Park, J., Lee, J.Y., & Kweon, I.S., (2018) CBAM: Convolutional block attention module. In: European Conference on Computer Vision, pp. 3–19

  • Wu, Y., Blasch, E., Chen, G., Bai, L., & Ling, H., (2011) Multiple source data fusion via sparse representation for robust visual tracking. In: International Conference on Information Fusion, pp. 1–8

  • Xu, Y., Wang, Z., Li, Z., Yuan, Y., & Yu, G., (2020) SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines. In: AAAI Conference on Artificial Intelligence, pp. 12549–12556

  • Yang, Z., Kumar, T., Chen, T., Su, J., & Luo, J. (2020). Grounding-tracking-integration. IEEE Transactions on Circuits and Systems for Video Technology.

  • Yang, R., Zhu, Y., Wang, X., Li, C., & Tang, J., (2019) Learning target-oriented dual attention for robust RGB-T tracking. In: IEEE International Conference on Image Processing, pp. 1–8

  • Yu, Y., Xiong, Y., Huang, W., & Scott, M.R., (2020) Deformable siamese attention networks for visual object tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6728–6737

  • Zhai, S., Shao, P., Liang, X., & Wang, X. (2019). Fast RGB-T tracking via cross-modal correlation filters. Neurocomputing, 334, 172–181.

  • Zhang, T., Ghanem, B., Liu, S., & Ahuja, N., (2012) Low-rank sparse learning for robust visual tracking. In: European Conference on Computer Vision, pp. 470–484

  • Zhang, X., Zhang, X., Du, X., Zhou, X., & Yin, J., (2018) Learning multi-domain convolutional network for RGB-T visual tracking. In: International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, pp. 1–6

  • Zhang, H., Zhang, L., Zhuo, L., & Zhang, J. (2020). Object tracking in RGB-T videos using modal-aware attention network and competitive learning. Sensors, 20(2).

  • Zhang, P., Zhao, J., Wang, D., Lu, H., & Yang, X. (2021). Jointly modeling motion and appearance cues for robust RGB-T tracking. IEEE Transactions on Image Processing, 30, 3335–3347.

  • Zhang, Z., & Peng, H., (2019) Deeper and wider siamese networks for real-time visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4591–4600

  • Zhu, Y., Li, C., Lu, Y., Lin, L., Luo, B., & Tang, J., (2018) FANet: Quality-aware feature aggregation network for RGB-T tracking. CoRR abs/1811.09855

  • Zhu, Y., Li, C., Luo, B., Tang, J., & Wang, X., (2019) Dense feature aggregation and pruning for RGBT tracking. In: ACM International Conference on Multimedia, pp. 465–472

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62022021, Grant 61806037, Grant 61872056, and Grant 61725202; in part by the Science and Technology Innovation Foundation of Dalian under Grant 2020JJ26GX036; and in part by the Fundamental Research Funds for the Central Universities under Grant DUT21LAB127.

Author information

Corresponding author

Correspondence to Dong Wang.

Additional information

Communicated by Dong Xu.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 30142 KB)

Cite this article

Zhang, P., Wang, D., Lu, H. et al. Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking. Int J Comput Vis 129, 2714–2729 (2021). https://doi.org/10.1007/s11263-021-01495-3

