Abstract
In recent years, deep-learning-based visual object tracking has achieved promising results. However, a drastic performance drop is observed when a pre-trained model is transferred to changing weather conditions, such as hazy imaging scenarios, where the data distribution differs from that of the natural training set. This problem challenges the open-world practical application of accurate target tracking. In principle, visual tracking performance relies on how discriminative the features are between the target and its surroundings, rather than on image-level visual quality. To this end, we design a feature restoration transformer that adaptively enhances the representation capability of the extracted visual features for robust tracking in both natural and hazy scenarios. Specifically, the feature restoration transformer is constructed with dedicated self-attention hierarchies for the refinement of potentially contaminated deep feature maps. We endow the feature extraction process with a refinement mechanism designed specifically for hazy imaging scenarios, establishing a tracking system that is robust against foggy videos. In essence, the feature restoration transformer is jointly trained with a Siamese tracking transformer, so the supervision for learning discriminative and salient features is provided by the entire restoration tracking system. The experimental results obtained in hazy imaging scenarios demonstrate the merits and superiority of the proposed restoration tracking system, whose restoration power is complementary to image-level dehazing. In addition, consistent advantages of our design are observed when it is generalised to different video attributes, demonstrating its capacity to deal with open-world scenarios.
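The core operation the abstract alludes to, self-attention used to re-weight and refine feature vectors, can be illustrated with a minimal, dependency-free sketch. This is plain Python with no learned projections and is not the authors' implementation; it only shows how attention-weighted averaging pulls a contaminated feature toward its similar neighbours.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(feats):
    """Minimal single-head self-attention over a list of feature vectors.

    Each output vector is a convex combination of all input vectors,
    weighted by scaled dot-product similarity. Queries, keys and values
    are the raw features themselves (no learned projections), so this is
    an illustration of the mechanism, not the paper's model.
    """
    d = len(feats[0])
    out = []
    for q in feats:
        # Scaled dot-product similarity between the query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in feats]
        weights = softmax(scores)
        # Attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, feats))
                    for j in range(d)])
    return out

# Three 2-D feature vectors; the attention-weighted averaging draws each
# vector toward the others in proportion to their similarity.
features = [[1.0, 0.0], [0.9, 0.4], [1.1, 0.1]]
refined = self_attention(features)
```

Because the softmax weights are positive and sum to one, every refined coordinate stays within the range spanned by the inputs; in the full model, learned query/key/value projections and hierarchical attention layers decide which clean features a hazy feature should borrow from.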









Availability of Data and Materials
The datasets supporting the conclusions of this article are included within the article.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (62106089, 62020106012, 62332008, 62336004), and in part by the Engineering and Physical Sciences Research Council (EPSRC), U.K. (Grant EP/N007743/1, Grant MURI/EPSRC/DSTL, and Grant EP/R018456/1).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by Hong Liu.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xu, T., Pan, Y., Feng, Z. et al. Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking. Int J Comput Vis 132, 6021–6038 (2024). https://doi.org/10.1007/s11263-024-02182-9