Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking

International Journal of Computer Vision

Abstract

In recent years, deep-learning-based visual object tracking has achieved promising results. However, a drastic performance drop is observed when a pre-trained model is transferred to adverse weather conditions, such as hazy imaging scenarios, where the data distribution differs from that of the natural training set. This problem challenges the practical application of accurate target tracking in open-world settings. In principle, visual tracking performance depends on how discriminative the extracted features are between the target and its surroundings, rather than on image-level visual quality. To this end, we design a feature restoration transformer that adaptively enhances the representation capability of the extracted visual features for robust tracking in both natural and hazy scenarios. Specifically, the feature restoration transformer is constructed with dedicated self-attention hierarchies that refine potentially contaminated deep feature maps. We thus endow the feature extraction process with a refinement mechanism tailored to hazy imaging scenarios, establishing a tracking system that is robust to foggy videos. The feature restoration transformer is trained jointly with a Siamese tracking transformer, so the supervision for learning discriminative and salient features is provided by the entire restoration-tracking system. Experimental results on hazy imaging scenarios demonstrate the merits and superiority of the proposed restoration tracking system, whose restoration power is complementary to image-level dehazing. In addition, consistent advantages of our design are observed when it is generalised to different video attributes, demonstrating its capacity to deal with open-world scenarios.
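The abstract describes the architecture only at a high level. As a purely illustrative sketch of the general idea, and not the authors' actual design, the following PyTorch snippet shows how a self-attention module could refine potentially haze-contaminated backbone feature maps before Siamese matching; the module name, dimensions, depth, and residual connection are all assumptions made for illustration.

```python
# Minimal sketch (an assumption, not the paper's exact design): a transformer
# encoder that refines possibly haze-contaminated backbone features before
# they are matched against the template by a Siamese tracking head.
import torch
import torch.nn as nn

class FeatureRestorationBlock(nn.Module):
    """Hypothetical restoration module: self-attention over spatial tokens."""

    def __init__(self, channels: int = 256, heads: int = 8, depth: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads,
            dim_feedforward=4 * channels, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) feature map from the tracking backbone.
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)   # (B, H*W, C) spatial tokens
        restored = self.encoder(tokens)            # self-attention refinement
        restored = restored.transpose(1, 2).reshape(b, c, h, w)
        return feat + restored                     # residual: keep clean content

# Usage: restore template and search features, then cross-correlate them.
restore = FeatureRestorationBlock()
z = torch.randn(1, 256, 8, 8)     # template (exemplar) features
x = torch.randn(1, 256, 16, 16)   # search-region features
z_r, x_r = restore(z), restore(x)
score = torch.einsum('bchw,bcij->bhwij', z_r, x_r)  # naive correlation volume
print(score.shape)  # torch.Size([1, 8, 8, 16, 16])
```

In the paper, the restoration module is trained jointly with the Siamese tracking transformer, so the refinement is supervised by the tracking objective rather than by an image-level dehazing loss; the sketch above only illustrates where such a module would sit in the pipeline.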


Availability of Data and Materials

The datasets supporting the conclusions of this article are included within the article.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (62106089, 62020106012, 62332008, 62336004), and in part by the Engineering and Physical Sciences Research Council (EPSRC), U.K. (Grant EP/N007743/1, Grant MURI/EPSRC/DSTL, and Grant EP/R018456/1).

Author information


Corresponding author

Correspondence to Tianyang Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by Hong Liu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xu, T., Pan, Y., Feng, Z. et al. Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking. Int J Comput Vis 132, 6021–6038 (2024). https://doi.org/10.1007/s11263-024-02182-9

