Abstract
In recent years, deep-learning-based visual object tracking has achieved promising results. However, a drastic performance drop is observed when a pre-trained model is transferred to changing weather conditions, such as hazy imaging scenarios, where the data distribution differs from that of the natural training set. This problem challenges the open-world practical application of accurate target tracking. In principle, visual tracking performance relies on how discriminative the features are between the target and its surroundings, rather than on image-level visual quality. To this end, we design a feature restoration transformer that adaptively enhances the representation capability of the extracted visual features for robust tracking in both natural and hazy scenarios. Specifically, the feature restoration transformer is constructed with dedicated self-attention hierarchies for the refinement of potentially contaminated deep feature maps. We endow the feature extraction process with a refinement mechanism designed specifically for hazy imaging scenarios, establishing a tracking system that is robust against foggy videos. In essence, the feature restoration transformer is jointly trained with a Siamese tracking transformer, so the supervision for learning discriminative and salient features is provided by the entire restoration tracking system. The experimental results obtained in hazy imaging scenarios demonstrate the merits and superiority of the proposed restoration tracking system, whose restoration power is complementary to image-level dehazing. In addition, consistent advantages of our design are observed when it is generalised to different video attributes, demonstrating its capacity to deal with open-world scenarios.
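The core operation the abstract alludes to, self-attention used to re-weight and refine feature vectors, can be illustrated with a minimal, dependency-free sketch. This is plain Python with no learned projections and is not the authors' implementation; it only shows how attention-weighted averaging pulls a contaminated feature toward its similar neighbours.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(feats):
    """Minimal single-head self-attention over a list of feature vectors.

    Each output vector is a convex combination of all input vectors,
    weighted by scaled dot-product similarity. Queries, keys and values
    are the raw features themselves (no learned projections), so this is
    an illustration of the mechanism, not the paper's model.
    """
    d = len(feats[0])
    out = []
    for q in feats:
        # Scaled dot-product similarity between the query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in feats]
        weights = softmax(scores)
        # Attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, feats))
                    for j in range(d)])
    return out

# Three 2-D feature vectors; the attention-weighted averaging draws each
# vector toward the others in proportion to their similarity.
features = [[1.0, 0.0], [0.9, 0.4], [1.1, 0.1]]
refined = self_attention(features)
```

Because the softmax weights are positive and sum to one, every refined coordinate stays within the range spanned by the inputs; in the full model, learned query/key/value projections and hierarchical attention layers decide which clean features a hazy feature should borrow from.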









Availability of Data and Materials
The datasets supporting the conclusions of this article are included within the article.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (62106089, 62020106012, 62332008, 62336004), and in part by the Engineering and Physical Sciences Research Council (EPSRC), U.K. (Grant EP/N007743/1, Grant MURI/EPSRC/DSTL, and Grant EP/R018456/1).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by Hong Liu.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xu, T., Pan, Y., Feng, Z. et al. Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking. Int J Comput Vis 132, 6021–6038 (2024). https://doi.org/10.1007/s11263-024-02182-9