TransRA: transformer and residual attention fusion for single remote sensing image dehazing

Published in Multidimensional Systems and Signal Processing

Abstract

Haze severely degrades the quality of optical remote sensing images, hurting downstream applications such as remote sensing image change detection and classification. In recent years, deep learning models have achieved convincing results in image dehazing and have therefore attracted growing attention for haze removal from remote sensing imagery. However, existing deep learning-based methods struggle to recover the fine details of hazy remote sensing images, especially under nonhomogeneous haze. In this paper, we propose a two-branch neural network that fuses a Transformer with residual attention to dehaze a single remote sensing image. Specifically, the upper branch is a U-shaped encoder–decoder that uses an efficient multi-head self-attention Transformer to capture long-range dependencies. The lower branch is a stack of residual channel attention blocks that strengthens the model's fitting capability and supplies complementary fine-detail features to the upper branch. Finally, the features of the two branches are concatenated and mapped to the haze-free remote sensing image by a fusion block. Extensive experiments demonstrate that our TransRA outperforms competing dehazing methods both qualitatively and quantitatively.
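To make the two-branch design concrete, below is a minimal PyTorch sketch of the architecture the abstract describes: a self-attention branch for global context, a stack of residual channel attention blocks for fine details, and a fusion block that maps the concatenated features back to an image. This is our illustrative reconstruction, not the authors' code: the U-shaped multi-scale encoder–decoder of the upper branch is collapsed to a single self-attention block, and all channel counts, depths, and the fusion layout are assumptions.

```python
# Hypothetical sketch of a TransRA-style two-branch dehazer (assumed PyTorch).
# All hyperparameters are illustrative, not the paper's configuration.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))


class ResidualChannelAttentionBlock(nn.Module):
    """Conv -> ReLU -> Conv -> channel attention, with a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            ChannelAttention(channels),
        )

    def forward(self, x):
        return x + self.body(x)


class TransformerBranch(nn.Module):
    """Stand-in for the upper branch: multi-head self-attention over feature
    tokens to capture long-range dependencies (the paper's U-shaped
    encoder-decoder structure is omitted here for brevity)."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        t = self.norm(tokens)
        out, _ = self.attn(t, t, t)             # global spatial attention
        return (tokens + out).transpose(1, 2).reshape(b, c, h, w)


class TwoBranchDehazer(nn.Module):
    def __init__(self, channels: int = 32, num_rca_blocks: int = 4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.upper = TransformerBranch(channels)        # global context
        self.lower = nn.Sequential(                     # fine details
            *[ResidualChannelAttentionBlock(channels)
              for _ in range(num_rca_blocks)]
        )
        # Fusion block: concatenate branch features, map back to RGB.
        self.fusion = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, hazy):
        feat = self.stem(hazy)
        fused = torch.cat([self.upper(feat), self.lower(feat)], dim=1)
        return self.fusion(fused) + hazy                # residual prediction


if __name__ == "__main__":
    model = TwoBranchDehazer()
    out = model(torch.randn(1, 3, 64, 64))
    print(out.shape)  # torch.Size([1, 3, 64, 64])
```

The global residual connection in `forward` (adding the hazy input back) is a common design choice in dehazing networks; whether TransRA uses it is our assumption.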


Acknowledgements

This work was supported by the Higher Education Scientific Research Project of Ningxia (NGY2017009).

Author information

Corresponding author

Correspondence to Bo Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dong, P., Wang, B. TransRA: transformer and residual attention fusion for single remote sensing image dehazing. Multidim Syst Sign Process 33, 1119–1138 (2022). https://doi.org/10.1007/s11045-022-00835-x
