Skip to main content
Log in

Global and edge enhanced transformer for semantic segmentation of remote sensing

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Global context information and edge information are the keys to remote sensing (RS) image semantic segmentation. However, the existing methods have limited ability to obtain global and edge information, and category edge blurring and efficiency problems in small-scale object recognition in remote sensing image semantic segmentation tasks. In this work, we propose a global and edge enhanced Transformer (GE-Swin) for the semantic segmentation of remote sensing images. To improve the sensitivity to edge information, we design dual decoders based on the parallel model. One is the main decoder, which extracts multi-level semantic information from multi-scale features. The other is an auxiliary decoder related to low-layer features with low resolution. Thus, the auxiliary decoder has better sensitivity to edge information. Then, the feature fusion module (FFM) is designed between the encoder and decoder to fuse the multilevel features, enhancing the model’s ability to obtain global features. Finally, to verify the performance of the proposed approach, we perform extensive experiments with the ISPRS and LoveDA datasets. The experimental results illustrate that the proposed model achieves superior performance compared to state-of-the-art methods.

Graphical abstract

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Data availability and access

All relevant results and data are within the manuscript. The data that support the findings of this study are all openly available in reference number [38,39,40].

References

  1. Zhu Z, Zhang J, Yang Z, Aljaddani AH, Cohen WB, Qiu S, Zhou C (2020) Continuous monitoring of land disturbance based on landsat time series. Remote Sens Environ 238:111116

    Article  Google Scholar 

  2. Yu Y, Bao Y, Wang J, Chu H, Zhao N, He Y, Liu Y (2021) Crop row segmentation and detection in paddy fields based on treble-classification otsu and double-dimensional clustering method. Remote Sens 13(5):901

    Article  Google Scholar 

  3. Zhang J, Lin S, Ding L, Bruzzone L (2020) Multi-scale context aggregation for semantic segmentation of remote sensing images. Remote Sens 12(4):701

    Article  Google Scholar 

  4. Sun L, Zou H, Wei J, Li M, Cao X, He S, Liu S (2022) Semantic segmentation of high-resolution remote sensing images based on sparse self-attention. In: IGARSS 2022-2022 IEEE international geoscience and remote sensing symposium, IEEE, pp 3492–3495

  5. Jin J, Zhou W, Yang R, Ye L, Yu L (2023) Edge detection guide network for semantic segmentation of remote-sensing images. IEEE Geosci Remote Sens Lett 20:1–5

    Google Scholar 

  6. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929

  7. Wang W, Tang C, Wang X, Zheng B (2022) A vit-based multiscale feature fusion approach for remote sensing image segmentation. IEEE Geosci Remote Sens Lett 19:1–5

    Google Scholar 

  8. Zhong HF, Sun Q, Sun HM, Jia RS (2022) Nt-net: A semantic segmentation network for extracting lake water bodies from optical remote sensing images based on transformer. IEEE Trans Geosci Remote Sens 60:1–13

    Article  Google Scholar 

  9. Li Y, Ouyang S, Zhang Y (2022) Combining deep learning and ontology reasoning for remote sensing image semantic segmentation. Knowl-Based Syst 243:108469

    Article  Google Scholar 

  10. Wang L, Li R, Zhang C, Fang S, Duan C, Meng X, Atkinson PM (2022) Unetformer: A unet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS J Photogramm Remote Sens 190:196–214

    Article  Google Scholar 

  11. Zhang C, Lu X, Ye Q, Wang C, Yang C, Wang Q (2022) Mfenet: Multi-feature extraction net for remote sensing semantic segmentation. In: 2022 7th International conference on intelligent computing and signal processing (ICSP), IEEE, pp 1986–1990

  12. Liu R, Mi L, Chen Z (2020) Afnet: Adaptive fusion network for remote sensing image semantic segmentation. IEEE Trans Geosci Remote Sens 59(9):7871–7886

    Article  Google Scholar 

  13. Zheng Z, Zhong Y, Wang J, Ma A (2020) Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4096–4105

  14. Diakogiannis FI, Waldner F, Caccetta P, Wu C (2020) Resunet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS J Photogramm Remote Sens 162:94–114

    Article  Google Scholar 

  15. Xiao T, Liu Y, Huang Y, Li M, Yang G (2023) Enhancing multiscale representations with transformer for remote sensing image semantic segmentation. IEEE Trans Geosci Remote Sens 61:1–16

    Google Scholar 

  16. Zheng S, Lu J, Zhao H, Zhu X, Luo Z, Wang Y, Fu Y, Feng J, Xiang T, Torr PH et al (2021) Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6881–6890

  17. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10012–10022

  18. Ding L, Lin D, Lin S, Zhang J, Cui X, Wang Y, Tang H, Bruzzone L (2022) Looking outside the window: Wide-context transformer for the semantic segmentation of high-resolution remote sensing images. IEEE Trans Geosci Remote Sens 60:1–13

    Google Scholar 

  19. Xu Z, Zhang W, Zhang T, Yang Z, Li J (2021) Efficient transformer for remote sensing image segmentation. Remote Sens 13(18):3585

    Article  Google Scholar 

  20. Zhang Y, Gao X, Duan Q, Yuan L, Gao X (2022) Dht: Deformable hybrid transformer for aerial image segmentation. IEEE Geosci Remote Sens Lett 19:1–5

    Google Scholar 

  21. Ye W, Zhang W, Lei W, Zhang W, Chen X, Wang Y (2023) Remote sensing image instance segmentation network with transformer and multi-scale feature representation. Expert Syst Appl 234:121007

    Article  Google Scholar 

  22. Gao L, Liu H, Yang M, Chen L, Wan Y, Xiao Z, Qian Y (2021) Stransfuse: Fusing swin transformer and convolutional neural network for remote sensing image semantic segmentation. IEEE J Sel Top Appl Earth Obs Remote Sens 14:10990–11003

    Article  Google Scholar 

  23. He X, Zhou Y, Zhao J, Zhang D, Yao R, Xue Y (2022) Swin transformer embedding unet for remote sensing image semantic segmentation. IEEE Trans Geosci Remote Sens 60:1–15

    Article  Google Scholar 

  24. Meng X, Yang Y, Wang L, Wang T, Li R, Zhang C (2022) Class-guided swin transformer for semantic segmentation of remote sensing imagery. IEEE Geosci Remote Sens Lett 1–5

  25. Feng D, Zhang Z, Yan K (2022) A semantic segmentation method for remote sensing images based on the swin transformer fusion gabor filter. IEEE Access 10:77432–77451

    Article  Google Scholar 

  26. Zhang C, Jiang W, Zhang Y, Wang W, Zhao Q, Wang C (2022) Transformer and cnn hybrid deep neural network for semantic segmentation of very-high-resolution remote sensing imagery. IEEE Trans Geosci Remote Sens 60:1–20

    Google Scholar 

  27. Wang L, Li R, Duan C, Zhang C, Meng X, Fang S (2022) A novel transformer based semantic segmentation scheme for fine-resolution remote sensing images. IEEE Geosci Remote Sens Lett 19:1–5

    Google Scholar 

  28. Dong Z, Gao G, Liu T, Gu Y, Zhang X (2023) Distilling segmenters from cnns and transformers for remote sensing images semantic segmentation. IEEE Trans Geosci Remote Sens

  29. Chu X, Tian Z, Wang Y, Zhang B, Ren H, Wei X, Xia H, Shen C (2021) Twins: Revisiting the design of spatial attention in vision transformers. Adv Neural Inf Process Syst 34:9355–9366

    Google Scholar 

  30. Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2019) Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3146–3154

  31. Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881–2890

  32. Strudel R, Garcia R, Laptev I, Schmid C (2021) Segmenter: Transformer for semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 7262–7272

  33. Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P (2021) Segformer: Simple and efficient design for semantic segmentation with transformers. Adv Neural Inf Process Syst 34:12077–12090

    Google Scholar 

  34. Nong Z, Su X, Liu Y, Zhan Z, Yuan Q (2021) Boundary-aware dual-stream network for vhr remote sensing images semantic segmentation. IEEE J Sel Top Appl Earth Obs Remote Sens 14:5260–5268

    Article  Google Scholar 

  35. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818

  36. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125

  37. Jin Z, Yu D, Song L, Yuan Z, Yu L (2022) You should look at all objects. In: European conference on computer vision, Springer, pp 332–349

  38. Vaihingen I (2018) 2d semantic labeling dataset. Accessed: Apr

  39. Potsdam I (2018) 2d semantic labeling dataset. Accessed: Apr

  40. Wang J, Zheng Z, Ma A, Lu X, Zhong Y (2021) Loveda: A remote sensing land-cover dataset for domain adaptive semantic segmentation. arXiv:2110.08733

Download references

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization and methodology, Hengyou Wang and Xiao Li; Software, experiments, and validation, Xiao Li and Changmiao Hu; Writing-original draft preparation, Xiao Li and Changmiao Hu; Writing-review and editing, Hengyou Wang and Lianzhi Huo; Visualization, Xiao Li and Lianzhi Huo; Project administration, Hengyou Wang; Funding acquisition, Hengyou Wang and Lianzhi Huo. All authors have read and agreed to this version of the manuscript.

Corresponding author

Correspondence to Lianzhi Huo.

Ethics declarations

Ethical and informed consent for the data used

The authors declare no potential conflicts of interest or ethical problems relate to the data used. The data we used are all avaliable to researchers.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by the National Natural Science Foundation of China (Nos. 62072024, 41971396), the outstanding Youth Program of Beijing University of Civil Engineering and Architecture(No.JDJQ20220805), the BUCEA Post Graduate Innovation Project.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, H., Li, X., Huo, L. et al. Global and edge enhanced transformer for semantic segmentation of remote sensing. Appl Intell 54, 5658–5673 (2024). https://doi.org/10.1007/s10489-024-05457-3

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-024-05457-3

Keywords