Abstract
Global context information and edge information are key to the semantic segmentation of remote sensing (RS) images. However, existing methods have a limited ability to capture global and edge information. In RS image semantic segmentation tasks, they also suffer from blurred edges between categories and from inefficient recognition of small-scale objects. In this work, we propose a global and edge enhanced Transformer (GE-Swin) for the semantic segmentation of remote sensing images. To improve sensitivity to edge information, we design two parallel decoders. The first is the main decoder, which extracts multi-level semantic information from multi-scale features. The second is an auxiliary decoder that operates on low-level features with low resolution, which gives it better sensitivity to edge information. In addition, a feature fusion module (FFM) is placed between the encoder and decoder to fuse multi-level features, enhancing the model's ability to capture global features. Finally, to verify the performance of the proposed approach, we perform extensive experiments on the ISPRS and LoveDA datasets. The experimental results show that the proposed model outperforms state-of-the-art methods.
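The abstract describes the GE-Swin data flow only at a high level, so the following NumPy sketch illustrates one plausible reading of it rather than the authors' implementation. Its assumptions are not stated in the abstract. First, the encoder yields four Swin-style stages at strides 4, 8, 16 and 32 with channel widths C, 2C, 4C and 8C. Second, the FFM is modelled as a hypothetical `feature_fusion` that projects each stage with a 1×1 convolution, upsamples it by nearest-neighbour repetition to the finest resolution, and sums the results. Third, both decoders are reduced to 1×1 classifiers, and the auxiliary decoder reads only the shallowest stage. Fourth, the two outputs are trained with a weighted sum of cross-entropy losses whose weight `lam = 0.4` is a made-up value. Random arrays stand in for real features and labels.

```python
import numpy as np

# Minimal sketch of a dual-decoder segmentation head with multi-level fusion.
# Everything here (names, shapes, the fusion rule, lam) is a hypothetical
# illustration, not the GE-Swin implementation.


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def conv1x1(x, w):
    """1x1 convolution: a per-pixel linear map, w is (C_out, C_in), x is (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


def feature_fusion(feats, proj):
    """Hypothetical FFM: project every stage to a shared width, upsample it to
    the finest stage's resolution and sum, so each output pixel mixes shallow
    detail with deep, coarse context."""
    target_h = feats[0].shape[1]
    fused = np.zeros((proj[0].shape[0],) + feats[0].shape[1:])
    for f, w in zip(feats, proj):
        fused += upsample_nearest(conv1x1(f, w), target_h // f.shape[1])
    return fused


def softmax_ce(logits, labels):
    """Mean pixel-wise cross-entropy for logits (K, H, W) and int labels (H, W)."""
    z = logits - logits.max(axis=0, keepdims=True)  # numerically stable softmax
    log_p = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    h, w = labels.shape
    picked = log_p[labels, np.arange(h)[:, None], np.arange(w)[None, :]]
    return -picked.mean()


rng = np.random.default_rng(0)
C, D, K, H, W = 4, 8, 6, 16, 16  # base width, fused width, classes, stage-1 size

# Four Swin-style stages at strides 4/8/16/32 relative to the input image.
feats = [rng.standard_normal((C * 2**i, H // 2**i, W // 2**i)) for i in range(4)]
proj = [0.1 * rng.standard_normal((D, C * 2**i)) for i in range(4)]
w_main = 0.1 * rng.standard_normal((K, D))
w_aux = 0.1 * rng.standard_normal((K, C))

fused = feature_fusion(feats, proj)    # (D, H, W) multi-level features
main_logits = conv1x1(fused, w_main)   # main decoder head
aux_logits = conv1x1(feats[0], w_aux)  # auxiliary head on the shallowest stage

labels = rng.integers(0, K, size=(H, W))
lam = 0.4  # hypothetical auxiliary-loss weight
loss = softmax_ce(main_logits, labels) + lam * softmax_ce(aux_logits, labels)
print(fused.shape, main_logits.shape, aux_logits.shape, round(float(loss), 3))
```

Summing the projected stages gives every pixel of the fused map access to coarse, aggregated context, while the auxiliary branch sees only the shallow stage. A practical version would likely replace the 1×1 heads with learned multi-layer decoders and use bilinear rather than nearest-neighbour upsampling.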
Data availability and access
All relevant results and data are within the manuscript. The data that support the findings of this study are openly available from the datasets cited as references [38, 39, 40].
References
[1] Zhu Z, Zhang J, Yang Z, Aljaddani AH, Cohen WB, Qiu S, Zhou C (2020) Continuous monitoring of land disturbance based on Landsat time series. Remote Sens Environ 238:111116
[2] Yu Y, Bao Y, Wang J, Chu H, Zhao N, He Y, Liu Y (2021) Crop row segmentation and detection in paddy fields based on treble-classification Otsu and double-dimensional clustering method. Remote Sens 13(5):901
[3] Zhang J, Lin S, Ding L, Bruzzone L (2020) Multi-scale context aggregation for semantic segmentation of remote sensing images. Remote Sens 12(4):701
[4] Sun L, Zou H, Wei J, Li M, Cao X, He S, Liu S (2022) Semantic segmentation of high-resolution remote sensing images based on sparse self-attention. In: IGARSS 2022-2022 IEEE international geoscience and remote sensing symposium, IEEE, pp 3492–3495
[5] Jin J, Zhou W, Yang R, Ye L, Yu L (2023) Edge detection guide network for semantic segmentation of remote-sensing images. IEEE Geosci Remote Sens Lett 20:1–5
[6] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929
[7] Wang W, Tang C, Wang X, Zheng B (2022) A ViT-based multiscale feature fusion approach for remote sensing image segmentation. IEEE Geosci Remote Sens Lett 19:1–5
[8] Zhong HF, Sun Q, Sun HM, Jia RS (2022) NT-Net: A semantic segmentation network for extracting lake water bodies from optical remote sensing images based on transformer. IEEE Trans Geosci Remote Sens 60:1–13
[9] Li Y, Ouyang S, Zhang Y (2022) Combining deep learning and ontology reasoning for remote sensing image semantic segmentation. Knowl-Based Syst 243:108469
[10] Wang L, Li R, Zhang C, Fang S, Duan C, Meng X, Atkinson PM (2022) UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS J Photogramm Remote Sens 190:196–214
[11] Zhang C, Lu X, Ye Q, Wang C, Yang C, Wang Q (2022) MFENet: Multi-feature extraction net for remote sensing semantic segmentation. In: 2022 7th International conference on intelligent computing and signal processing (ICSP), IEEE, pp 1986–1990
[12] Liu R, Mi L, Chen Z (2020) AFNet: Adaptive fusion network for remote sensing image semantic segmentation. IEEE Trans Geosci Remote Sens 59(9):7871–7886
[13] Zheng Z, Zhong Y, Wang J, Ma A (2020) Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4096–4105
[14] Diakogiannis FI, Waldner F, Caccetta P, Wu C (2020) ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS J Photogramm Remote Sens 162:94–114
[15] Xiao T, Liu Y, Huang Y, Li M, Yang G (2023) Enhancing multiscale representations with transformer for remote sensing image semantic segmentation. IEEE Trans Geosci Remote Sens 61:1–16
[16] Zheng S, Lu J, Zhao H, Zhu X, Luo Z, Wang Y, Fu Y, Feng J, Xiang T, Torr PH et al (2021) Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6881–6890
[17] Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10012–10022
[18] Ding L, Lin D, Lin S, Zhang J, Cui X, Wang Y, Tang H, Bruzzone L (2022) Looking outside the window: Wide-context transformer for the semantic segmentation of high-resolution remote sensing images. IEEE Trans Geosci Remote Sens 60:1–13
[19] Xu Z, Zhang W, Zhang T, Yang Z, Li J (2021) Efficient transformer for remote sensing image segmentation. Remote Sens 13(18):3585
[20] Zhang Y, Gao X, Duan Q, Yuan L, Gao X (2022) DHT: Deformable hybrid transformer for aerial image segmentation. IEEE Geosci Remote Sens Lett 19:1–5
[21] Ye W, Zhang W, Lei W, Zhang W, Chen X, Wang Y (2023) Remote sensing image instance segmentation network with transformer and multi-scale feature representation. Expert Syst Appl 234:121007
[22] Gao L, Liu H, Yang M, Chen L, Wan Y, Xiao Z, Qian Y (2021) STransFuse: Fusing Swin transformer and convolutional neural network for remote sensing image semantic segmentation. IEEE J Sel Top Appl Earth Obs Remote Sens 14:10990–11003
[23] He X, Zhou Y, Zhao J, Zhang D, Yao R, Xue Y (2022) Swin transformer embedding UNet for remote sensing image semantic segmentation. IEEE Trans Geosci Remote Sens 60:1–15
[24] Meng X, Yang Y, Wang L, Wang T, Li R, Zhang C (2022) Class-guided Swin transformer for semantic segmentation of remote sensing imagery. IEEE Geosci Remote Sens Lett 1–5
[25] Feng D, Zhang Z, Yan K (2022) A semantic segmentation method for remote sensing images based on the Swin transformer fusion Gabor filter. IEEE Access 10:77432–77451
[26] Zhang C, Jiang W, Zhang Y, Wang W, Zhao Q, Wang C (2022) Transformer and CNN hybrid deep neural network for semantic segmentation of very-high-resolution remote sensing imagery. IEEE Trans Geosci Remote Sens 60:1–20
[27] Wang L, Li R, Duan C, Zhang C, Meng X, Fang S (2022) A novel transformer based semantic segmentation scheme for fine-resolution remote sensing images. IEEE Geosci Remote Sens Lett 19:1–5
[28] Dong Z, Gao G, Liu T, Gu Y, Zhang X (2023) Distilling segmenters from CNNs and transformers for remote sensing images semantic segmentation. IEEE Trans Geosci Remote Sens
[29] Chu X, Tian Z, Wang Y, Zhang B, Ren H, Wei X, Xia H, Shen C (2021) Twins: Revisiting the design of spatial attention in vision transformers. Adv Neural Inf Process Syst 34:9355–9366
[30] Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2019) Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3146–3154
[31] Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881–2890
[32] Strudel R, Garcia R, Laptev I, Schmid C (2021) Segmenter: Transformer for semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 7262–7272
[33] Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P (2021) SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv Neural Inf Process Syst 34:12077–12090
[34] Nong Z, Su X, Liu Y, Zhan Z, Yuan Q (2021) Boundary-aware dual-stream network for VHR remote sensing images semantic segmentation. IEEE J Sel Top Appl Earth Obs Remote Sens 14:5260–5268
[35] Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818
[36] Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
[37] Jin Z, Yu D, Song L, Yuan Z, Yu L (2022) You should look at all objects. In: European conference on computer vision, Springer, pp 332–349
[38] ISPRS Vaihingen (2018) 2D semantic labeling dataset. Accessed: Apr
[39] ISPRS Potsdam (2018) 2D semantic labeling dataset. Accessed: Apr
[40] Wang J, Zheng Z, Ma A, Lu X, Zhong Y (2021) LoveDA: A remote sensing land-cover dataset for domain adaptive semantic segmentation. arXiv:2110.08733
Author information
Contributions
Conceptualization and methodology, Hengyou Wang and Xiao Li; Software, experiments, and validation, Xiao Li and Changmiao Hu; Writing (original draft preparation), Xiao Li and Changmiao Hu; Writing (review and editing), Hengyou Wang and Lianzhi Huo; Visualization, Xiao Li and Lianzhi Huo; Project administration, Hengyou Wang; Funding acquisition, Hengyou Wang and Lianzhi Huo. All authors have read and agreed to this version of the manuscript.
Ethics declarations
Ethical and informed consent for the data used
The authors declare no potential conflicts of interest or ethical problems related to the data used. All the data we used are available to researchers.
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was supported in part by the National Natural Science Foundation of China (Nos. 62072024 and 41971396), the Outstanding Youth Program of Beijing University of Civil Engineering and Architecture (No. JDJQ20220805), and the BUCEA Post Graduate Innovation Project.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, H., Li, X., Huo, L. et al. Global and edge enhanced transformer for semantic segmentation of remote sensing. Appl Intell 54, 5658–5673 (2024). https://doi.org/10.1007/s10489-024-05457-3
DOI: https://doi.org/10.1007/s10489-024-05457-3