Abstract
The attention mechanism can capture long-range dependencies. However, its independent computation of correlations hardly accounts for the complex backgrounds of remote sensing images, which leads to noisy and ambiguous attention weights. To address this issue, we design a correlation attention module (CAM) that enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors, thereby facilitating feature aggregation. We further embed the CAM into a local dynamic attention (LDA) branch and a global dynamic attention (GDA) branch to capture local texture details and global context, respectively. In addition, since complex and diverse geographical objects place different demands on local texture details and global context, we devise a dynamic weighting mechanism that adaptively adjusts the contributions of the two branches, yielding a more discriminative feature representation. Experimental results on three datasets show that the proposed dual-branch dynamic attention network (DBDAN), which integrates the CAM with both branches, considerably improves semantic segmentation performance on remote sensing images and outperforms representative state-of-the-art methods.
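The abstract does not specify the exact formulation of the CAM or the dynamic weighting mechanism, but the two ideas can be illustrated with a minimal NumPy sketch. Here, "consensus" is hypothetically read as agreement of each correlation vector with the mean correlation vector, and the dynamic weighting is modeled as a per-pixel softmax gate over the two branches; both `correlation_attention` and `dynamic_fusion` are illustrative names, not the paper's implementation.

```python
import numpy as np

def correlation_attention(corr):
    """Sketch of consensus-based correlation refinement.

    corr: (N, N) raw correlation vectors, one row per query position.
    Each row is reweighted by its agreement with the consensus (mean)
    correlation vector, boosting consistent correlations and suppressing
    outliers, then normalized into attention weights via softmax.
    """
    corr = corr / (np.linalg.norm(corr, axis=-1, keepdims=True) + 1e-8)
    consensus = corr.mean(axis=0, keepdims=True)   # (1, N) consensus vector
    agreement = corr @ consensus.T                 # (N, 1) cosine-like score
    refined = corr * agreement                     # reweight each row
    e = np.exp(refined - refined.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)       # rows sum to 1

def dynamic_fusion(local_feat, global_feat, gate_logits):
    """Sketch of dynamic weighting: a softmax gate adaptively mixes the
    local-detail and global-context branch features per position.

    local_feat, global_feat: (P, C) branch outputs.
    gate_logits: (P, 2) learned per-position branch scores.
    """
    w = np.exp(gate_logits - gate_logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)          # (P, 2), rows sum to 1
    return w[:, :1] * local_feat + w[:, 1:] * global_feat
```

With equal gate logits the fusion reduces to a plain average of the two branches; in the actual network the gate would be predicted from the features themselves so that texture-heavy objects lean on the local branch and large homogeneous regions on the global one.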
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Che, R., Ma, X., Hong, T., Wang, X., Feng, T., Zhang, W. (2024). DBDAN: Dual-Branch Dynamic Attention Network for Semantic Segmentation of Remote Sensing Images. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14428. Springer, Singapore. https://doi.org/10.1007/978-981-99-8462-6_25
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8461-9
Online ISBN: 978-981-99-8462-6