Semantic segmentation of remote sensing image based on bilateral branch network

  • Original article
  • Published in: The Visual Computer

Abstract

Remote sensing image datasets exhibit large intra-class variation and a scale imbalance between categories. As a result, semantic segmentation suffers from the loss of small-scale object information and from an imbalance between foreground and background in which the background dominates, and this seriously degrades the performance of the network model. To address these problems, this paper proposes BBU-Net, an efficient bilateral-branch deep neural network based on U-Net. First, one branch of the network learns the distribution of the original data, while the other focuses on difficult samples. The two branches are then combined through a cumulative learning strategy to improve the representation and classification ability of the network. Finally, considering the geometric diversity of remote sensing images, this paper adopts test-time augmentation and reflection padding and proposes a balanced weighted loss function, CombineLoss, to alleviate imbalance during training. The proposed network was first tested on the Inria Aerial Image Labeling Dataset, where it obtained a mean intersection over union of 87.53% and a mean pixel accuracy of 97.4%. To assess model complexity, the proposed model is also compared with a neural network based on ensemble learning. The comparison shows that the proposed network has much lower space complexity and far fewer parameters than the ensemble-based network. Multiple groups of comparative experiments against mainstream semantic segmentation methods were then carried out on the satellite building dataset I of the WHU Building Dataset. The experimental results show that the proposed method effectively extracts the semantic information of remote sensing images, significantly mitigates the imbalance of remote sensing image data, improves the performance of the network model, and achieves good segmentation results, which demonstrates its effectiveness.
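The two metrics reported above can be illustrated with a small sketch. It assumes the common confusion-matrix definitions: per-class IoU is TP / (TP + FP + FN), and mean pixel accuracy is the average of per-class recall. The paper's exact definitions may differ, and the function names and toy labels below are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows index the true class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average of TP / (TP + FP + FN) over classes that occur at all."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                   # true class c, predicted otherwise
        fp = sum(row[c] for row in cm) - tp    # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def mean_pixel_accuracy(cm: List[List[int]]) -> float:
    """Average per-class recall (assumed definition of mean pixel accuracy)."""
    accs = []
    for c in range(len(cm)):
        total = sum(cm[c])
        if total > 0:
            accs.append(cm[c][c] / total)
    return sum(accs) / len(accs) if accs else 0.0


# Toy flattened label maps: class 0 = background (dominant), class 1 = building.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=2)

overall_accuracy = sum(cm[c][c] for c in range(2)) / len(y_true)
print(cm)                        # [[5, 1], [1, 1]]
print(overall_accuracy)          # 0.75
print(mean_pixel_accuracy(cm))   # (5/6 + 1/2) / 2 = 2/3
print(mean_iou(cm))              # (5/7 + 1/3) / 2 = 11/21
```

In this toy case the overall pixel accuracy (0.75) is higher than both class-averaged metrics because the dominant background class is mostly correct, which is the kind of imbalance effect the abstract describes.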

Data Availability

Some or all data, models, or code generated or used during the study are available from the corresponding author by request.

Acknowledgements

This work was jointly funded by the Project of the Artificial Intelligence Key Laboratory of Sichuan Province (No. 2020RYJ02), the Project of the Key Laboratory of Pattern Recognition and Intelligent Information Processing of Sichuan (No. MSSB-2020-10), the Project of the Key Research and Development Program of the Sichuan Department of Science and Technology in 2022 (No. 2022YFG0190), and the Project of the Information Materials and Devices Application Sichuan Key Laboratory (No. 2022XXCL007), and was supported by the Innovation Team of Chengdu Normal University Grant (No. CSCXTD2020B09).

Author information

Corresponding author

Correspondence to Zhongyu Li.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, Z., Wang, H. & Liu, Y. Semantic segmentation of remote sensing image based on bilateral branch network. Vis Comput 40, 3069–3090 (2024). https://doi.org/10.1007/s00371-023-03011-9

  • DOI: https://doi.org/10.1007/s00371-023-03011-9
