Abstract
Remote sensing image datasets exhibit large intra-class variation and a scale imbalance between categories. As a result, semantic segmentation suffers from the loss of small-scale object information and from an imbalance between foreground and background in which the background dominates, both of which seriously degrade network performance. To address these problems, this paper proposes BBU-Net, an efficient bilateral-branch deep neural network based on U-Net. One branch of the network learns the distribution of the original data, while the other focuses on difficult samples. The two branches are then combined through a cumulative learning strategy to improve the representation and classification ability of the network. Finally, considering the geometric diversity of remote sensing images, this paper adopts test-time augmentation and reflection padding, and proposes a balanced weighted loss function, named CombineLoss, to alleviate imbalance during training. The proposed network was first evaluated on the Inria Aerial Image Labeling Dataset, where it obtained a mean intersection over union of 87.53% and a mean pixel accuracy of 97.4%. To assess model complexity, the proposed model is also compared with a neural network based on ensemble learning. The comparison shows that the proposed network has much lower space complexity and far fewer parameters than the ensemble-based network. Multiple groups of comparative experiments against mainstream semantic segmentation methods were then carried out on the satellite building dataset I of the WHU Building Dataset.
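As an illustration of how test-time augmentation and reflection padding can be combined at inference, the following is a minimal NumPy sketch and not the authors' implementation. The function names, the choice of eight flip/rotation views, the padding multiple of 32, and the `predict_fn` callable (assumed to map an (H, W, C) image to an (H, W) foreground-probability map) are all assumptions made for this example.

```python
import numpy as np

def reflect_pad_to_multiple(image, multiple=32):
    """Mirror-pad an (H, W, C) image on the bottom/right so that H and W
    become multiples of `multiple` (hypothetical helper; 32 is an assumption)."""
    h, w = image.shape[:2]
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
    return padded, (h, w)

def predict_with_tta(predict_fn, image):
    """Average predictions over 4 rotations x 2 flips, then crop the padding.

    `predict_fn` is a stand-in for a trained segmentation model."""
    padded, (h, w) = reflect_pad_to_multiple(image)
    total = np.zeros(padded.shape[:2], dtype=np.float64)
    count = 0
    for k in range(4):                      # rotations by 0, 90, 180, 270 degrees
        for flip in (False, True):          # optional horizontal flip
            view = np.rot90(padded, k, axes=(0, 1))
            if flip:
                view = np.flip(view, axis=1)
            prob = predict_fn(view)
            # Undo the transforms in reverse order to realign with the input.
            if flip:
                prob = np.flip(prob, axis=1)
            prob = np.rot90(prob, -k, axes=(0, 1))
            total += prob
            count += 1
    return (total / count)[:h, :w]

# Usage with a toy per-pixel "model" (channel mean) in place of a real network.
rng = np.random.default_rng(0)
image = rng.random((50, 70, 3))
probabilities = predict_with_tta(lambda x: x.mean(axis=-1), image)
```

Averaging over geometrically transformed views is one common way to exploit the fact that aerial scenes have no preferred orientation; which views and padding sizes the paper actually uses is not stated in the abstract.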
The experimental results show that the proposed method effectively extracts semantic information from remote sensing images, substantially mitigates the data imbalance, improves the performance of the network model, and achieves good segmentation results, which demonstrates the effectiveness of the method.
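For readers unfamiliar with the two metrics reported in the abstract, the sketch below computes mean intersection over union and mean pixel accuracy from a confusion matrix. It uses the commonly used definitions (per-class IoU and per-class pixel accuracy, each averaged over the classes present); whether the paper uses exactly these definitions is an assumption, and the function names are chosen for this example only.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Pixel counts; rows are ground-truth classes, columns are predictions."""
    idx = num_classes * y_true.ravel().astype(np.int64) + y_pred.ravel().astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN), skipping absent classes."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())

def mean_pixel_accuracy(cm):
    """Mean over classes of TP / (all pixels of that ground-truth class)."""
    tp = np.diag(cm)
    per_class_total = cm.sum(axis=1)
    valid = per_class_total > 0
    return float((tp[valid] / per_class_total[valid]).mean())

# Toy 2x4 label maps: 0 = background, 1 = building.
y_true = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
y_pred = np.array([[0, 0, 0, 1],
                   [0, 0, 1, 1]])
cm = confusion_matrix(y_true, y_pred, num_classes=2)
print(mean_iou(cm), mean_pixel_accuracy(cm))
```

In the toy example one building pixel is predicted as background, so the background IoU is 4/5 and the building IoU is 3/4. This shows how a single error on the smaller class lowers both class scores.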
Data Availability
Some or all data, models, or code generated or used during the study are available from the corresponding author by request.
Acknowledgements
This work was jointly funded by the project of Artificial Intelligence Key Laboratory of Sichuan Province (No. 2020RYJ02), the Project of Key Laboratory of Pattern Recognition and Intelligent Information Processing of Sichuan (No. MSSB-2020-10), the Project of Key Research and Development Program of Sichuan Department of Science and Technology in 2022 (2022YFG0190), and the Project of Information Materials and Devices Application Sichuan Key Laboratory (2022XXCL007), and was supported by the Innovation Team of Chengdu Normal University Grant (No. CSCXTD2020B09).
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Z., Wang, H. & Liu, Y. Semantic segmentation of remote sensing image based on bilateral branch network. Vis Comput 40, 3069–3090 (2024). https://doi.org/10.1007/s00371-023-03011-9
DOI: https://doi.org/10.1007/s00371-023-03011-9