HybridNet: Integrating Multiple Approaches for Aerial Semantic Segmentation

  • Original Research
  • SN Computer Science

Abstract

Semantic segmentation of very high resolution (VHR) aerial images has recently become an active research topic because of its widespread applications in disaster management, environmental monitoring, natural resource mapping, and related fields. Semantic segmentation can be modeled as an image-to-image mapping problem that requires pixel-level classification. Pixel-level classification is challenging for high-resolution aerial images because tiny objects occur infrequently, and dense semantic labeling requires more detailed information about such objects. In general, encoder-decoder architectures for semantic segmentation suffer from information loss caused by the downsampling and upsampling process. To handle this, we extend a high-resolution network with dense connection integration to preserve the original resolution and improve parameter sharing. We also incorporate a lightweight self-attention module for positional attention, which results in better segmentation maps. Additionally, we use a generalized Hough transform based deep voting module to extract pixel dependencies. Experimental results reveal that the proposed model achieves the best mean intersection over union and overall accuracy in local and benchmark evaluation on the Vaihingen and Potsdam datasets.
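The two metrics named in the abstract, mean intersection over union (mIoU) and overall accuracy (OA), are commonly derived from a pixel-level confusion matrix. The following is a minimal sketch of that standard computation, not the authors' evaluation code; the function names (`confusion_matrix`, `overall_accuracy`, `mean_iou`) and the toy label maps are assumptions made for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels for every (true class, predicted class) pair."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Encode each pair as a single index, then histogram the indices.
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

def overall_accuracy(cm):
    """Fraction of pixels whose predicted class equals the true class."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0  # ignore classes that appear in neither map
    return (tp[valid] / union[valid]).mean()

# Hypothetical 2x3 label maps with three classes (0, 1, 2).
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
```

Because OA weights every pixel equally, it is dominated by frequent classes, whereas mIoU averages over classes and is therefore more sensitive to errors on rare, small objects.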

Availability of Data and Materials

The authors used only publicly available benchmark datasets in this work. The implementation code will be shared via the GitHub link given in the Notes.

Notes

  1. https://github.com/chouhan-avinash/HybridNet.

Acknowledgements

We would like to thank the International Society for Photogrammetry and Remote Sensing (ISPRS) for sharing 2D semantic segmentation benchmark datasets.

Funding

No funding was obtained for this study.

Author information

Contributions

AC and AS conceptualized the proposed scheme. AC produced the implementation and results. AS and DJC validated the results. AC and AS prepared the manuscript. AS, DJC and SPA completed the proofreading and finalization.

Corresponding author

Correspondence to Avinash Chouhan.

Ethics declarations

Conflict of interest

The authors do not have related financial or non-financial interests that need to be disclosed.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chouhan, A., Sur, A., Chutia, D. et al. HybridNet: Integrating Multiple Approaches for Aerial Semantic Segmentation. SN COMPUT. SCI. 5, 133 (2024). https://doi.org/10.1007/s42979-023-02434-4

  • DOI: https://doi.org/10.1007/s42979-023-02434-4
