Road Segmentation from High-Fidelity Remote Sensing Images using a Context Information Capture Network

Cognitive Computation

Abstract

The automatic extraction of roads and buildings from remote sensing imagery plays a significant role in many urban applications. Owing to the strong performance of deep learning, various road segmentation methods based on the fully convolutional network (FCN) have recently been proposed for optical remote sensing images. However, existing FCN-based methods for segmenting high-fidelity remote sensing images still have limitations: the repeated convolution and pooling operations in an FCN reduce the feature resolution and discard detailed information, and FCNs have a limited capacity to capture long-range dependencies among features. To address these issues, a context information capture network (CM-FCN) for road segmentation is proposed. A dilated convolution module is designed to capture and aggregate multiscale contextual information. Furthermore, to strengthen the long-range dependencies of features for road detection, two attention modules are designed that use the attention mechanism to adaptively combine local features with their global dependencies. The context features extracted by the dilated convolution module are then fused into the attention modules to further improve segmentation performance. The proposed model is evaluated on three challenging remote sensing road segmentation datasets and one building segmentation dataset, including a dataset with our own manual labels. Comparisons demonstrate the effectiveness of the proposed method. We conclude that CM-FCN has the potential to automatically segment roads and buildings from high-resolution remote sensing images with an accuracy that makes it useful in practical application scenarios.
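The two ingredients named in the abstract, dilated convolution for multiscale context and attention for long-range dependencies, can be illustrated with a small one-dimensional toy. This is a minimal sketch and not the CM-FCN architecture: the function names, the summation used to merge the dilated branches, and the residual connection in the attention step are all assumptions made for illustration, and it uses only the Python standard library.

```python
import math


def dilated_conv1d(signal, kernel, rate):
    """1-D dilated convolution (cross-correlation form, as in deep learning).

    Zero padding keeps the output the same length as the input.
    Assumes an odd-length kernel so that it can be centred.
    """
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def receptive_field(kernel_size, rate):
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


def multiscale_context(signal, kernel, rates):
    """Run parallel dilated branches and sum them.

    Summation is an assumed fusion rule chosen for simplicity.
    """
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) for vals in zip(*branches)]


def self_attention(features):
    """Self-attention over positions, followed by an assumed residual sum.

    `features` is a list of equal-length feature vectors, one per position.
    Each output vector is the input plus a softmax-weighted sum of the
    feature vectors at all positions.
    """
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        # Dot-product similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(features[i], features[j]))
                  for j in range(n)]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        attended = [sum(weights[j] * features[j][c] for j in range(n))
                    for c in range(dim)]
        out.append([x + y for x, y in zip(features[i], attended)])
    return out
```

With a kernel of size 3, `receptive_field` gives 3, 5, and 9 samples for rates 1, 2, and 4, so the dilated branches see progressively wider context without any pooling or loss of resolution. The attention step, by contrast, lets every position draw on every other position in a single operation, which is what "long-range dependencies" refers to.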



Funding

This work was supported by the National Natural Science Foundation of China [Nos. 41976174 and 62071499] and the Natural Science Foundation of Guangdong Province, China [No. 2020A1515010869]. The authors thank INRIA, Mnih, and Cheng for kindly providing the aerial image labelling datasets. The authors also thank their colleagues at the Guangdong Provincial Key Laboratory of Image Processing, who spent substantial effort manually labelling the INRIA road dataset.

Author information

Corresponding authors

Correspondence to Jingwen Yan or Xiaoqing Wang.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflicts of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Zhu, Y., Long, L., Wang, J. et al. Road Segmentation from High-Fidelity Remote Sensing Images using a Context Information Capture Network. Cogn Comput 14, 780–793 (2022). https://doi.org/10.1007/s12559-021-09980-0

  • DOI: https://doi.org/10.1007/s12559-021-09980-0
