Road Segmentation from High-Fidelity Remote Sensing Images using a Context Information Capture Network

Zhu, Yuting; Long, Lihong; Wang, Jinjie; Yan, Jingwen; Wang, Xiaoqing

doi:10.1007/s12559-021-09980-0

Published: 15 January 2022

Volume 14, pages 780–793, (2022)

Cognitive Computation

Yuting Zhu¹ (ORCID: orcid.org/0000-0001-9789-4028), Lihong Long², Jinjie Wang¹, Jingwen Yan² & Xiaoqing Wang¹

Abstract

The automatic extraction of roads or buildings from remote sensing imagery plays a significant role in many urban applications. Owing to the strong performance of deep learning, various road segmentation methods based on the fully convolutional network (FCN) have recently been proposed for optical remote sensing images. However, existing FCN-based methods for segmenting high-fidelity remote sensing images still have limitations: the repeated convolution and pooling operations in an FCN reduce the feature resolution and lose detailed information, and FCNs have a limited capacity to mine long-range dependencies among features. To address these issues, a context information capture network (CM-FCN) for road segmentation is proposed. A dilated convolution module is designed to capture and aggregate multiscale contextual information. Furthermore, to strengthen the long-range dependencies of features for road detection, two attention modules are designed that use the attention mechanism to adaptively combine local features with their global dependencies. The context features extracted by the dilated convolution module are then fused into the attention modules to further improve segmentation performance. The proposed model is evaluated on three challenging remote sensing road segmentation datasets and one building segmentation dataset, including a dataset with our own manual labels. Comparisons demonstrate the effectiveness of the proposed method. We conclude that CM-FCN has the potential to automatically segment roads and buildings from high-resolution remote sensing images with an accuracy that makes it a useful tool for practical application scenarios.
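
The two ingredients named in the abstract, dilated convolution and attention over global dependencies, can be illustrated with a minimal pure-Python sketch. This is not the authors' CM-FCN: the abstract does not give layer shapes, dilation rates, or the form of the two attention modules, so the function names `dilated_conv2d` and `self_attention`, the single-channel input, and the dot-product similarity are all assumptions chosen for illustration.

```python
import math


def dilated_conv2d(image, kernel, dilation):
    """Valid (unpadded) 2D cross-correlation with a dilated square kernel.

    Hypothetical helper: `image` and `kernel` are lists of lists of floats.
    A k x k kernel with dilation d samples a window that is
    d * (k - 1) + 1 pixels wide, so context grows without any pooling.
    """
    k = len(kernel)
    span = dilation * (k - 1)  # window width minus one
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(rows - span):
        out_row = []
        for j in range(cols - span):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    acc += kernel[a][b] * image[i + a * dilation][j + b * dilation]
            out_row.append(acc)
        out.append(out_row)
    return out


def self_attention(features):
    """Generic dot-product self-attention with a residual connection.

    Hypothetical helper: `features` is a list of N feature vectors, one per
    spatial position. Each output is the input vector plus a softmax-weighted
    sum of all N vectors, so every position can draw on every other position.
    """
    out = []
    for query in features:
        scores = [sum(x * y for x, y in zip(query, key)) for key in features]
        peak = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        aggregated = [
            sum((w / total) * value[d] for w, value in zip(weights, features))
            for d in range(len(query))
        ]
        out.append([a + x for a, x in zip(aggregated, query)])
    return out
```

With a 3 x 3 kernel on a 7 x 7 input, dilations of 1, 2, and 3 cover windows of 3, 5, and 7 pixels and produce 5 x 5, 3 x 3, and 1 x 1 outputs respectively, which is one way to see how stacking several dilation rates gathers context at multiple scales.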

References

Huang X, Zhang L. Road centreline extraction from high-resolution imagery based on multiscale structural features and support vector machines. Int J Remote Sens. 2009;30(8):1977–87.
Mnih V, Hinton GE. Learning to detect roads in high-resolution aerial images. In: ECCV’10 Proceedings of the 11th European Conference on Computer Vision: Part VI. 2010. p. 210–223.
Unsalan C, Sirmacek B. Road network detection using probabilistic and graph theoretical methods. IEEE Trans Geosci Remote Sens. 2012;50(11):4441–53.
Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM. 2017;60(6):84–90.
Mahmud M, Kaiser MS, McGinnity TM, Hussain A. Deep learning in mining biological data. Cogn Comput. 2020;1–33.
Zhu Y, Liang Z, Yan J, Chen G, Wang X. E-D-Net: Automatic building extraction from high-resolution aerial images with boundary information. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2021;PP(99):1–13.
Paisitkriangkrai S, Sherrah J, Janney P, Hengel AV. Effective semantic pixel labelling with convolutional networks and conditional random fields. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2015. p. 36–43.
Saito S, Yamashita T, Aoki Y. Multiple object extraction from aerial imagery with convolutional neural networks. J Imaging Sci Technol. 2016;60(1):10402.
Zhu Y, Yan J, Wang C, Zhou Y. Road detection of remote sensing image based on convolutional neural network. In: International Conference on Image and Graphics. 2019. p. 106–118.
Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. In: ICLR 2016: International Conference on Learning Representations 2016. 2016.
Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018. p. 833–851.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017. p. 5998–6008.
Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H. Dual attention network for scene segmentation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019. p. 3146–3154.
Chandra N, Ghosh JK, Sharma A. A cognitive framework for road detection from high-resolution satellite images. Geocarto Int. 2019;34(8):909–924.
Zhang Z, Liu Q, Wang Y. Road extraction by deep residual U-Net. IEEE Geosci Remote Sens Lett. 2018;15(5):749–53.
Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. 2015. p. 234–241.
Zhong Y, Zhao J, Zhang L. A hybrid object-oriented conditional random field classification framework for high spatial resolution remote sensing imagery. IEEE Trans Geosci Remote Sens. 2014;52(11):7023–37.
Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell. 2018;40(4):834–48.
Zhao H, Shi J, Qi X, Wang X, Jia J. Pyramid scene parsing network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. p. 6230–6239.
Cheng G, Han J. A survey on object detection in optical remote sensing images. ISPRS J Photogramm Remote Sens. 2016;117:11–28.
Zhou J, Bischof WF, Caelli T. Road tracking in aerial images based on human computer interaction and Bayesian filtering. ISPRS J Photogramm Remote Sens. 2006;61(2):108–24.
Huertas A, Nevatia R. Detecting buildings in aerial images. Comput Vis Graph Image Process. 1988;41(2):131–52.
Cheng G, Han J, Guo L, Qian X, Zhou P, Yao X, Hu X. Object detection in remote sensing imagery using a discriminatively trained mixture model. ISPRS J Photogramm Remote Sens. 2013;85:32–43.
Zhao L-J, Tang P, Huo L-Z. Land-use scene classification using a concentric circle-structured multiscale bag-of-visual-words model. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2014;7(12):4620–31.
Yokoya N, Iwasaki A. Object detection based on sparse representation and Hough voting for optical remote sensing imagery. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2015;8(5):2053–62.
Camps-Valls G, Bruzzone L. Kernel-based methods for hyperspectral image classification. IEEE Trans Geosci Remote Sens. 2005;43(6):1351–62.
Bishop CM. Neural Networks For Pattern Recognition. 1995.
Paola JD, Schowengerdt RA. A detailed comparison of backpropagation neural network and maximum-likelihood classifiers for urban land use classification. IEEE Trans Geosci Remote Sens. 1995;33(4):981–96.
Romero A, Gatta C, Camps-Valls G. Unsupervised deep feature extraction for remote sensing image classification. IEEE Trans Geosci Remote Sens. 2016;54(3):1349–62.
Wei Y, Wang Z, Mai X. Road structure refined CNN for road extraction in aerial image. IEEE Geosci Remote Sens Lett. 2017;14(5):709–13.
Maggiori E, Tarabalka Y, Charpiat G, Alliez P. Fully convolutional neural networks for remote sensing image classification. In: 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). 2016. p. 5071–5074.
Maggiori E, Tarabalka Y, Charpiat G, Alliez P. Can semantic labeling methods generalize to any city? The INRIA aerial image labeling benchmark. In: 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). 2017. p. 3226–3229.
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015. p. 3431–3440.
Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G. Understanding convolution for semantic segmentation. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). 2018. p. 1451–1460.
Chandra S, Kokkinos I. Fast, exact and multi-scale inference for semantic image segmentation with deep Gaussian CRFs. In: Leibe B, Matas J, Sebe N, Welling M, editors. Computer Vision ECCV 2016. Springer: Cham, Switzerland; 2016. p. 402–418.
Arnab A, Jayasumana S, Zheng S, Torr PHS. Higher order potentials in end-to-end trainable conditional random fields. arXiv: Computer Vision and Pattern Recognition. 2015.
Wang X, Girshick R, Gupta A, He K. Non-local neural networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018. p. 7794–7803.
Lin G, Shen C, van den Hengel A, Reid I. Efficient piecewise training of deep structured models for semantic segmentation. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 3194–3203.
Lin Z, Feng M, dos Santos CN, Yu M, Xiang B, Zhou B, Bengio Y. A structured self-attentive sentence embedding. In: ICLR 2017: International Conference on Learning Representations 2017. 2017.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: ICLR 2015: International Conference on Learning Representations 2015. 2015.
Peng C, Zhang X, Yu G, Luo G, Sun J. Large kernel matters – improve semantic segmentation by global convolutional network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. p. 1743–1751.
Cheng G, Wang Y, Xu S, Wang H, Xiang S, Pan C. Automatic road detection and centerline extraction via cascaded end-to-end convolutional neural network. IEEE Trans Geosci Remote Sens. 2017;55(6):3322–37.
Liu G, Sun X, Fu K, Wang H. Interactive geospatial object extraction in high resolution remote sensing images using shape-based global minimization active contour model. Pattern Recogn Lett. 2013;34(10):1186–95.
Huang B, Lu K, Audebert N, Khalel A, Tarabalka Y, Malof J, Boulch A, Le Saux B, Collins L, Bradbury K, Lefevre S, El-Saban M. Large-scale semantic classification: Outcome of the first year of INRIA aerial image labeling benchmark. In: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium. 2018. p. 6947–6950.

Funding

This work was supported by the National Natural Science Foundation of China [Nos. 41976174 and 62071499] and the Natural Science Foundation of Guangdong Province, China [No. 2020A1515010869]. The authors thank INRIA, Mnih, and Cheng for kindly providing the aerial image labelling datasets. The authors also thank their colleagues at the Guangdong Provincial Key Laboratory of Image Processing, who spent a substantial amount of effort manually labelling the INRIA road dataset.

Author information

Authors and Affiliations

School of Electronic and Communication Engineering, Sun Yat-sen University, Guangzhou, China
Yuting Zhu, Jinjie Wang & Xiaoqing Wang
Department of Electronic Engineering, Shantou University, Shantou, Guangdong, China
Lihong Long & Jingwen Yan

Corresponding authors

Correspondence to Jingwen Yan or Xiaoqing Wang.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, Y., Long, L., Wang, J. et al. Road Segmentation from High-Fidelity Remote Sensing Images using a Context Information Capture Network. Cogn Comput 14, 780–793 (2022). https://doi.org/10.1007/s12559-021-09980-0

Received: 09 May 2020
Accepted: 26 November 2021
Published: 15 January 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s12559-021-09980-0
