Abstract
The automatic extraction of roads and buildings from remote sensing imagery plays a significant role in many urban applications. Owing to the strong performance of deep learning, various road segmentation methods based on the fully convolutional network (FCN) have recently been proposed for optical remote sensing images. However, existing FCN-based methods for segmenting high-fidelity remote sensing images remain limited: the repeated convolution and pooling operations in an FCN reduce the feature resolution and discard fine detail, and FCNs have a limited capacity to mine long-range dependencies among features. To address these issues, a context information capture network (CM-FCN) for road segmentation is proposed. A dilated convolution module is designed to capture and aggregate multiscale contextual information. Furthermore, to strengthen the long-range dependencies among features for road detection, two attention modules are designed that adaptively combine local features with their global dependencies. The context features extracted by the dilated convolution module are then fused into the attention modules to further improve segmentation performance. The proposed model is evaluated on three challenging remote sensing road segmentation datasets and one building segmentation dataset, including a dataset with our own manual labels. Comparisons demonstrate the effectiveness of the proposed method. We conclude that CM-FCN has the potential to segment roads and buildings automatically from high-resolution remote sensing images with an accuracy that makes it a useful tool in practical application scenarios.
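The two ideas named in the abstract, dilated convolution for multiscale context and attention for long-range dependencies, can be illustrated with a small toy sketch. This is not the CM-FCN implementation: the function names (`dilated_conv1d`, `receptive_field`, `self_attention`) are hypothetical, the example works on 1-D lists rather than 2-D feature maps, it uses fixed rather than learned weights, and the plain residual sum in the attention step is an assumption made for brevity.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation form).

    Kernel taps are spaced `dilation` samples apart, so a small kernel
    covers a wide window without any pooling or downsampling.
    """
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def self_attention(features):
    """Dot-product self-attention over a list of feature vectors.

    Each position is updated with a softmax-weighted sum of all positions,
    so every output can depend on every input regardless of distance.
    The plain residual sum is an assumption of this sketch.
    """
    out = []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) for key in features]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        context = [
            sum(w * value[c] for w, value in zip(weights, features))
            for c in range(len(query))
        ]
        out.append([x + y for x, y in zip(query, context)])
    return out


if __name__ == "__main__":
    # Three stacked 3-tap layers with dilations 1, 2 and 4.
    print(receptive_field(3, [1, 2, 4]))
    print(dilated_conv1d(list(range(8)), [1, 1, 1], 2))
    print(self_attention([[1.0, 0.0], [0.0, 1.0]]))
```

Stacking layers with growing dilation widens the receptive field while keeping the feature resolution, which is the property the abstract contrasts with repeated pooling. The attention step relates all positions to one another in a single operation.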
References
Huang X, Zhang L. Road centreline extraction from high-resolution imagery based on multiscale structural features and support vector machines. Int J Remote Sens. 2009;30(8):1977–87.
Mnih V, Hinton GE. Learning to detect roads in high-resolution aerial images. In: ECCV’10 Proceedings of the 11th European Conference on Computer Vision: Part VI. 2010. p. 210–223.
Unsalan C, Sirmacek B. Road network detection using probabilistic and graph theoretical methods. IEEE Trans Geosci Remote Sens. 2012;50(11):4441–53.
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Commun ACM. 2017;60(6):84–90.
Mahmud M, Kaiser MS, McGinnity TM, Hussain A. Deep learning in mining biological data. Cogn Comput. 2020;1–33.
Zhu Y, Liang Z, Yan J, Chen G, Wang X. E-d-net: Automatic building extraction from high-resolution aerial images with boundary information. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2021;PP(99):1–13.
Paisitkriangkrai S, Sherrah J, Janney P, Hengel AV. Effective semantic pixel labelling with convolutional networks and conditional random fields. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2015. p. 36–43.
Saito S, Yamashita T, Aoki Y. Multiple object extraction from aerial imagery with convolutional neural networks. J Imaging Sci Technol. 2016;60(1):10402.
Zhu Y, Yan J, Wang C, Zhou Y. Road detection of remote sensing image based on convolutional neural network. In: International Conference on Image and Graphics. 2019. p. 106–118.
Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. In: ICLR 2016: International Conference on Learning Representations 2016. 2016.
Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018. p. 833–851.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017. p. 5998–6008.
Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H. Dual attention network for scene segmentation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019. p. 3146–3154.
Chandra N, Ghosh JK, Sharma A. A cognitive framework for road detection from high-resolution satellite images. Geocarto Int. 2019;34(8):909–924.
Zhang Z, Liu Q, Wang Y. Road extraction by deep residual u-net. IEEE Geosci Remote Sens Lett. 2018;15(5):749–53.
Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. 2015. p. 234–241.
Zhong Y, Zhao J, Zhang L. A hybrid object-oriented conditional random field classification framework for high spatial resolution remote sensing imagery. IEEE Trans Geosci Remote Sens. 2014;52(11):7023–37.
Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell. 2018;40(4):834–48.
Zhao H, Shi J, Qi X, Wang X, Jia J. Pyramid scene parsing network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. p. 6230–6239.
Cheng G, Han J. A survey on object detection in optical remote sensing images. ISPRS J Photogramm Remote Sens. 2016;117:11–28.
Zhou J, Bischof WF, Caelli T. Road tracking in aerial images based on human computer interaction and Bayesian filtering. ISPRS J Photogramm Remote Sens. 2006;61(2):108–24.
Huertas A, Nevatia R. Detecting buildings in aerial images. Computer Vision, Graphics, and Image Processing. 1988;41(2):131–52.
Cheng G, Han J, Guo L, Qian X, Zhou P, Yao X, Hu X. Object detection in remote sensing imagery using a discriminatively trained mixture model. ISPRS J Photogramm Remote Sens. 2013;85:32–43.
Zhao L-J, Tang P, Huo L-Z. Land-use scene classification using a concentric circle-structured multiscale bag-of-visual-words model. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2014;7(12):4620–31.
Yokoya N, Iwasaki A. Object detection based on sparse representation and hough voting for optical remote sensing imagery. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2015;8(5):2053–62.
Camps-Valls G, Bruzzone L. Kernel-based methods for hyperspectral image classification. IEEE Trans Geosci Remote Sens. 2005;43(6):1351–62.
Bishop CM. Neural Networks For Pattern Recognition. 1995.
Paola JD, Schowengerdt RA. A detailed comparison of backpropagation neural network and maximum-likelihood classifiers for urban land use classification. IEEE Trans Geosci Remote Sens. 1995;33(4):981–96.
Romero A, Gatta C, Camps-Valls G. Unsupervised deep feature extraction for remote sensing image classification. IEEE Trans Geosci Remote Sens. 2016;54(3):1349–62.
Wei Y, Wang Z, Mai X. Road structure refined cnn for road extraction in aerial image. IEEE Geosci Remote Sens Lett. 2017;14(5):709–13.
Maggiori E, Tarabalka Y, Charpiat G, Alliez P. Fully convolutional neural networks for remote sensing image classification. In: 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). 2016. p. 5071–5074.
Maggiori E, Tarabalka Y, Charpiat G, Alliez P. Can semantic labeling methods generalize to any city? The INRIA aerial image labeling benchmark. In: 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). 2017. p. 3226–3229.
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015. p. 3431–3440.
Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G. Understanding convolution for semantic segmentation. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). 2018. p. 1451–1460.
Chandra S, Kokkinos I. Fast, exact and multi-scale inference for semantic image segmentation with deep Gaussian CRFs. In: Leibe B, Matas J, Sebe N, Welling M, editors. Computer Vision – ECCV 2016. Springer: Cham, Switzerland; 2016. p. 402–418.
Arnab A, Jayasumana S, Zheng S, Torr PHS. Higher order potentials in end-to-end trainable conditional random fields. arXiv: Computer Vision and Pattern Recognition. 2015.
Wang X, Girshick R, Gupta A, He K. Non-local neural networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018. p. 7794–7803.
Lin G, Shen C, van den Hengel A, Reid I. Efficient piecewise training of deep structured models for semantic segmentation. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 3194–3203.
Lin Z, Feng M, dos Santos CN, Yu M, Xiang B, Zhou B, Bengio Y. A structured self-attentive sentence embedding. In: ICLR 2017: International Conference on Learning Representations 2017. 2017.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: ICLR 2015: International Conference on Learning Representations 2015. 2015.
Peng C, Zhang X, Yu G, Luo G, Sun J. Large kernel matters – improve semantic segmentation by global convolutional network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. p. 1743–1751.
Cheng G, Wang Y, Xu S, Wang H, Xiang S, Pan C. Automatic road detection and centerline extraction via cascaded end-to-end convolutional neural network. IEEE Trans Geosci Remote Sens. 2017;55(6):3322–37.
Liu G, Sun X, Fu K, Wang H. Interactive geospatial object extraction in high resolution remote sensing images using shape-based global minimization active contour model. Pattern Recogn Lett. 2013;34(10):1186–95.
Huang B, Lu K, Audebert N, Khalel A, Tarabalka Y, Malof J, Boulch A, Le Saux B, Collins L, Bradbury K, Lefevre S, El-Saban M. Large-scale semantic classification: Outcome of the first year of INRIA aerial image labeling benchmark. In: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium. 2018. p. 6947–6950.
Funding
This work was supported by the National Natural Science Foundation of China [No. 41976174], [No. 62071499] and the Natural Science Foundation of Guangdong Province, China [No. 2020A1515010869]. The authors thank INRIA, Mnih, and Cheng for kindly providing the aerial image labelling datasets. The authors also thank their colleagues at the Guangdong Provincial Key Laboratory of Image Processing, who spent substantial effort manually labelling the INRIA road dataset.
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflicts of interest
The authors declare that they have no conflict of interest.
Cite this article
Zhu, Y., Long, L., Wang, J. et al. Road Segmentation from High-Fidelity Remote Sensing Images using a Context Information Capture Network. Cogn Comput 14, 780–793 (2022). https://doi.org/10.1007/s12559-021-09980-0
DOI: https://doi.org/10.1007/s12559-021-09980-0