Elsevier

Neurocomputing

Volume 356, 3 September 2019, Pages 131-141

Recognizing road from satellite images by structured neural network

https://doi.org/10.1016/j.neucom.2019.05.007

Highlights

  • We have collected a large labeled road extraction dataset with varied regional distribution, which includes labels for both roads and their orientations.

  • We further designed a structured deep neural network for road extraction. The specially designed cascade network and direction module are effective at capturing the linear structure of roads. The experimental results demonstrate the effectiveness of our structured road extraction network.

  • We designed a novel road performance evaluation metric that considers both the pixel-level and the number-level aspect. To the best of our knowledge, this is the first time a number-level metric has been introduced for the road detection task.

Abstract

Recognizing and extracting roads accurately is important for self-driving cars and map providers. Thanks to the power of deep learning, high accuracy is achievable given a large amount of labeled data. However, as far as we know, there is not enough public data for road recognition from satellite images, especially for urban scenes. To provide sufficient data for training a neural network, we collect a large dataset for the road recognition task, which covers a variety of road scenes and contains large-size images from the satellite view. Inspired by the unique structure of roads, we propose a structured deep neural network to obtain smooth and continuous road skeletons. The proposed network fuses the road segmentation result and the direction result. Based on the shape prior of roads, the predicted direction information facilitates road extraction in an end-to-end learning network. A cascade skeleton network is then proposed to produce smooth, continuous and equal-width road skeletons. We also design an evaluation metric that measures both per-pixel accuracy and per-road accuracy. Our structured road extraction network outperforms state-of-the-art approaches and a baseline without the road prior.

Introduction

Understanding land information, such as roads, buildings and other important objects, is essential for up-to-date navigation providers, government statistical departments and autonomous driving services. Traditional methods for collecting and updating road information are labor intensive: they either rely on humans to actively survey each object or track navigation signals reported from numerous users. With the fast development of satellite imagery, it has become possible to take snapshots of a large target area within a few hours and update land information in a timely and accurate manner. Every day we can easily obtain millions of high-quality satellite images, but efficiently extracting useful, complete and correct information from them remains an important problem.

As Fig. 1 shows, extracting roads from high-resolution urban images is much more difficult than from rural images. The main difficulties lie in the following aspects: (1) Some objects, such as buildings and parking lots, have spectral and structural appearance similar to roads; it is hard to distinguish them with spectral or structural features alone. (2) Roads are often heavily occluded by tall buildings and trees, which makes it hard to extract smooth and complete road areas. Previous algorithms predict roads with tailor-made features and algorithms [1], [2], [3], [4], [5], which cannot provide acceptable results in real circumstances. Recently, the rapid development of deep neural networks [6], [7] has shown great potential for real production to save human labor and improve efficiency.

However, we face two challenges when applying deep neural networks to our task. The first is the lack of a large labeled dataset: a large amount of data is needed to optimize the millions of parameters in a neural network. Second, although general models such as semantic segmentation networks can predict the category of each pixel in an image and provide a good baseline for many practical problems, they overlook the intrinsic prior of particular objects. As shown in Fig. 1(c), some roads predicted by baseline methods are discontinuous, which does not meet practical standards, while our structured neural network achieves smoother and more complete road structures.

To have sufficient satellite images for our task, we collect a large road dataset with roughly 15,000 labeled satellite images at a resolution of 2048×2048 pixels. These images spread over six cities in Asia. In addition to common city roads, many field trails in the rural-urban fringe are also included in our dataset. Due to different city planning histories, the roads in our dataset are more complex than those in the U.S. and Europe. This increases the variety of roads and makes detection more difficult.
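The orientation labels in our dataset can be thought of as discrete direction classes per road segment. As a minimal, hypothetical sketch (the 8-class granularity and the function name are illustrative assumptions, not the dataset's actual labeling scheme), a segment's angle could be quantized like this:

```python
def direction_class(theta_deg: float, num_classes: int = 8) -> int:
    """Quantize a road segment's orientation (in degrees) into one of
    num_classes bins over [0, 180). Road direction is undirected, so
    theta and theta + 180 land in the same bin."""
    theta = theta_deg % 180.0
    bin_width = 180.0 / num_classes
    return int(theta // bin_width) % num_classes

# A horizontal road and its reversed heading share a class;
# a vertical road falls in the middle bin.
assert direction_class(0.0) == direction_class(180.0)
assert direction_class(90.0) == 4
```

Folding theta and theta + 180 together reflects that a road's orientation, unlike a vehicle's heading, has no preferred sense of travel.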

Moreover, we propose a structured deep learning framework for road extraction. We start from a fully convolutional network (FCN) [8] for road area segmentation. Then, we introduce a road direction module, which predicts direction information as an intermediate product, to reinforce the road area segmentation result along the road direction. The road segmentation information and direction information are fused to achieve much more complete road structures. Finally, a skeleton module is introduced to achieve smooth, complete and equal-width road areas, which can be easily vectorized for product usage. The road segmentation network, direction network, and road skeleton network are optimized together.
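To make "reinforcing segmentation along the road direction" concrete, here is a toy NumPy sketch. It is not the learned fusion network described above; it simply blends each pixel's road probability with confident neighbours along a quantized per-pixel direction, so small gaps along a road get pulled up by the pixels on either side. The function name, the 8-direction set and the blending rule are all illustrative assumptions.

```python
import numpy as np

# 8 quantized directions as (dy, dx) unit steps.
DIRS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def reinforce_along_direction(prob, dir_cls, steps=2, alpha=0.5):
    """Blend each pixel's road probability with the strongest neighbouring
    probability along the predicted road direction, so thin gaps caused by
    occlusion are pulled up by confident pixels on either side."""
    h, w = prob.shape
    out = prob.copy()
    for y in range(h):
        for x in range(w):
            dy, dx = DIRS[dir_cls[y, x] % 8]
            vals = []
            for s in range(1, steps + 1):
                for sign in (1, -1):
                    ny, nx = y + sign * s * dy, x + sign * s * dx
                    if 0 <= ny < h and 0 <= nx < w:
                        vals.append(prob[ny, nx])
            if vals:
                out[y, x] = (1 - alpha) * prob[y, x] + alpha * max(vals)
    return out

# A one-pixel gap in a horizontal road is boosted by its neighbours.
prob = np.array([[0.9, 0.9, 0.1, 0.9, 0.9]])
dir_cls = np.zeros((1, 5), dtype=int)  # class 0 = horizontal
out = reinforce_along_direction(prob, dir_cls)
assert out[0, 2] > prob[0, 2]
```

In the actual framework this coupling is learned end-to-end rather than applied as a fixed post-processing rule.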

We have designed two evaluation metrics for the road centerline representation, which is the standard representation used in industry. One is the F1 score based on pixel-level accuracy and the other is the F1 score based on the number of correctly detected roads. Our experiments demonstrate the effectiveness of our framework, especially the road direction and road skeleton modules.
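The two metrics can be sketched as follows, assuming binary prediction and ground-truth masks. The pixel-level F1 is standard; for the number-level F1 we use a deliberately simple matching rule (a connected component counts as matched if it overlaps the other mask at all), which stands in for whatever matching rule is actually used in the paper's evaluation.

```python
import numpy as np

def pixel_f1(pred, gt):
    """Pixel-level F1 over binary road masks."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def components(mask):
    """4-connected components of a binary mask (simple BFS labelling)."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    comps = []
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                stack, comp = [(y, x)], []
                seen[y, x] = True
                while stack:
                    cy, cx = stack.pop()
                    comp.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                comps.append(comp)
    return comps

def number_f1(pred, gt):
    """Number-level F1: a road instance (connected component) counts as a
    true positive if it overlaps the other mask (toy matching rule)."""
    pred_comps, gt_comps = components(pred), components(gt)
    tp_pred = sum(1 for c in pred_comps if any(gt[y, x] for y, x in c))
    tp_gt = sum(1 for c in gt_comps if any(pred[y, x] for y, x in c))
    precision = tp_pred / len(pred_comps) if pred_comps else 0.0
    recall = tp_gt / len(gt_comps) if gt_comps else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

A perfect prediction scores 1.0 on both metrics, while a prediction that fragments one long road into pieces can keep a high pixel-level F1 yet lose number-level precision, which is exactly the failure mode the number-level metric is meant to expose.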

To summarize, our contributions are listed as follows:

  • We have designed a structured deep neural network for road extraction. The specially designed cascade network and direction module are effective at capturing the linear structure of roads. The experimental results demonstrate the effectiveness of our structured road extraction network.

  • We designed a novel road performance evaluation metric that considers both the pixel-level and the number-level aspect. To the best of our knowledge, this is the first time a number-level metric has been introduced for the road detection task.

The remainder of this paper is organized as follows: Section 2 presents the related work. In Section 3, we introduce the complex road dataset used in this paper. The proposed structured road extraction network is described in Section 4. Section 5 validates our approach experimentally, followed by the conclusions and discussions in Section 6.

Section snippets

Related work

Recently, the revolution of deep neural networks has pushed nearly all directions of classical computer vision, including classification, detection, semantic segmentation, etc. [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], to a new stage. With well-established state-of-the-art approaches for these standard and classical problems, the interest of the community is moving to specialized problems where special domain knowledge or object prior is overlooked in the classical

Satellite road dataset

Existing aerial datasets have been designed for different tasks. The Aerial Image Segmentation Dataset [34] and VEDAI [35] consist of satellite images but with labels for other objects, such as cars, cities, warehouses, plants, etc. The KITTI road dataset [36] and the DIPLODOC road stereo sequence dataset [37] are used to train models to predict roads and landmarks from the car viewpoint. In the TorontoCity dataset [38], road surfaces are generated by fusing different sources and are not labeled directly. As

Structured road extraction network

In this section, we introduce our structured road extraction network as shown in Fig. 5. It consists of three components: a feature extraction network, an information fusion network, and a final road skeleton network. The feature extraction module in Fig. 5 (1) adopts an encoder-decoder design to obtain high-level feature maps of the same size as the input image. With the region segmentation module alone, some separated and discontinuous roads are generated. To overcome
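As a toy illustration of the equal-width skeleton goal (not the paper's skeleton network, which is learned end-to-end), the following reduces a near-horizontal road mask to a one-pixel-wide centerline by keeping the middle road pixel in each column. Real skeletonization would use morphological thinning; the function name and the axis-aligned assumption are ours.

```python
import numpy as np

def toy_skeleton(mask):
    """One-pixel-wide skeleton for a near-horizontal road mask: for each
    column, keep only the middle road pixel. Illustrates the equal-width
    output a skeleton module aims for, on a trivial geometry."""
    h, w = mask.shape
    skel = np.zeros_like(mask)
    for x in range(w):
        rows = np.flatnonzero(mask[:, x])
        if rows.size:
            skel[rows[rows.size // 2], x] = True
    return skel

# A 3-pixel-thick horizontal band collapses to its middle row.
band = np.ones((3, 5), dtype=bool)
assert toy_skeleton(band)[1].all()
```

The centerline produced this way is trivially equal-width (one pixel everywhere), which is the property that makes vectorization for map products straightforward.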

Experiments

In this section, we explain the implementation details and the evaluation metric, and present visual and quantitative comparisons between our structured road extraction network and state-of-the-art approaches.

Conclusions and discussions

In this paper, we have proposed a novel structured road extraction network for road recognition in satellite images, which consists of a road feature extraction module, a road information fusion module and a road skeleton extraction module. The three modules work together and take advantage of the intrinsic road prior. Our model is able to detect continuous roads and outperforms the baseline significantly. To the best of our knowledge, it is the first work that incorporates the road prior information in the

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2016YFB0502503), and National Science and Technology Major Project of China (Grant No. 30-Y20A04-9001-17/18), and Hainan Key Research and Development Program (Grant No. ZDYF2018001).

Guangliang Cheng is currently a Postdoc researcher in the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, China. Before that he received his Ph.D. degree with the Institute of Automation, Chinese Academy of Sciences, Beijing. His research interests include autonomous driving, scene understanding and remote sensing image processing.

References (51)

  • H. Mayer et al., A test of automatic road extraction approaches (2006)
  • M. Zanin et al., Diplodoc road stereo sequence, Technical Report (2013)
  • L.-C. Chen et al., Rethinking atrous convolution for semantic image segmentation, arXiv preprint (2017)
  • Y. Lin et al., Road detection and tracking from aerial desert imagery, J. Intell. Robot. Syst. (2012)
  • M. Li et al., Region-based urban road extraction from VHR satellite images using binary partition tree, Int. J. Appl. Earth Obs. Geoinf. (2016)
  • H. Zhou et al., Efficient road detection and tracking for unmanned aerial vehicle, IEEE Trans. Intell. Transp. Syst. (2015)
  • J.A. Montoya-Zegarra et al., Mind the gap: modeling local and global context in (road) networks, Proceedings of the German Conference on Pattern Recognition, pp. 212–223 (2014)
  • G. Mattyus et al., Enhancing road maps by parsing aerial images around the world, Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
  • K. He et al., Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
  • C. Lu et al., Surpassing human-level face verification performance on LFW with GaussianFace, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)
  • J. Long et al., Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
  • K. Simonyan et al., Very deep convolutional networks for large-scale image recognition, Proceedings of the International Conference on Learning Representations (ICLR) (2015)
  • C. Szegedy et al., Going deeper with convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
  • K. He et al., Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
  • D. Lin et al., ScribbleSup: scribble-supervised convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
  • J. Dai et al., Instance-sensitive fully convolutional networks, Proceedings of the European Conference on Computer Vision (ECCV) (2016)
  • R. Girshick, Fast R-CNN, Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
  • S. Ren et al., Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2016)
  • W. Liu et al., SSD: single shot multibox detector, Proceedings of the European Conference on Computer Vision (ECCV) (2016)
  • J. Redmon et al., You only look once: unified, real-time object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
  • T. Ishii et al., Detection by classification of buildings in multispectral satellite imagery, Proceedings of the International Conference on Pattern Recognition (ICPR) (2016)
  • E. Simo-Serra et al., Learning to simplify: fully convolutional networks for rough sketch cleanup, ACM Trans. Graph. (2016)
  • Z. Huang et al., Building extraction from multi-source remote sensing images via deep deconvolution neural networks, Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS) (2016)
  • W. Liu et al., A survey of deep neural network architectures and their applications, Neurocomputing (2017)
  • Q. Yuan et al., Hyperspectral image denoising employing a spatial-spectral deep residual convolutional neural network, IEEE Trans. Geosci. Remote Sens. (2019)

    Chongruo Wu is currently a Ph.D. student at the University of California, Davis. He received a master’s degree from the University of Michigan, and a bachelor’s degree from Zhejiang University. His research focus is on deep learning and its applications.

    Qingqing Huang is an assistant professor at Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, where she received her Ph.D. degree in cartography and geographical information system in 2013. She received her BS and MS degrees in control theory and control engineering from Beijing Institute of Technology in 2004 and 2006, respectively. Her current research interests include remote sensing image fusion, land monitoring and change detection.

    Yu Meng received her BS degree in information processing from Wuhan University in 2004 and her Ph.D. in signal and information processing in 2009. She is a professor at the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences. She is the author of more than 10 journal papers and has written one book chapter. Her current research interests include remote sensing image processing, change detection.

    Jianping Shi is a Research Director at SenseTime. Currently her team works on developing algorithms for autonomous driving, scene understanding, mobile applications, remote sensing, etc. She got her Ph.D. degree in Computer Science and Engineering Department in the Chinese University of Hong Kong in 2015 under the supervision of Prof. Jiaya Jia. Before that, she received the B. Eng degree from Computer Science and Technology Department in Zhejiang University in 2011.

    Jiansheng Chen received the Ph.D. degree from the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, China, in 2012. He is currently an assistant professor with the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, China. His research interests include object detection and remote sensing image processing.

    Dongmei Yan is a professor at Sanya Institute of Remote Sensing. She received her Ph.D. degree in cartography and geographical information system from the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences in 2004. Her current research interests include remote sensing image data mining, mainly including image feature extraction, land-use/cover automatic classification, change detection, and object detection and recognition.

    1 These authors contributed equally to this work.
