A road segmentation method based on the deep auto-encoder with supervised learning

https://doi.org/10.1016/j.compeleceng.2018.04.003

Abstract

Road environment perception is a key technique for unmanned vehicles, and segmentation of road images is an important method of determining the drivable area. Existing methods offer limited segmentation precision, and some do not run in real time. To address these problems, we design a supervised deep auto-encoder (AE) model for the semantic segmentation of road environment images. By adding a supervised layer to a classical AE and using the segmentation images of the training samples as the supervised information, the model learns features that are useful for semantic segmentation. Next, a multilayer stacking method for the supervised AE is designed to build a supervised deep AE, since a deep network learns richer and more diversified features. Finally, we verified the method on CamVid. Compared with Convolutional Neural Networks (CNN) and Fully Convolutional Networks (FCN), the proposed method improves road segmentation performance in both precision and speed.

Introduction

The road environment perception of unmanned vehicles has always been a popular research area, and methods based on machine vision are among the most important. Such methods use a vehicle-mounted camera to acquire images of the road ahead of the driving vehicle, and apply image processing and pattern recognition to segment the image and determine the available driving area.

Traditional image segmentation methods extract low-level visual cues to obtain segmentation results. These methods are not complex, but the results are often unsatisfactory on difficult tasks such as the segmentation of road environment images. With the continued rise of deep learning, Deep Convolutional Neural Networks (DCNN) have proven to have a great advantage in image feature extraction, but how to use them to segment images and improve segmentation performance still requires thorough study. Many researchers [1], [2], [3], [4] have used DCNN to segment images, but DCNN-based methods must generate proposals, which is time-consuming. Long et al. [5] introduced image semantic segmentation based on the Fully Convolutional Networks (FCN) model in 2015. Shortly thereafter, many FCN-based semantic segmentation methods were proposed, improving segmentation performance. However, the FCN model is very complex, and fine-tuning its millions of parameters is a lengthy task, requiring several weeks of training on a high-performance GPU. The long training time and the demand for high-performance hardware both limit its application to road detection. To solve these problems, we design a supervised deep auto-encoder (AE) model for the semantic segmentation of road environment images.

As a classical model of deep learning, auto-encoders (AEs) learn important features from samples through unsupervised self-learning and reconstruct the input data from this concise representation. This paper presents the design of a supervised deep AE, which is successfully applied to the semantic segmentation of the road environment.

This paper makes several contributions. First, a new semantic segmentation method is proposed, in which a supervised layer is added to a classical AE so that it learns features useful for image segmentation. Second, because of this supervised layer, the traditional stacking method is unsuitable; we therefore design a multilayer stacking method for the supervised AE and build a supervised deep AE with better feature-extraction performance than a single-layer supervised AE. Finally, the road segmentation performance of the proposed method is shown to be effective and simple on CamVid; compared with other methods, both the segmentation precision of the road region and the runtime are better.

The rest of the paper is organized as follows. Section 2 reviews related work on recent approaches to semantic segmentation. Section 3 introduces our road segmentation method. Section 4 elaborates the supervised auto-encoder model. Experiments are discussed and evaluated in Section 5, and Section 6 concludes the paper.

Section snippets

Related work

There are several methods for road environment perception [6], [7]; those using deep learning [1], [8], [9], [10], [11], [12] have attracted significant interest. Alvarez et al. [8] used a CNN to learn features of the road environment: image patches were classified by the CNN, and the probabilities of belonging to the sky, the surroundings, and the road were used to complete the road segmentation. Brust et al. [9] built Convolutional Patch Networks incorporating spatial information to realize pixel-wise road

The proposed method

A supervised layer is added to the traditional AE, and the segmentation image of the road environment is used as the supervised information. Because of the supervised layer, the AE learns features useful for image segmentation and completes the semantic segmentation of the road environment. Research shows that deep networks learn more abstract and diversified features; as a result, a deep network is built to extract deep features of the road environment and complete the semantic segmentation.
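
The sketch below illustrates this idea, assuming a PyTorch implementation: each auto-encoder layer carries an extra supervised head trained against the segmentation target, and layers are stacked by feeding each layer's hidden code to the next. The class names, layer sizes, loss weight alpha, and training schedule are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch of a supervised auto-encoder layer and greedy stacking.
# Layer sizes, names, and the use of PyTorch are illustrative assumptions,
# not taken from the paper.
import torch
import torch.nn as nn


class SupervisedAELayer(nn.Module):
    """One auto-encoder layer with an extra supervised (segmentation) head."""

    def __init__(self, in_dim, hidden_dim, seg_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, in_dim), nn.Sigmoid())
        # Supervised layer: maps hidden features to the segmentation target.
        self.seg_head = nn.Sequential(nn.Linear(hidden_dim, seg_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)                    # learned features
        return h, self.decoder(h), self.seg_head(h)


def train_stacked(layers, x, seg_target, epochs=50, lr=1e-3, alpha=0.5):
    """Greedy layer-wise training: each layer is trained on the hidden code of
    the previous one, using a reconstruction term plus a supervised term."""
    mse = nn.MSELoss()
    inp = x
    for layer in layers:
        opt = torch.optim.Adam(layer.parameters(), lr=lr)
        for _ in range(epochs):
            _, recon, seg = layer(inp)
            loss = mse(recon, inp) + alpha * mse(seg, seg_target)
            opt.zero_grad()
            loss.backward()
            opt.step()
        inp = layer.encoder(inp).detach()      # feed features to the next layer
    return layers

In this sketch the balance between reconstruction and supervision (alpha) is a hypothetical hyperparameter; the paper's exact stacked objective is not reproduced here.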

Supervised single-layer auto-encoder

In 1986, Rumelhart [21] proposed the concept of the auto-encoder and applied it to high-dimensional and complex data processing. The goal of the single-layer AE is to minimize the average reconstruction error between the input data X and the reconstructed data Z. The objective function of the AE is

J(W,b) = \frac{1}{m}\sum_{i=1}^{m}\left(\frac{1}{2}\left\|Z_{i}-X_{i}\right\|^{2}\right)

The classical AE can be used to reconstruct the input data by minimizing the average reconstruction error between the input data X and the reconstructed data Z.
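
As a worked example of this objective, the following NumPy snippet computes J(W, b) for a single-layer AE; the random data and layer sizes are illustrative only.

# Worked example of the reconstruction objective J(W, b) for a single-layer AE,
# written with NumPy; the data and dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
m, n, hidden = 4, 6, 3                        # samples, input dim, hidden dim
X = rng.random((m, n))

W1, b1 = rng.standard_normal((n, hidden)), np.zeros(hidden)   # encoder
W2, b2 = rng.standard_normal((hidden, n)), np.zeros(n)        # decoder
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

H = sigmoid(X @ W1 + b1)                      # hidden code
Z = sigmoid(H @ W2 + b2)                      # reconstruction

# J(W, b) = (1/m) * sum_i (1/2) * ||Z_i - X_i||^2
J = np.mean(0.5 * np.sum((Z - X) ** 2, axis=1))
print(J)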

In order

Experiments and analysis

The effectiveness of the proposed method was verified on the Cambridge-driving Labeled Video Database (CamVid), which provides 701 images with pixel-wise semantic labels. We extracted 600 samples and mirrored them horizontally to obtain 1200 training samples. Next, 100 of the remaining images were extracted and mirrored horizontally to obtain 200 testing samples. To verify the effectiveness of the algorithm, each image was downsampled from 540 × 720 to 60 × 80. The three models were trained
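
The data preparation described above can be sketched as follows; the use of OpenCV and the directory layout are assumptions for illustration only.

# Sketch of the data preparation: horizontal mirroring doubles the sample
# count, and each image is downsampled from 540 x 720 to 60 x 80.
# The use of OpenCV and the directory layout are illustrative assumptions.
import glob
import cv2

def load_and_augment(image_dir, size=(80, 60)):   # cv2.resize takes (width, height)
    samples = []
    for path in sorted(glob.glob(f"{image_dir}/*.png")):
        img = cv2.imread(path)
        img = cv2.resize(img, size, interpolation=cv2.INTER_AREA)
        samples.append(img)
        samples.append(cv2.flip(img, 1))           # horizontal mirror
    return samples

# e.g. 600 CamVid frames -> 1200 training samples after mirroring
train = load_and_augment("camvid/train")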

Conclusion

With the development of unmanned vehicle technology, road environment perception has become an active research field. Methods based on machine vision, which use image processing and pattern recognition to segment the road environment semantically, are a particular focus of this research. This paper discussed semantic segmentation methods for the road environment and the problems of existing methods such as CNN and FCN. For example, the model of the FCN

Acknowledgments

This work is supported by the National Key Research and Development Program of China (No. 2016YFC0802904) and the National Natural Science Foundation of China (Nos. 61472444 and 61671470). This work is also supported by the Henan Province Science and Technology Research Project of China (No. 172102210370) and the Key Scientific Research Projects in Henan Colleges and Universities of China (No. 17B520019).

References (21)

  • J. Liu et al. Detection-guided deconvolutional network for hierarchical feature learning. Pattern Recognit (2015)
  • H. Lu et al. Brain intelligence: go beyond artificial intelligence. Mobile Netw Appl (2017)
  • M. Teichmann, M. Weber, M. Zoellner, R. Cipolla, R. Urtasun. MultiNet: real-time joint semantic reasoning for autonomous...
  • R. Mohan. Deep deconvolutional networks for scene parsing. Comput Sci (2014)
  • R. Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation
  • J. Long, E. Shelhamer, T. Darrell. Fully convolutional networks for semantic segmentation. CVPR...
  • Y. Nakayama et al. Environment recognition for navigation of autonomous wheelchair from a video image
  • J.M. Alvarez et al. Road scene segmentation from a single image
  • C.A. Brust et al. Convolutional patch networks with spatial prior for road detection and urban scene understanding
  • G.L. Oliveira et al. Efficient deep models for monocular road segmentation

Xiaona Song is a Ph.D. student at Army Engineering University of PLA. Her research interests include machine learning, object recognition, image content analysis and classification.

Ting Rui received his M.S. degree and Ph.D. from PLA University of Science and Technology, Nanjing, China, in 1998 and 2001, respectively. Ting Rui is a Professor of the Information Technology Department in Army Engineering University of PLA. His research interests include computer vision, machine learning, multimedia, and video surveillance. He has authored and co-authored more than 80 scientific articles.

Sai Zhang is pursuing his M.S. degree at Army Engineering University of PLA. His research interests include computer vision and machine learning.

Jianchao Fei is pursuing his Ph.D. at Army Engineering University of PLA. His research interests include computer vision and machine learning.

Xinqing Wang is a Professor of Army Engineering University of PLA. His research interests include unmanned vehicles, deep learning and machine learning.

Reviews processed and recommended for publication to the Editor-in-Chief by Associate Editor Dr. Huimin Lu.
