A road segmentation method based on the deep auto-encoder with supervised learning☆
Introduction
Road environment perception for unmanned vehicles has long been an active research area, and machine vision is one of its most important approaches. A vehicle-mounted camera acquires images of the road while driving, and image processing and pattern recognition are used to segment each image and determine the drivable area.
Traditional image segmentation methods extract low-level visual cues to produce segmentation results. These methods are simple, but their results are often unsatisfactory on difficult tasks such as segmenting road environment images. With the growing popularity of deep learning, Deep Convolutional Neural Networks (DCNN) have shown a great advantage in image feature extraction, but how to use a CNN to segment images and improve segmentation performance still requires thorough study. Many researchers [1], [2], [3], [4] have used DCNNs to segment images. However, DCNN-based methods must generate proposals, which is time-consuming. In 2015, Long et al. [5] introduced image semantic segmentation based on the Fully Convolutional Networks (FCN) model. Shortly thereafter, many FCN-based semantic segmentation methods were proposed, improving segmentation performance. However, the FCN model is very complex, and fine-tuning its millions of parameters is a lengthy task that requires several weeks of training on a high-performance GPU. These demands on training time and hardware limit its application to road detection. To solve these problems, we design a supervised deep auto-encoder (AE) model to perform semantic segmentation of road environment images.
As a classical deep learning model, the auto-encoder (AE) learns important features from samples through unsupervised self-learning and reconstructs the data from a concise representation. This paper presents the design of a supervised deep AE, which is successfully applied to semantic segmentation of the road environment.
This paper makes several contributions. First, a new semantic segmentation method is proposed that adds a supervised layer to a classical AE so that it learns features useful for image segmentation. Second, because the supervised layer makes the traditional stacking method unsuitable, we design a multilayer stacking method for the supervised AE and build a supervised deep AE with better feature-extraction performance than a single-layer supervised AE. Finally, experiments on CamVid show that the proposed method is simple and effective for road segmentation. Compared with other methods, it achieves better segmentation precision on the road region and better real-time performance.
The rest of the paper is organized as follows. Section 2 reviews related work on recent approaches to semantic segmentation. Section 3 introduces our method of road segmentation. In Section 4, we elaborate on the model of the supervised auto-encoder. Experiments are discussed and evaluated in Section 5. Section 6 concludes the paper with a summary.
Related work
There are several methods for road environment perception [6], [7]. Those using deep learning [1], [8], [9], [10], [11], [12] have attracted significant interest. Alvarez et al. [8] used a CNN to learn features of the road environment. Each image patch was classified by the CNN, and its probability of belonging to the sky, surroundings, or road was used to complete the road segmentation. Brust et al. [9] built the Convolutional Patch Networks incorporating spatial information to realize the pixel-wise road
The proposed method
A supervised layer is added to the traditional AE, and the segmentation image of the road environment is used as the supervised information. Because of the supervised layer, the AE learns features useful for image segmentation and completes the semantic segmentation of the road environment. Research shows that deep networks learn more abstract and diverse features. A deep network is therefore built to extract deep features of the road environment and complete the semantic segmentation.
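The idea of training an AE against a segmentation image instead of the input itself can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the class name `SupervisedAESketch`, the sigmoid activations, the squared-error loss, and plain gradient descent are all choices made here for illustration, since the snippet does not give the exact formulation.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class SupervisedAESketch:
    """Hypothetical single-hidden-layer network whose output is compared
    with a label map T rather than with the input X (assumed design)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.01, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.01, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        H = sigmoid(X @ self.W1 + self.b1)   # hidden features
        Z = sigmoid(H @ self.W2 + self.b2)   # predicted label map
        return H, Z

    def loss(self, X, T):
        # Mean over samples of the squared error against the label map.
        _, Z = self.forward(X)
        return float(np.mean(np.sum((Z - T) ** 2, axis=1)))

    def step(self, X, T, lr=0.1):
        # One gradient-descent step on the loss above (backpropagation).
        n = X.shape[0]
        H, Z = self.forward(X)
        dZ = 2.0 * (Z - T) * Z * (1.0 - Z) / n
        dH = (dZ @ self.W2.T) * H * (1.0 - H)
        self.W2 -= lr * (H.T @ dZ)
        self.b2 -= lr * dZ.sum(axis=0)
        self.W1 -= lr * (X.T @ dH)
        self.b1 -= lr * dH.sum(axis=0)
```

Here each row of `X` would be a flattened road image and each row of `T` the corresponding flattened road mask; replacing `T` with `X` recovers an ordinary reconstruction objective.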
Supervised single-layer auto-encoder
In 1986, Rumelhart [21] proposed the concept of auto-encoder and applied it to high-dimensional and complex data processing. The goal of the single-layer AE is to minimize the average reconstruction error between the input data X and the reconstructed data Z. The objective function of AE is shown as
The classical AE can be used to reconstruct the input data through minimizing the average reconstruction error between the input data X and the reconstructed data Z.
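The objective function itself did not survive in this snippet. For orientation only, a common textbook form of the single-layer AE objective is given below; apart from X and Z, every symbol (the weights W and W', the biases b and b', the activation s, the hidden code Y, and the sample count N) is an assumption rather than the authors' notation.

```latex
\[
Y = s(WX + b), \qquad Z = s(W'Y + b'), \qquad
J(W, b, W', b') = \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - z_i \rVert^{2}
\]
```

Here x_i and z_i denote the i-th input sample and its reconstruction, so J is the average reconstruction error that the text describes.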
In order
Experiments and analysis
The effectiveness of the proposed method was verified on the Cambridge-driving Labeled Video Database (CamVid), using 701 images with pixel-wise semantic segmentation labels. We extracted 600 samples and mirrored them horizontally to obtain 1200 training samples. Next, 100 of the remaining images were extracted and mirrored horizontally to obtain 200 testing samples. To verify the effectiveness of the algorithm, the images were downsampled from 540 × 720 to 60 × 80. The three models were trained
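The data preparation described above (horizontal mirroring to double the sample count, then reducing 540 × 720 images to 60 × 80) can be sketched as follows. The function names are hypothetical, and the source does not say which resampling method was used; plain index subsampling is assumed here because it keeps label values intact.

```python
import numpy as np


def mirror_horizontally(img):
    """Flip an image (H x W or H x W x C) left to right."""
    return img[:, ::-1]


def downsample_by_striding(img, out_h, out_w):
    """Pick evenly spaced rows and columns (assumed resampling method)."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return img[rows][:, cols]


def augment(images):
    """Return each image followed by its mirrored copy."""
    out = []
    for img in images:
        out.append(img)
        out.append(mirror_horizontally(img))
    return out
```

With these helpers, 600 extracted images yield 1200 training samples and 100 yield 200 testing samples, matching the counts in the text. The same flip and subsampling must be applied to each image and its label map so that the pair stays aligned.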
Conclusion
With the development of unmanned vehicle technology, road environment perception has become an active research field. Machine vision methods, which use image processing and pattern recognition to segment the road environment semantically, are a particular focus of this research. This paper discussed semantic segmentation methods for the road environment and the many problems in existing methods such as CNN and FCN. For example, the model of the FCN
Acknowledgments
This work is supported by the National Key Research and Development Program of China (No. 2016YFC0802904) and the National Natural Science Foundation of China (No. 61472444, 61671470). This work is also supported by the Henan Province Science and Technology Research Project of China (No. 172102210370) and the Key Scientific Research Projects in Henan Colleges and Universities of China (No. 17B520019).
References (21)
- et al. Detection-guided deconvolutional network for hierarchical feature learning. Pattern Recognit (2015)
- et al. Brain intelligence: go beyond artificial intelligence. Mobile Netw Appl (2017)
- M. Teichmann, M. Weber, M. Zoellner, R. Cipolla, R. Urtasun. MultiNet: real-time joint semantic reasoning for autonomous...
- Deep deconvolutional networks for scene parsing. Comput Sci (2014)
- et al. Rich feature hierarchies for accurate object detection and semantic segmentation
- J. Long, E. Shelhamer, T. Darrell. Fully convolutional networks for semantic segmentation. CVPR...
- et al. Environment recognition for navigation of autonomous wheelchair from a video image
- et al. Road scene segmentation from a single image
- et al. Convolutional patch networks with spatial prior for road detection and urban scene understanding
- et al. Efficient deep models for monocular road segmentation
Xiaona Song is a Ph.D. student at Army Engineering University of PLA. Her research interests include machine learning, object recognition, image content analysis and classification.
Ting Rui received his M.S. degree and Ph.D. from PLA University of Science and Technology, Nanjing, China, in 1998 and 2001, respectively. Ting Rui is a Professor of the Information Technology Department in Army Engineering University of PLA. His research interests include computer vision, machine learning, multimedia, and video surveillance. He has authored and co-authored more than 80 scientific articles.
Sai Zhang is pursuing his M.S. degree at Army Engineering University of PLA. His research interests include computer vision and machine learning.
Jianchao Fei is pursuing his Ph.D. at Army Engineering University of PLA. His research interests include computer vision and machine learning.
Xinqing Wang is a Professor of Army Engineering University of PLA. His research interests include unmanned vehicles, deep learning and machine learning.
☆ Reviews processed and recommended for publication to the Editor-in-Chief by Associate Editor Dr. Huimin Lu.