
1 Introduction

Trajectory planning [20] for autonomous driving is a challenging task in the field of computer vision. Lane area segmentation is a crucial step in trajectory planning, as it classifies different lanes and generates well-defined driving areas.

Texture-based approaches were proposed in early works. Texture features from different color spaces are aggregated to enhance the robustness of lane detection [3, 22, 24]. However, lane areas generally contain large homogeneous regions, so it is difficult to establish distinguishable feature descriptors for them.

Without adequate texture information to rely on, supplementary lane boundary information is critical for precise detection of lane areas. Traditional approaches extract boundary information to tackle the problem of homogeneous regions in lane areas, where high-pass filters are dominantly used [4, 7, 12, 18]. With boundary information extracted, final lane areas are sketched by the lane boundaries [9]. However, boundary information is frequently missing due to occlusions and dashed markers. Worse, surrounding shadows and vehicles often introduce irrelevant information that degrades detection performance. In recent years, fully convolutional networks (FCNs) have been put forward [6, 10, 23, 25], where contextual features are learned with an encoder-decoder structure to enhance lane segmentation. FCNs have achieved much better performance than traditional approaches [3, 22, 24]. Nevertheless, under ill-defined conditions it is hard for an FCN to provide a distinctive, unique representation of lane areas.

In fact, there exists a geometric relationship between a lane area and its boundaries: the lane area always lies between the lane boundaries, while the lane boundaries form the outer contour of the lane area. To exploit this relationship, some prior models adopt a sequential processing strategy [2, 30, 35]. Some models extract lane boundaries first to greatly reduce the search range for lane detection [8]. Given the lane boundaries, segmentation algorithms are applied to the bounded regions to refine lane labels. Conversely, other models segment lane areas first; boundary information is then extracted by high-pass filters around the segmented lane areas within a tolerance range [2, 35]. However, these models treat lane area segmentation and lane boundary detection as two separate sub-processes that share no information with each other, leading to a loss of geometric dependency. Moreover, extremely poor performance can occur when the first sub-process is severely interfered with by outliers.

To address the abovementioned problems, we are motivated to provide a unified solution for lane area segmentation and lane boundary detection with a multi-task learning framework. Since single-task approaches suffer from the lack of priors and the loss of information, rather than simply fusing the outputs of different tasks at the final decision stage, we apply one shared encoder to the neural network for integrating the complementary information of the two tasks. Additionally, a novel structure called the link encoder is appended, which implicitly extracts the interrelationship between a lane area and its boundaries. The information flowing between the two tasks therefore refines the performance of each. At the classifier layer, the result is generated by superposing this refinement on the original output. As shown in Fig. 1, single-task segmentation is severely interfered with by outliers and is unable to recover from erroneous segmentation. In our approach, when lane area segmentation fails on hard examples, the lane boundary detection task, if it performs well, can provide valid features and recover the segmentation task from failure, and vice versa. Furthermore, two geometric prior constraints are proposed in our model to regularize the lane detection problem into a well-posed formulation. Given the detected lane boundaries, we predict the lane area as the area integration result with the lane boundaries as the upper and lower bounds. Given the extracted lane areas, the lane boundaries are predicted as their outer contours. The differences between these prediction results and the ground truth are then formulated as two loss terms that emphasize geometric priors during model training. The geometric constraints are differentiable owing to pixel-wise convolution, so the overall network is capable of joint training as an end-to-end implementation. Experimental results on benchmarks demonstrate that our approach outperforms other state-of-the-art approaches under several metrics.

2 Related Work

Traditional methods of lane segmentation dominantly utilize pixel-level and super-pixel-level features. Among the pixel-level features [2, 3, 15, 33], color features robust to shadow interference are extracted for lane segmentation using region growing [3]. Texture features from varied color spaces are described by histogram peaks and temporal filter responses, and lane areas are then generated within flat regions [15]. Alon et al. compute dominant edges based on a pixel-wise gradient map and form them into lane boundaries [33]. With the guidance of these boundaries, a color-based region growing is followed to generate lane areas. Valente et al. [2] extract pixel-level color features and classify them into lane areas first; boundaries are then introduced to constrain the refinement of the lane areas. To handle outlier situations, super-pixel features are preferably used. Li et al. [22] train an AdaBoost classifier with super-pixel color features extracted by the Orthogonal Matching Pursuit algorithm to enhance lane detection.

With the development of deep learning methods, semantic segmentation has achieved impressive results [5, 23, 27]. Several modified single-task networks have been proposed, focusing on embedding extra knowledge into the network [6, 10]. Gao et al. [10] advance contour priors and location priors to segment the lane region elaborately. Since single-task networks still suffer from the lack of priors and the loss of information, multi-task approaches are proposed to tackle this problem by introducing more surrounding constraints. Oliveira et al. [25] train a joint classification, detection and semantic segmentation network with a shared encoder. With this joint-training manner, the final lane area is generated from better features containing more surrounding details. However, the inherent connections between the multiple outputs are not mined, making it difficult to explain the mechanism behind the network structure.

Plenty of works utilize high-pass filters for boundary detection [4, 12]. Haloi et al. [12] combine responses from 2nd- and 4th-order filters to obtain lane boundary features with adaptive thresholding. Aly et al. filter the inverse perspective mapping (IPM) image with 2D Gaussian kernels [4]. Kortli et al. [18] and Bergasa et al. [7] detect edges with a Canny kernel and the Otsu method.

However, the precision of lane boundary detection suffers from illumination variations, noise and cluttered background. Some deep learning techniques have been developed to improve the performance. The OverFeat detector was proposed to integrate recognition, localization and detection using convolution [29]. Later, Huval et al. [14] modified the OverFeat structure to handle lane boundary detection and vehicle detection at the same time. However, these methods are sensitive to surrounding objects. Therefore, Li et al. [21] feed the features extracted by convolution layers into a recurrent neural layer as a sequence, where a spatial continuity constraint is used to regularize the lane detection result. Kim et al. [16] propose a simpler but effective network structure, fine-tuning the network from a pretrained VGG network to generate detection results. These approaches require extra data for sufficient pretraining, and they are sensitive to cluttered background.

Fig. 1. Reciprocal constraints with geometric relation. Left: With an image input, traditional methods generate a binary segmentation mask for lane areas (green) or lane boundaries (red), which are severely affected by outlier situations. Right: Our approach introduces a geometric constraint into a multi-task network, which is capable of restoring the missing lane area and lane boundaries (blue) mutually. (Color figure online)

3 Methodology

3.1 Overview of Multi-task Framework

In human perception, a lane area is always inseparable from the judgment of its boundaries. However, existing lane detection methods dominantly rely on single-task networks that train lane segmentation and lane boundary detection independently, completely ignoring the inherent geometric constraints between the two tasks. Simple multi-task networks such as MultiNet [32] have been developed to combine tasks such as classification, detection and segmentation, but without investigating the inherent relationship between the tasks. There are two major problems in these state-of-the-art methods: the loss of the interrelationship between the training tasks and the lack of geometric priors for a well-posed formulation. As a result, they often fail on hard examples.

Inspired by this observation, we propose a multi-task learning framework that provides a unified solution for lane segmentation and lane boundary detection. The network architecture is illustrated in Fig. 2; it consists of an encoder network and a decoder network, as a kind of fully convolutional network (FCN). Rather than conducting segmentation and detection with two separate networks, the proposed framework conducts the two tasks with one shared encoder network and two separate decoders. To classify pixels into binary labels, each decoder is followed by a sigmoid classifier. Specifically, each decoder is connected to a link encoder that streams complementary information between the two tasks, so that the features of the two decoders can be reciprocally refined.
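
For concreteness, the following TensorFlow/Keras sketch shows how such a shared-encoder, two-decoder layout can be wired. The layer widths and depths are illustrative assumptions rather than the exact configuration of our network, and the inter-link encoders and structure loss of Sects. 3.3 and 3.4 are omitted here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, filters):
    # Conv + pooling stage of the shared encoder (VGG-style, simplified).
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.MaxPooling2D(2)(x)

def decoder_head(features, name):
    # Up-sample back to the input resolution and classify each pixel.
    x = features
    for filters in (256, 128, 64):
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same",
                                   activation="relu")(x)
    return layers.Conv2D(1, 1, activation="sigmoid", name=name)(x)

inputs = tf.keras.Input(shape=(160, 320, 3))   # input size used in Sect. 4

# Shared encoder: features critical to both tasks are extracted once.
f = inputs
for filters in (64, 128, 256):
    f = encoder_block(f, filters)

# Two separate decoders, each followed by a sigmoid classifier.
lane_area = decoder_head(f, name="lane_area")
lane_boundary = decoder_head(f, name="lane_boundary")

model = tf.keras.Model(inputs, [lane_area, lane_boundary])
```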

Fig. 2. The proposed multi-task framework. Input images are fed into a shared encoder, which extracts the critical features for lane segmentation and lane boundary detection. Two inter-link encoders connected to each decoder provide complementary information for tasks. The overall performance is enhanced by introducing a structure loss, assuming that the lane boundary is predicted as the outer contour of the lane area while the lane area is predicted as the area integration result within the lane boundaries.

To achieve a well-posed formulation of the multi-task learning framework, we propose a novel loss function that introduces the inherent geometric priors between the tasks, under the assumption that the lane boundary is predicted as the outer contour of the lane area while the lane area is predicted as the area integration result within the lane boundaries. These geometric priors are critical to finding a consistent solution of lane segmentation and lane boundary detection. With an end-to-end training process, these improvements substantially enhance the robustness and accuracy of our approach on several metrics.

Fig. 3. Activation map of encoders. Activation maps are generated from final convolution layers of encoders. Pixel color indicates task-related saliency with respect to input images. (Color figure online)

3.2 Critical Feature Extraction Using a Shared Encoder

To illustrate the regions activated by the encoders of lane segmentation and lane boundary detection in a single-task learning framework, we visualize each activation map as a heat map. As shown in Fig. 3, lanes with similar textures are all emphasized by lane segmentation [19], which incurs an ambiguity problem in lane detection, and some background regions are also activated. For lane boundary detection, in contrast, edges in the background incur an even more severe outlier problem.

A shared encoder is proposed to greatly reduce the ambiguity problem and the outliers, because the features critical to the performance of both tasks are emphasized during the network training process. As compared in Fig. 3, a clearer lane extraction is obtained when using the shared encoder.

3.3 Complementary Feature Extraction Using an Inter-link Encoder

The shared encoder pays much attention to the features critical to the overall performance. However, some important features for one task might be suppressed if they are not so critical to the other task. For example, lane area segmentation puts much emphasis on fine-grained texture features for accurate pixel-wise labels, while lane boundary detection prefers edge-like features.

Fig. 4. Top: Original image and FP+FN area (red) generated by the initial network. Bottom: The image of absolute difference before and after the refinements by inter-link encoders. As noted from the heat maps, the initial false positive results are effectively suppressed and the originally missed lane pixels are well restored, as emphasized in warm color. (Color figure online)

An inter-link encoder is put forward to stream complementary information between the two tasks, so that the features of the two decoders can be reciprocally refined. As shown in Fig. 2, the decoders initially receive the feature f and output preliminary results, which serve as the inputs of the inter-link encoders. The decoders then generate the final results with the refined features, where the features f are complemented with the inter-link encoder outputs \(l_1\) and \(l_2\) using simple concatenation; thus each decoder actually performs two forward passes. These refined features enhance the representation of lanes and are expected to improve the performance of lane segmentation and lane boundary detection in a unified way.
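
The two-pass refinement can be sketched as follows; the link-encoder depth, the channel width `C_LINK` and the cross-routing of \(l_1\) and \(l_2\) are illustrative assumptions. The zero padding in the first pass anticipates the implementation detail described in Sect. 3.5, which keeps the decoder input dimension identical across both passes so that the decoder weights can be reused.

```python
import tensorflow as tf
from tensorflow.keras import Sequential, layers

C_LINK = 64   # assumed channel width of the link-encoder output

def make_decoder(name):
    # Decoder reused for both passes (weights shared across calls).
    return Sequential([
        layers.Conv2DTranspose(256, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(1, 1, activation="sigmoid"),
    ], name=name)

def make_link_encoder(name):
    # Encodes a full-resolution preliminary mask down to the resolution of f.
    return Sequential([
        layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(48, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(C_LINK, 3, strides=2, padding="same", activation="relu"),
    ], name=name)

seg_dec, bnd_dec = make_decoder("seg_dec"), make_decoder("bnd_dec")
link_seg, link_bnd = make_link_encoder("link_seg"), make_link_encoder("link_bnd")

def two_pass(f):
    """f: shared-encoder features of shape [B, H/8, W/8, C]."""
    zeros = tf.zeros(tf.concat([tf.shape(f)[:3], [C_LINK]], axis=0))
    # Pass 1: zero padding keeps the decoder input width identical to pass 2.
    seg_prelim = seg_dec(tf.concat([f, zeros], axis=-1))
    bnd_prelim = bnd_dec(tf.concat([f, zeros], axis=-1))
    # Link encoders turn each preliminary result into complementary features.
    l1, l2 = link_seg(seg_prelim), link_bnd(bnd_prelim)
    # Pass 2: each decoder is refined with the other task's link features.
    seg_final = seg_dec(tf.concat([f, l2], axis=-1))
    bnd_final = bnd_dec(tf.concat([f, l1], axis=-1))
    return seg_final, bnd_final
```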

In the first row of Fig. 4, segmentation results are generated without inter-link encoders, where the red regions indicate false positive and false negative results. The bottom row shows the absolute difference image before and after the refinements from the inter-link encoders, where pixels are highlighted with a heat map to indicate the absolute difference of segmentation confidence. As emphasized in warm color, with the additional information provided by the inter-link encoders, the original false positive results are effectively suppressed and the originally missed lane pixels are well restored.

3.4 Geometry Constrained Structure Loss

Boundary-Aware Loss for Lane Area Segmentation. When cross-entropy alone is set up as the loss function for lane area segmentation, it produces groups of falsely labeled pixels due to high ambiguity. We therefore introduce a boundary-aware loss for lane area segmentation, assuming consistency between the boundaries of the segmented lane areas and the ground truth of the lane boundaries.

Fig. 5. Boundary-aware loss and area-aware loss. Left: An illustration of our boundary-aware loss. The blue area indicates boundary inconsistency. Right: An illustration of our area-aware loss. Different intensities in prediction areas indicate different prediction confidence. The difference between the restored area and the ground truth indicates the area-aware loss. (Color figure online)

It is noted that a slight deviation of the lane boundaries from the ground truth could produce an extremely large loss under pixel-wise comparison, as illustrated in Fig. 5. Therefore, we employ the IoU loss [26] to measure boundary inconsistency. Accordingly, a slight deviation results in only a small IoU loss, which ensures convergence. Let \(\mathcal {I}\) denote the set of pixels in the image. For every pixel p in the pixel set \(\mathcal {I}\), \(y_p\) corresponds to its output probability, and \(g=\{0,1\}^{M\times N}\) is the ground truth for the set \(\mathcal {I}\), where M and N are the height and width of the image. By masking the lane segmentation results with the lane area bounded by the ground truth of the lane boundaries, our boundary-aware loss \(l_{ba}\) can be defined as:

$$\begin{aligned} \mathrm{IoU}=\frac{\sum _{p\in \mathcal {I}}(y_p \times g_p)}{\sum _{p\in \mathcal {I}}(y_p+g_p-y_p \times g_p)}, \end{aligned}$$
(1)
$$\begin{aligned} l_{ba}=1-\mathrm{IoU}, \end{aligned}$$
(2)

where \(\times \) denotes a pixel-wise multiplication.

Two consistency constraints are imposed to enhance the results of lane segmentation. The cross-entropy loss term \(l_{lce}\) measures the consistency between the segmented area and its ground truth. Additionally, the loss term \(l_{ba}\) measures the consistency between the boundaries of the segmented area and the lane boundary ground truth. Correspondingly, the loss function \(l_{lt}\) measuring the total error of lane segmentation is updated as:

$$\begin{aligned} l_{lt}=l_{lce}+\lambda _1 \times l_{ba}, \end{aligned}$$
(3)

where \(\lambda _1\) is a constant for balancing two losses. Here we set \(\lambda _1\) as 0.5. With only pixel-wise linear calculation involved, \(l_{ba}\) is fully differentiable.
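
For reference, Eqs. (1)–(3) can be written compactly as below. This is a minimal TensorFlow sketch assuming inputs of shape [batch, height, width, 1] and a small constant added to the denominator for numerical stability; it is not tied to our exact training code.

```python
import tensorflow as tf

def boundary_aware_loss(y_pred, g, eps=1e-6):
    # Eqs. (1)-(2): soft IoU between the predicted lane-area probabilities
    # y_pred and the area g bounded by the lane-boundary ground truth.
    inter = tf.reduce_sum(y_pred * g, axis=[1, 2, 3])
    union = tf.reduce_sum(y_pred + g - y_pred * g, axis=[1, 2, 3])
    return 1.0 - inter / (union + eps)          # one value per image

def lane_segmentation_loss(y_pred, y_true, g, lambda_1=0.5):
    # Eq. (3): pixel-wise cross entropy against the lane-area ground truth
    # plus the boundary-aware term weighted by lambda_1 = 0.5.
    l_lce = tf.reduce_mean(
        tf.keras.losses.binary_crossentropy(y_true, y_pred), axis=[1, 2])
    return l_lce + lambda_1 * boundary_aware_loss(y_pred, g)
```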

Area-Aware Loss for Lane Boundary. Compared with lane area segmentation, lane boundary detection suffers more from a high miss rate due to the lower signal-to-noise ratio around boundaries. Motivated by the geometric prior that the lane area is the area integration result with the lane boundaries as the upper and lower bounds, an area-aware loss is proposed to measure the difference between the lane area restored from the detected lane boundaries and the lane area ground truth.

Our area-aware loss function is expressed as:

$$\begin{aligned} l_{aa}=\sum _{\mathcal {G}(p)=1}[1-I_r(p)], \end{aligned}$$
(4)

where \(\mathcal {G}\) is the pixel-wise label set of the lane area ground truth, \(\mathcal {G}(p)=1\) denotes that pixel p belongs to the lane area, and \(I_r(p)\) is the calculated probability of pixel p belonging to the restored lane area. The loss function \(l_{mt}\) measuring the error of lane boundary detection is defined as

$$\begin{aligned} l_{mt}=l_{mce}+\lambda _2 \times l_{aa}, \end{aligned}$$
(5)

where \(l_{mce}\) is the cross-entropy loss measuring the consistency between the detected lane boundary and its ground truth in a complementary way.

Pixels with strong spatial correlation usually present similar intensity distributions; therefore, we estimate the pixel intensities in the restored lane area directly from the closest pixels on the lane boundaries. Denote the pixels of the two lane boundaries as the pixel set \(\mathcal {B}\). For a pixel p between the lane boundary ground truth, its probability of belonging to the lane area is estimated as the average probability of its n closest pixels on the lane boundaries, which is computed as:

$$\begin{aligned} I_r(p)=\frac{1}{n}\sum _{j=1}^{n}I_b(v_j), \end{aligned}$$
(6)
$$\begin{aligned} v_j=\mathop {\arg \min }_{m_i}\left[ d(p,m_i)\right] \quad m_i \in \mathcal {B}, \end{aligned}$$
(7)

where \(d(x,y)\) is the Euclidean distance between pixels x and y, and \(I_b(v)\) is the pixel probability in the boundary detection map.

Computing the restored lane area from Eqs. 6 and 7, we reform the loss function \(l_{aa}\) as:

$$\begin{aligned} l_{aa}=\sum _{\mathcal {G}(p)=1}[1-I_r(p)]=\sum _{\mathcal {G}(p)=1}\left\{ 1-\frac{1}{n}\sum _{j=1}^{n}I_b\{\mathop {\arg \min }_{m_i}\left[ d(p,m_i)\right] \}\right\} \quad m_i\in \mathcal {B}. \end{aligned}$$
(8)

Thus, the integrated loss function is finally formulated as below:

$$\begin{aligned} l=l_{lce}+\lambda _1\times l_{ba}+l_{mce}+\lambda _2\times l_{aa}. \end{aligned}$$
(9)
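
To make Eqs. (4)–(8) concrete, the following NumPy/SciPy sketch computes \(l_{aa}\) for a single image with \(n=1\), i.e., using only the single closest boundary pixel; the choice of n and the neighbor search are illustrative assumptions. Since the nearest-boundary indices depend only on the ground truth, they can be precomputed offline, and the loss then remains differentiable with respect to the predicted boundary map (e.g., via a gather operation in the training graph).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def area_aware_loss(boundary_prob, boundary_gt, area_gt):
    """boundary_prob: [H, W] predicted boundary probabilities I_b.
    boundary_gt:   [H, W] binary lane-boundary ground truth (pixel set B).
    area_gt:       [H, W] binary lane-area ground truth G."""
    # Index of the closest ground-truth boundary pixel for every pixel p;
    # distance_transform_edt measures distance to the nearest zero entry,
    # so boundary pixels are encoded as the zeros of the input.
    idx = distance_transform_edt(boundary_gt == 0,
                                 return_distances=False, return_indices=True)
    # Restored lane-area probability I_r(p), Eqs. (6)-(7) with n = 1.
    restored = boundary_prob[idx[0], idx[1]]
    # Eq. (4)/(8): penalize low restored probability inside the true lane area.
    return float(np.sum((1.0 - restored)[area_gt == 1]))
```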

3.5 Training Details

Our framework is designed to be fully-convolutional and differentiable, thus it could be trained in an end-to-end manner. In this section, we mainly focus on implementation details of training process.

The shared encoder network adopts the VGG structure [31] and is initialized with weights pretrained on ImageNet [28]. First, we train the lane segmentation subnetwork alone. Second, we train both subnetworks of lane area segmentation and boundary detection together, without the inter-link encoder structure. Finally, the overall multi-task learning framework is retrained with the inter-link encoders added.

We concatenate an all-zero tensor to the output of the shared encoder, so that the input feature dimension of the decoders remains the same during the iterative training procedure. The overall framework utilizes batch normalization with a batch size of 3. To avoid overfitting, a dropout layer [13] is adopted with a rate of 0.2. We use the Adam optimizer [17] and pretrain the lane segmentation and lane boundary detection subnetworks with a learning rate of \(10^{-3}\). For the multi-task framework training process, the learning rate is set to \(10^{-4}\) until convergence.
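
The stage-dependent hyper-parameters described above can be summarized as a small configuration sketch; the stage names are labels of our own and no epoch budgets are implied.

```python
import tensorflow as tf

BATCH_SIZE = 3
DROPOUT_RATE = 0.2   # dropout applied inside the decoders

# Three training stages and their learning rates (illustrative labels).
STAGES = [
    {"name": "pretrain_segmentation", "lr": 1e-3, "inter_link": False},
    {"name": "pretrain_both_tasks",   "lr": 1e-3, "inter_link": False},
    {"name": "joint_multitask",       "lr": 1e-4, "inter_link": True},
]

def make_optimizer(stage):
    # Adam optimizer with the stage-specific learning rate.
    return tf.keras.optimizers.Adam(learning_rate=stage["lr"])
```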

4 Experiment

We evaluate our approach on three lane segmentation datasets: the KITTI dataset [11], the Road-Vehicle dataset (RVD) [8] and the CULane dataset [34]. Approaches are implemented and evaluated in TensorFlow [1]. Processing time is measured on a GeForce GTX TITAN with \(160\times 320\) input images.

4.1 Dataset and Evaluation

The KITTI dataset contains 289 training images and 290 testing images, including four subsets of road scenes: urban marked road (UM), urban multiple marked road (UMM), urban unmarked road (UU) and URBAN ROAD (the union of the former three). UM is defined as marked roads with two lanes, while UMM consists of the roads with multiple lanes. UU stands for roads without lane markings and contains one lane only.

The RVD dataset contains more than 10 h of traffic scenarios with multiple sensors under different weather and road conditions, including highway scenes, night scenes and rain scenes. There are over 10,000 manually labeled images in this dataset, which are divided into different scenes with respect to surrounding conditions such as weather and illumination.

The CULane dataset contains 133,235 images extracted from 55 h of traffic videos, divided into 88,880 images for the training set, 9,675 for the validation set and 34,680 for the testing set. The test set is split into 8 subsets based on their scenes to demonstrate the robustness of different network structures. This newly released dataset contains lane boundary ground truth only, so we generate the lane area ground truth from the areas bounded by the lane boundaries.

To evaluate lane segmentation results, we follow the classical pixel-wise segmentation metrics: precision (P), recall (R), F1-measure and IoU score. Metrics with the removal of foreshortening effects are not considered, because inverse perspective mapping incurs distortions in the ground truth.

For lane boundary detection, we evaluate the performance with a pixel-wise metric. On the KITTI dataset, when the distance between a detected lane boundary and the ground truth is smaller than a threshold (1.5% of the image diagonal), the detected lane boundary is regarded as a true positive (TP). On the CULane dataset, we follow its own metric for fair comparison: when the IoU between a detected lane boundary and the ground truth is larger than a 0.5 threshold, the detected boundary is regarded as a true positive (TP) [34]. The same criteria are applied to all the methods under comparison. The final results are evaluated with precision (P), recall (R) and F1-measure.
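
For reference, the pixel-wise precision, recall, F1-measure and IoU scores follow the standard definitions, as in the small NumPy sketch below; the 0.5 binarization threshold is an assumption for illustration and not part of the benchmark protocols.

```python
import numpy as np

def segmentation_metrics(prob, gt, thresh=0.5):
    # prob: per-pixel probabilities; gt: binary ground-truth mask.
    pred = prob >= thresh
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    iou = tp / (tp + fp + fn + 1e-9)
    return {"P": precision, "R": recall, "F1": f1, "IoU": iou}
```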

4.2 Results and Discussion

Our experiments consist of two parts. First, we compare our lane area segmentation approach with state-of-the-art methods on the KITTI, CULane and RVD datasets. Then, to demonstrate the effectiveness of our multi-task structure, lane boundary detection results are evaluated on the KITTI and CULane datasets.

Lane Segmentation Results on KITTI. The proposed network is first compared with state-of-the-art approaches (including SegNet [5], U-Net [27] and Up-Conv-Poly [25]) on the KITTI dataset. Table 1 shows the overall results. Our methods are superior to the baseline approach [5]: joint training improves performance even without the inter-link encoder and the structure loss functions. Benefiting from the investigation of the inherent interrelationship between tasks, our multi-task framework obtains a better feature representation than a single-task network and boosts performance further.

Table 1. Lane segmentation results on the URBAN_ROAD KITTI dataset. ‘multi-task’, ‘loss’, ‘link’ and ‘link+loss’ denote networks without losses or link structure, with losses only, with link structure only, and with both losses and link structure, respectively.

Note that our approach also outperforms U-Net and Up-Conv-Poly with gains of 4.1% and 2.0% in IoU score, respectively. Both approaches connect encoder layers with decoder layers, which makes the decoders receive same-scale information from the encoders directly. Our multi-task network better captures the dependency of the geometric structure of lanes and markers. We also evaluate the approaches on several different traffic scenes in Table 2. The results show that our approach is robust to scenario changes.

Table 2. Lane segmentation results on KITTI subsets (UM/UMM/UU)

We study the influence of the inter-link encoders and the structure loss functions in our model. Note that we achieve an 86.1% IoU score with our structure-loss-only approach and an 86.5% IoU score with our inter-link-only approach. With both inter-link encoders and losses added, our final approach (link+loss) achieves 87.4% IoU and 93.3% F1-measure. The individually applied structure loss and inter-link encoders each play a crucial role in promoting the segmentation results. Figure 7 shows some lane area segmentation results obtained by our approach, Up-Conv-Poly and U-Net on the KITTI dataset. Our approach effectively handles hard cases such as the vanishing boundaries in the first two columns of Fig. 7.

Fig. 6. The IoU metric evaluated with a single image in KITTI dataset. The blue line is our approach with structure loss functions while the orange line is our approach without structure loss. (Color figure online)

To demonstrate the effectiveness of the structure loss, our approaches with and without the structure loss are evaluated on a number of single images in Fig. 6. We randomly pick 100 images from the KITTI dataset and calculate the IoU score for both approaches. The evaluation results reveal that the introduction of the structure loss yields higher robustness to disturbance.

Lane Segmentation Results on CULane Dataset. We also evaluate lane segmentation on the newly published CULane dataset. The test set is divided into 8 different scenes: Arrow, Crowded, Curve, Dazzle light, Night, No line, Normal and Shadow. The overall performance is also shown in the last column.

Experimental results are shown in Table 3. Notably, our method outperforms state-of-the-art methods on all 8 subsets and achieves 90.2% F1-measure and 82.4% IoU on the overall dataset, demonstrating that our method handles various traffic scenes more robustly than state-of-the-art methods. Moreover, our method achieves a remarkable improvement on 4 subsets (Arrow, Crowded, Shadow and Normal). This is because our method can capture the lane boundary structure from cluttered backgrounds, and the well-extracted boundary features provide complementary information that effectively suppresses erroneous segmentation.

Table 3. Lane segmentation results on CULane dataset

Lane Segmentation Results on RVD Dataset. Furthermore, we evaluate lane segmentation on the RVD dataset. As mentioned in Sect. 4.1, this dataset contains three different scenes: Highway, Night and Rainy & Snowy Day. Besides SegNet, U-Net and Up-Conv-Poly, we also evaluate the performance of the CMA method [8].

Table 4. Lane segmentation results on RVD dataset

Overall results are presented in Table 4. Note that CMA only extracts the endpoints of two lane boundaries to segment the lane area. It enforces a rigid geometric assumption and thus fails to segment curved lanes. In contrast, the geometric priors introduced in our network are applicable to a wider range of scenes, achieving a significant improvement over all the metrics. Although the performance on Highway is similar due to the clear background, we dramatically improve the performance in the other scenes, especially in night scenarios. With a better representation of boundary information and the geometric constraints, illumination variation and image degradation are well handled by our approach.

Lane Boundary Detection Results. In addition to lane segmentation, we also evaluate the effectiveness of lane boundary detection on the KITTI and CULane datasets. We test several approaches against manually labeled lane boundary ground truth and present the performance of SegNet [5], SegNet-Ego-Lane [16] and SCNN [34]. We also report our approach with only the cross-entropy loss to emphasize the effectiveness of the structure loss function on the KITTI dataset. Some lane boundary detection results are also shown in Fig. 7.

Fig. 7. Lane area segmentation and lane boundary detection results on KITTI dataset. Green corresponds to true positives, blue to false positives and red to false negatives. (Color figure online)

Table 5. Lane boundary results on KITTI dataset

The results on the KITTI dataset are provided in Table 5. In terms of precision, SegNet performs slightly better than our approach. However, SegNet has an extremely low recall rate, which indicates that it misses plenty of true positives. Our approach achieves a higher recall rate than all the other approaches, as well as the highest F1-measure. The ablation analysis of our approaches indicates that the area-aware loss function dramatically improves the performance of lane boundary detection: we gain 3.2% in precision, 5.3% in recall and 4.3% in F1-measure.

Table 6. Lane boundary results on CULane dataset (F1-measure)

Fig. 8. Parameters \(\lambda _1\) and \(\lambda _2\) validation experiments on KITTI

The results on the CULane dataset are shown in Table 6. Note that our method outperforms state-of-the-art methods on 7 subsets. State-of-the-art methods perform worse mainly due to image degradation and missing lane boundaries. For the image degradation problem, our subnetwork is able to extract better boundary features with the area-aware loss, so we dramatically improve performance on the Night, Dazzle light and Shadow subsets, where image quality is severely affected by illumination conditions. For the unseen lane boundary problem, it is extremely difficult to extract enough boundary features for boundary detection. Although SCNN introduces context information for boundary detection, different scenes contain extremely different context, resulting in inaccurate lane detection results. In contrast, our inter-link structure utilizes the more robust geometric relationship between lane areas and boundaries, which constrain each other for better performance.

4.3 Parameter Study

To choose the optimal parameters \(\lambda _1\) and \(\lambda _2\), a parameter study is performed with 10-fold cross-validation. The performance of different \(\lambda _1\) values is compared by the IoU score of lane area segmentation, while \(\lambda _2\) is evaluated by the F1-measure of lane boundary detection. The final results are shown in Fig. 8, where both parameters are sampled at regular intervals. Although \(\lambda _1\) values larger than 0.5 achieve a similar IoU score, the experiments show that a large \(\lambda _1\) is sensitive to the hyper-parameters, so \(\lambda _1\) is set to 0.5. \(\lambda _2\) is set to 1.0 for the best performance.

5 Conclusion

We propose a multi-task learning framework to jointly address the problems of lane segmentation and lane boundary detection. In this framework, a shared encoder and an inter-link encoder structure are proposed, whose benefits in boosting detection precision have been demonstrated by experiments. In addition, we devise two novel loss functions that are shown to be applicable to more general traffic scenes. The proposed method is compared with state-of-the-art ones on the KITTI, CULane and RVD datasets, and shows leading performance.