
1 Introduction

Trajectory planning [20] for autonomous driving is a challenging task in the field of computer vision. Lane area segmentation is a crucial step in trajectory planning, as it classifies different lanes and generates well-defined driving areas.

Texture-based approaches were proposed in early works. Texture features from different color spaces are aggregated to enhance the robustness of lane detection [3, 22, 24]. However, lane areas generally contain large homogeneous regions, so it is difficult to establish distinguishable feature descriptors for them.

Without adequate texture information to rely on, supplementary lane boundary information is critical for precise detection of lane areas. Traditional approaches extract boundary information to tackle the problem of homogeneous regions in lane areas, where high-pass filters are dominantly used [4, 7, 12, 18]. With boundary information extracted, final lane areas are sketched by the lane boundaries [9]. However, boundary information is frequently missing due to occlusions and dashed markers. Worse, surrounding shadows and vehicles often introduce irrelevant information that degrades detection performance. In recent years, fully convolutional networks (FCNs) have been put forward [6, 10, 23, 25], where contextual features are learned with an encoder-decoder structure to enhance lane segmentation. FCNs have achieved much better performance than traditional approaches [3, 22, 24]. Nevertheless, under ill-defined conditions it is hard for an FCN to provide a distinctive, unique representation of lane areas.

In fact, there exists a geometric relationship between a lane area and its boundaries: the lane area always lies between the lane boundaries, while the lane boundaries form the outer contour of the lane area. To exploit this relationship, some prior models adopt a sequential processing strategy [2, 30, 35]. Some models extract lane boundaries first to greatly reduce the search range for lane detection [8]. Given the lane boundaries, segmentation algorithms are applied to the bounded regions to refine lane labels. Conversely, other models segment lane areas first; boundary information is then extracted by high-pass filters around the segmented lane areas within a tolerance range [2, 35]. However, these models treat lane area segmentation and lane boundary detection as two separate sub-processes that share no information with each other, leading to a loss of geometric dependency. Moreover, extremely poor performance can occur when the first sub-process is severely interfered with by outliers.

To address the abovementioned problems, we are motivated to provide a unified solution for lane area segmentation and lane boundary detection with a multi-task learning framework. Since single-task approaches suffer from the lack of priors and the loss of information, rather than simply fusing the outputs of different tasks at the final decision stage, we apply one shared encoder to the neural network for integrating the complementary information of the two tasks. Additionally, a novel structure called the link encoder is appended, which implicitly extracts the interrelationship between a lane area and its boundaries. The information flowing between the two tasks therefore refines the performance of each. At the classifier layer, the result is generated by superposing this refinement on the original output. As shown in Fig. 1, single-task segmentation is severely interfered with by outliers and is unable to recover from erroneous segmentation. In our approach, when lane area segmentation fails on hard examples, the lane boundary detection task, if it performs well, can provide valid features and recover the segmentation task from failure, and vice versa. Furthermore, two geometric prior constraints are proposed in our model to regularize the lane detection problem into a well-posed formulation. Given the detected lane boundaries, we predict the lane area as the area integration result with the lane boundaries as the upper and lower bounds. Given the extracted lane areas, the lane boundaries are predicted as their outer contours. The differences between these prediction results and the ground truth are then formulated as two loss terms that emphasize geometric priors during model training. The geometric constraints are differentiable owing to pixel-wise convolution, so the overall network is capable of joint training as an end-to-end implementation. Experimental results on benchmarks demonstrate that our approach outperforms other state-of-the-art approaches under several metrics.

2 Related Work

Traditional methods of lane segmentation dominantly utilize pixel-level and super-pixel-level features. Among the pixel-level features [2, 3, 15, 33], color features robust to shadow interference are extracted for lane segmentation using region growing [3]. Texture features from varied color spaces are described by histogram peaks and temporal filter responses, and lane areas are then generated within flat regions [15]. Alon et al. compute dominant edges based on a pixel-wise gradient map and form them into lane boundaries [33]. With the guidance of these boundaries, a color-based region growing is followed to generate lane areas. Valente et al. [2] extract pixel-level color features and classify them into lane areas first; boundaries are then introduced to constrain the refinement of the lane areas. To handle outlier situations, super-pixel features are preferably used. Li et al. [22] train an AdaBoost classifier with super-pixel color features extracted by the Orthogonal Matching Pursuit algorithm to enhance lane detection.

With the development of deep learning methods, semantic segmentation has achieved impressive results [5, 23, 27]. Several modified single-task networks have been proposed, focusing on embedding extra knowledge into the network [6, 10]. Gao et al. [10] advance contour priors and location priors to segment the lane region elaborately. Since single-task networks still suffer from the lack of priors and the loss of information, multi-task approaches are proposed to tackle this problem by introducing more surrounding constraints. Oliveira et al. [25] train a joint classification, detection and semantic segmentation network with a shared encoder. With this joint-training manner, the final lane area is generated from better features containing more surrounding details. However, the inherent connections between the multiple outputs are not mined, making it difficult to explain the mechanism behind the network structure.

Plenty of works utilize high-pass filters for boundary detection [4, 12]. Haloi et al. [12] combine responses from 2nd- and 4th-order filters to obtain lane boundary features with adaptive thresholding. Aly et al. filter the inverse perspective mapping (IPM) image with 2D Gaussian kernels [4]. Kortli et al. [18] and Bergasa et al. [7] detect edges with a Canny kernel and the Otsu method.

However, the precision of lane boundary detection suffers from illumination variations, noise and cluttered background. Some deep learning techniques have been developed to improve the performance. The OverFeat detector was proposed to integrate recognition, localization and detection using convolution [29]. Later, Huval et al. [14] modified the OverFeat structure to handle lane boundary detection and vehicle detection at the same time. However, these methods are sensitive to surrounding objects. Therefore, Li et al. [21] feed the features extracted by convolution layers into a recurrent neural layer as a sequence, where a spatial continuity constraint is used to regularize the lane detection result. Kim et al. [16] propose a simpler but effective network structure, fine-tuning the network from a pretrained VGG network to generate detection results. These approaches require extra data for sufficient pretraining, and they are sensitive to cluttered background.

Fig. 1. Reciprocal constraints with geometric relation. Left: With an image input, traditional methods generate a binary segmentation mask for lane areas (green) or lane boundaries (red), which are severely affected by outlier situations. Right: Our approach introduces a geometric constraint into a multi-task network, which is capable of restoring the missing lane area and lane boundaries (blue) mutually. (Color figure online)

3 Methodology

3.1 Overview of Multi-task Framework

In human perception, a lane area is always inseparable from the judgment of its boundaries. However, existing lane detection methods dominantly rely on single-task networks that train lane segmentation and lane boundary detection independently, completely ignoring the inherent geometric constraints between the two tasks. Simple multi-task networks such as MultiNet [32] have been developed to combine tasks such as classification, detection and segmentation, but without investigating the inherent relationship between the tasks. There are two major problems in these state-of-the-art methods: the loss of the interrelationship between the training tasks and the lack of geometric priors for a well-posed formulation. As a result, they often fail on hard examples.

Inspired by this observation, we propose a multi-task learning framework that provides a unified solution for lane segmentation and lane boundary detection. The network architecture is illustrated in Fig. 2; it consists of an encoder network and a decoder network, as a kind of fully convolutional network (FCN). Rather than conducting segmentation and detection with two separate networks, the proposed framework conducts the two tasks with one shared encoder network and two separate decoders. To classify pixels into binary labels, each decoder is followed by a sigmoid classifier. Specifically, each decoder is connected to a link encoder that streams complementary information between the two tasks, so that the features of the two decoders can be reciprocally refined.
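
For concreteness, the following TensorFlow/Keras sketch shows how such a shared-encoder, two-decoder layout can be wired. The layer widths and depths are illustrative assumptions rather than the exact configuration of our network, and the inter-link encoders and structure loss of Sects. 3.3 and 3.4 are omitted here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, filters):
    # Conv + pooling stage of the shared encoder (VGG-style, simplified).
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.MaxPooling2D(2)(x)

def decoder_head(features, name):
    # Up-sample back to the input resolution and classify each pixel.
    x = features
    for filters in (256, 128, 64):
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same",
                                   activation="relu")(x)
    return layers.Conv2D(1, 1, activation="sigmoid", name=name)(x)

inputs = tf.keras.Input(shape=(160, 320, 3))   # input size used in Sect. 4

# Shared encoder: features critical to both tasks are extracted once.
f = inputs
for filters in (64, 128, 256):
    f = encoder_block(f, filters)

# Two separate decoders, each followed by a sigmoid classifier.
lane_area = decoder_head(f, name="lane_area")
lane_boundary = decoder_head(f, name="lane_boundary")

model = tf.keras.Model(inputs, [lane_area, lane_boundary])
```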

Fig. 2. The proposed multi-task framework. Input images are fed into a shared encoder, which extracts the critical features for lane segmentation and lane boundary detection. Two inter-link encoders connected to each decoder provide complementary information for tasks. The overall performance is enhanced by introducing a structure loss, assuming that the lane boundary is predicted as the outer contour of the lane area while the lane area is predicted as the area integration result within the lane boundaries.

To achieve a well-posed formulation of the multi-task learning framework, we propose a novel loss function that introduces the inherent geometric priors between the tasks, under the assumption that the lane boundary is predicted as the outer contour of the lane area while the lane area is predicted as the area integration result within the lane boundaries. These geometric priors are critical to finding a consistent solution of lane segmentation and lane boundary detection. With an end-to-end training process, these improvements substantially enhance the robustness and accuracy of our approach on several metrics.

Fig. 3. Activation map of encoders. Activation maps are generated from final convolution layers of encoders. Pixel color indicates task-related saliency with respect to input images. (Color figure online)

3.2 Critical Feature Extraction Using a Shared Encoder

To illustrate the regions activated by the encoders of lane segmentation and lane boundary detection in a single-task learning framework, we visualize each activation map as a heat map. As shown in Fig. 3, lanes with similar textures are all emphasized by lane segmentation [19], which incurs an ambiguity problem in lane detection, and some background regions are also activated. For lane boundary detection, in contrast, edges in the background incur an even more severe outlier problem.

A shared encoder is proposed to greatly reduce the ambiguity problem and the outliers, because the features critical to the performance of both tasks are emphasized during the network training process. As compared in Fig. 3, a clearer lane extraction is obtained when using the shared encoder.

3.3 Complementary Feature Extraction Using an Inter-link Encoder

The shared encoder pays much attention to the features critical to the overall performance. However, some important features for one task might be suppressed if they are not so critical to the other task. For example, lane area segmentation puts much emphasis on fine-grained texture features for accurate pixel-wise labels, while lane boundary detection prefers edge-like features.

Fig. 4. Top: Original image and FP+FN area (red) generated by the initial network. Bottom: The image of absolute difference before and after the refinements by inter-link encoders. As noted from the heat maps, the initial false positive results are effectively suppressed and the originally missed lane pixels are well restored, as emphasized in warm color. (Color figure online)

An inter-link encoder is put forward to stream complementary information between the two tasks, so that the features of the two decoders can be reciprocally refined. As shown in Fig. 2, the decoders initially receive the feature f and output preliminary results, which serve as the inputs of the inter-link encoders. The decoders then generate the final results with the refined features, where the features f are complemented with the inter-link encoder outputs \(l_1\) and \(l_2\) using simple concatenation; thus each decoder actually performs two forward passes. These refined features enhance the representation of lanes and are expected to improve the performance of lane segmentation and lane boundary detection in a unified way.
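
The two-pass refinement can be sketched as follows; the link-encoder depth, the channel width `C_LINK` and the cross-routing of \(l_1\) and \(l_2\) are illustrative assumptions. The zero padding in the first pass anticipates the implementation detail described in Sect. 3.5, which keeps the decoder input dimension identical across both passes so that the decoder weights can be reused.

```python
import tensorflow as tf
from tensorflow.keras import Sequential, layers

C_LINK = 64   # assumed channel width of the link-encoder output

def make_decoder(name):
    # Decoder reused for both passes (weights shared across calls).
    return Sequential([
        layers.Conv2DTranspose(256, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(1, 1, activation="sigmoid"),
    ], name=name)

def make_link_encoder(name):
    # Encodes a full-resolution preliminary mask down to the resolution of f.
    return Sequential([
        layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(48, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(C_LINK, 3, strides=2, padding="same", activation="relu"),
    ], name=name)

seg_dec, bnd_dec = make_decoder("seg_dec"), make_decoder("bnd_dec")
link_seg, link_bnd = make_link_encoder("link_seg"), make_link_encoder("link_bnd")

def two_pass(f):
    """f: shared-encoder features of shape [B, H/8, W/8, C]."""
    zeros = tf.zeros(tf.concat([tf.shape(f)[:3], [C_LINK]], axis=0))
    # Pass 1: zero padding keeps the decoder input width identical to pass 2.
    seg_prelim = seg_dec(tf.concat([f, zeros], axis=-1))
    bnd_prelim = bnd_dec(tf.concat([f, zeros], axis=-1))
    # Link encoders turn each preliminary result into complementary features.
    l1, l2 = link_seg(seg_prelim), link_bnd(bnd_prelim)
    # Pass 2: each decoder is refined with the other task's link features.
    seg_final = seg_dec(tf.concat([f, l2], axis=-1))
    bnd_final = bnd_dec(tf.concat([f, l1], axis=-1))
    return seg_final, bnd_final
```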

In the first row of Fig. 4, segmentation results are generated without inter-link encoders, where the red regions indicate false positive and false negative results. The bottom row shows the absolute difference image before and after the refinements from the inter-link encoders, where pixels are highlighted with a heat map to indicate the absolute difference of segmentation confidence. As emphasized in warm color, with the additional information provided by the inter-link encoders, the original false positive results are effectively suppressed and the originally missed lane pixels are well restored.

3.4 Geometry Constrained Structure Loss

Boundary-Aware Loss for Lane Area Segmentation. When cross-entropy alone is set up as the loss function for lane area segmentation, it produces groups of falsely labeled pixels due to high ambiguity. We therefore introduce a boundary-aware loss for lane area segmentation, assuming consistency between the boundaries of the segmented lane areas and the ground truth of the lane boundaries.

Fig. 5. Boundary-aware loss and area-aware loss. Left: An illustration of our boundary-aware loss. The blue area indicates boundary inconsistency. Right: An illustration of our area-aware loss. Different intensities in prediction areas indicate different prediction confidence. The difference between the restored area and the ground truth indicates the area-aware loss. (Color figure online)

It is noted that a slight deviation of the lane boundaries from the ground truth could produce an extremely large loss under pixel-wise comparison, as illustrated in Fig. 5. Therefore, we employ the IoU loss [26] to measure boundary inconsistency. Accordingly, a slight deviation results in only a small IoU loss, which ensures convergence. Let \(\mathcal {I}\) denote the set of pixels in the image. For every pixel p in the pixel set \(\mathcal {I}\), \(y_p\) corresponds to its output probability, and \(g=\{0,1\}^{M\times N}\) is the ground truth for the set \(\mathcal {I}\), where M and N are the height and width of the image. By masking the lane segmentation results with the lane area bounded by the ground truth of the lane boundaries, our boundary-aware loss \(l_{ba}\) can be defined as:

$$\begin{aligned} \mathrm{IoU}=\frac{\sum _{p\in \mathcal {I}}(y_p \times g_p)}{\sum _{p\in \mathcal {I}}(y_p+g_p-y_p \times g_p)}, \end{aligned}$$
(1)
$$\begin{aligned} l_{ba}=1-\mathrm{IoU}, \end{aligned}$$
(2)

where \(\times \) denotes a pixel-wise multiplication.

Two consistency constraints are imposed to enhance the results of lane segmentation. The cross-entropy loss term \(l_{lce}\) measures the consistency between the segmented area and its ground truth. Additionally, the loss term \(l_{ba}\) measures the consistency between the boundaries of the segmented area and the lane boundary ground truth. Correspondingly, the loss function \(l_{lt}\) measuring the total error of lane segmentation is updated as:

$$\begin{aligned} l_{lt}=l_{lce}+\lambda _1 \times l_{ba}, \end{aligned}$$
(3)

where \(\lambda _1\) is a constant for balancing two losses. Here we set \(\lambda _1\) as 0.5. With only pixel-wise linear calculation involved, \(l_{ba}\) is fully differentiable.
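
For reference, Eqs. (1)–(3) can be written compactly as below. This is a minimal TensorFlow sketch assuming inputs of shape [batch, height, width, 1] and a small constant added to the denominator for numerical stability; it is not tied to our exact training code.

```python
import tensorflow as tf

def boundary_aware_loss(y_pred, g, eps=1e-6):
    # Eqs. (1)-(2): soft IoU between the predicted lane-area probabilities
    # y_pred and the area g bounded by the lane-boundary ground truth.
    inter = tf.reduce_sum(y_pred * g, axis=[1, 2, 3])
    union = tf.reduce_sum(y_pred + g - y_pred * g, axis=[1, 2, 3])
    return 1.0 - inter / (union + eps)          # one value per image

def lane_segmentation_loss(y_pred, y_true, g, lambda_1=0.5):
    # Eq. (3): pixel-wise cross entropy against the lane-area ground truth
    # plus the boundary-aware term weighted by lambda_1 = 0.5.
    l_lce = tf.reduce_mean(
        tf.keras.losses.binary_crossentropy(y_true, y_pred), axis=[1, 2])
    return l_lce + lambda_1 * boundary_aware_loss(y_pred, g)
```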

Area-Aware Loss for Lane Boundary. Compared with lane area segmentation, lane boundary detection suffers more from a high miss rate due to the lower signal-to-noise ratio around boundaries. Motivated by the geometric prior that the lane area is the area integration result with the lane boundaries as the upper and lower bounds, an area-aware loss is proposed to measure the difference between the lane area restored from the detected lane boundaries and the lane area ground truth.

Our area-aware loss function is expressed as:

$$\begin{aligned} l_{aa}=\sum _{\mathcal {G}(p)=1}[1-I_r(p)], \end{aligned}$$
(4)

where \(\mathcal {G}\) is the pixel-wise label set of the lane area ground truth, \(\mathcal {G}(p)=1\) denotes that pixel p belongs to the lane area, and \(I_r(p)\) is the calculated probability of pixel p belonging to the restored lane area. The loss function \(l_{mt}\) measuring the error of lane boundary detection is defined as

$$\begin{aligned} l_{mt}=l_{mce}+\lambda _2 \times l_{aa}, \end{aligned}$$
(5)

where \(l_{mce}\) is the cross-entropy loss measuring the consistency between the detected lane boundary and its ground truth in a complementary way.

Pixels with strong spatial correlation usually present similar intensity distributions; therefore, we estimate the pixel intensities in the restored lane area directly from the closest pixels on the lane boundaries. Denote the pixels of the two lane boundaries as the pixel set \(\mathcal {B}\). For a pixel p between the lane boundary ground truth, its probability of belonging to the lane area is estimated as the average probability of its n closest pixels on the lane boundaries, which is computed as:

$$\begin{aligned} I_r(p)=\frac{1}{n}\sum _{j=1}^{n}I_b(v_j), \end{aligned}$$
(6)
$$\begin{aligned} v_j=\mathop {\arg \min }_{m_i}\left[ d(p,m_i)\right] \quad m_i \in \mathcal {B}, \end{aligned}$$
(7)

where \(d(x,y)\) is the Euclidean distance between pixels x and y, and \(I_b(v)\) is the pixel probability in the boundary detection map.

Computing the restored lane area from Eqs. 6 and 7, we reform the loss function \(l_{aa}\) as:

$$\begin{aligned} l_{aa}=\sum _{\mathcal {G}(p)=1}[1-I_r(p)]=\sum _{\mathcal {G}(p)=1}\left\{ 1-\frac{1}{n}\sum _{j=1}^{n}I_b\{\mathop {\arg \min }_{m_i}\left[ d(p,m_i)\right] \}\right\} \quad m_i\in \mathcal {B}. \end{aligned}$$
(8)

Thus, the integrated loss function is finally formulated as below:

$$\begin{aligned} l=l_{lce}+\lambda _1\times l_{ba}+l_{mce}+\lambda _2\times l_{aa}. \end{aligned}$$
(9)
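
To make Eqs. (4)–(8) concrete, the following NumPy/SciPy sketch computes \(l_{aa}\) for a single image with \(n=1\), i.e., using only the single closest boundary pixel; the choice of n and the neighbor search are illustrative assumptions. Since the nearest-boundary indices depend only on the ground truth, they can be precomputed offline, and the loss then remains differentiable with respect to the predicted boundary map (e.g., via a gather operation in the training graph).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def area_aware_loss(boundary_prob, boundary_gt, area_gt):
    """boundary_prob: [H, W] predicted boundary probabilities I_b.
    boundary_gt:   [H, W] binary lane-boundary ground truth (pixel set B).
    area_gt:       [H, W] binary lane-area ground truth G."""
    # Index of the closest ground-truth boundary pixel for every pixel p;
    # distance_transform_edt measures distance to the nearest zero entry,
    # so boundary pixels are encoded as the zeros of the input.
    idx = distance_transform_edt(boundary_gt == 0,
                                 return_distances=False, return_indices=True)
    # Restored lane-area probability I_r(p), Eqs. (6)-(7) with n = 1.
    restored = boundary_prob[idx[0], idx[1]]
    # Eq. (4)/(8): penalize low restored probability inside the true lane area.
    return float(np.sum((1.0 - restored)[area_gt == 1]))
```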

3.5 Training Details

Our framework is designed to be fully-convolutional and differentiable, thus it could be trained in an end-to-end manner. In this section, we mainly focus on implementation details of training process.

The shared encoder network adopts the VGG structure [31] and is initialized with weights pretrained on ImageNet [28]. First, we train the lane segmentation subnetwork alone. Second, we train both subnetworks of lane area segmentation and boundary detection together, without the inter-link encoder structure. Finally, the overall multi-task learning framework is retrained with the inter-link encoders added.

We concatenate an all-zero tensor to the output of the shared encoder, so that the input feature dimension of the decoders remains the same during the iterative training procedure. The overall framework utilizes batch normalization with a batch size of 3. To avoid overfitting, a dropout layer [13] is adopted with a rate of 0.2. We use the Adam optimizer [17] and pretrain the lane segmentation and lane boundary detection subnetworks with a learning rate of \(10^{-3}\). For the multi-task framework training process, the learning rate is set to \(10^{-4}\) until convergence.
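
The stage-dependent hyper-parameters described above can be summarized as a small configuration sketch; the stage names are labels of our own and no epoch budgets are implied.

```python
import tensorflow as tf

BATCH_SIZE = 3
DROPOUT_RATE = 0.2   # dropout applied inside the decoders

# Three training stages and their learning rates (illustrative labels).
STAGES = [
    {"name": "pretrain_segmentation", "lr": 1e-3, "inter_link": False},
    {"name": "pretrain_both_tasks",   "lr": 1e-3, "inter_link": False},
    {"name": "joint_multitask",       "lr": 1e-4, "inter_link": True},
]

def make_optimizer(stage):
    # Adam optimizer with the stage-specific learning rate.
    return tf.keras.optimizers.Adam(learning_rate=stage["lr"])
```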

4 Experiment

We evaluate our approach on three lane segmentation datasets: the KITTI dataset [11], the Road-Vehicle dataset (RVD) [8] and the CULane dataset [34]. Approaches are implemented and evaluated in TensorFlow [1]. Processing time is measured on a GeForce GTX TITAN with \(160\times 320\) input images.

4.1 Dataset and Evaluation

The KITTI dataset contains 289 training images and 290 testing images, including four subsets of road scenes: urban marked road (UM), urban multiple marked road (UMM), urban unmarked road (UU) and URBAN ROAD (the union of the former three). UM is defined as marked roads with two lanes, while UMM consists of the roads with multiple lanes. UU stands for roads without lane markings and contains one lane only.

The RVD dataset contains more than 10 h of traffic scenarios with multiple sensors under different weather and road conditions, including highway scenes, night scenes and rain scenes. There are over 10,000 manually labeled images in this dataset, which are divided into different scenes with respect to surrounding conditions such as weather and illumination.

The CULane dataset contains 133,235 images extracted from 55 h of traffic videos, divided into 88,880 images for the training set, 9,675 for the validation set and 34,680 for the testing set. The test set is split into 8 subsets based on their scenes to demonstrate the robustness of different network structures. This newly released dataset contains lane boundary ground truth only, so we generate the lane area ground truth from the areas bounded by the lane boundaries.

To evaluate lane segmentation results, we follow the classical pixel-wise segmentation metrics: precision (P), recall (R), F1-measure and IoU score. Metrics with the removal of foreshortening effects are not considered, because inverse perspective mapping incurs distortions in the ground truth.

For lane boundary detection, we evaluate the performance with a pixel-wise metric. On the KITTI dataset, when the distance between a detected lane boundary and the ground truth is smaller than a threshold (1.5% of the image diagonal), the detected lane boundary is regarded as a true positive (TP). On the CULane dataset, we follow its own metric for fair comparison: when the IoU between a detected lane boundary and the ground truth is larger than a 0.5 threshold, the detected boundary is regarded as a true positive (TP) [34]. The same criteria are applied to all the methods under comparison. The final results are evaluated with precision (P), recall (R) and F1-measure.
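
For reference, the pixel-wise precision, recall, F1-measure and IoU scores follow the standard definitions, as in the small NumPy sketch below; the 0.5 binarization threshold is an assumption for illustration and not part of the benchmark protocols.

```python
import numpy as np

def segmentation_metrics(prob, gt, thresh=0.5):
    # prob: per-pixel probabilities; gt: binary ground-truth mask.
    pred = prob >= thresh
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    iou = tp / (tp + fp + fn + 1e-9)
    return {"P": precision, "R": recall, "F1": f1, "IoU": iou}
```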

4.2 Results and Discussion

Our experiments consist of two parts. First, we compare our lane area segmentation approach with state-of-the-art methods on the KITTI, CULane and RVD datasets. Then, to demonstrate the effectiveness of our multi-task structure, lane boundary detection results are evaluated on the KITTI and CULane datasets.

Lane Segmentation Results on KITTI. The proposed network is first compared with state-of-the-art approaches (including SegNet [5], U-Net [27] and Up-Conv-Poly [25]) on the KITTI dataset. Table 1 shows the overall results. Our methods are superior to the baseline approach [5]: joint training improves performance even without the inter-link encoder and the structure loss functions. Benefiting from the investigation of the inherent interrelationship between tasks, our multi-task framework obtains a better feature representation than a single-task network and boosts performance further.

Table 1. Lane segmentation results on the URBAN_ROAD KITTI dataset. ‘multi-task’, ‘loss’, ‘link’ and ‘link+loss’ denote networks without losses or link structure, with losses only, with link structure only, and with both losses and link structure, respectively.

Note that our approach also outperforms U-Net and Up-Conv-Poly with gains of 4.1% and 2.0% in IoU score, respectively. Both approaches connect encoder layers with decoder layers, which makes the decoders receive same-scale information from the encoders directly. Our multi-task network better captures the dependency of the geometric structure of lanes and markers. We also evaluate the approaches on several different traffic scenes in Table 2. The results show that our approach is robust to scenario changes.

Table 2. Lane segmentation results on KITTI subsets (UM/UMM/UU)

We study the influence of the inter-link encoders and the structure loss functions in our model. Note that we achieve an 86.1% IoU score with our structure-loss-only approach and an 86.5% IoU score with our inter-link-only approach. With both inter-link encoders and losses added, our final approach (link+loss) achieves 87.4% IoU and 93.3% F1-measure. The individually applied structure loss and inter-link encoders each play a crucial role in promoting the segmentation results. Figure 7 shows some lane area segmentation results obtained by our approach, Up-Conv-Poly and U-Net on the KITTI dataset. Our approach effectively handles hard cases such as the vanishing boundaries in the first two columns of Fig. 7.

Fig. 6. The IoU metric evaluated with a single image in KITTI dataset. The blue line is our approach with structure loss functions while the orange line is our approach without structure loss. (Color figure online)

To demonstrate the effectiveness of the structure loss, our approaches with and without the structure loss are evaluated on a number of single images in Fig. 6. We randomly pick 100 images from the KITTI dataset and calculate the IoU score for both approaches. The evaluation results reveal that the introduction of the structure loss yields higher robustness to disturbance.

Lane Segmentation Results on CULane Dataset. We also evaluate lane segmentation on the newly published CULane dataset. The test set is divided into 8 different scenes: Arrow, Crowded, Curve, Dazzle light, Night, No line, Normal and Shadow. The overall performance is also shown in the last column.

Experimental results are shown in Table 3. Notably, our method outperforms state-of-the-art methods on all 8 subsets and achieves 90.2% F1-measure and 82.4% IoU on the overall dataset, demonstrating that our method handles various traffic scenes more robustly than state-of-the-art methods. Moreover, our method achieves a remarkable improvement on 4 subsets (Arrow, Crowded, Shadow and Normal). This is because our method can capture the lane boundary structure from cluttered backgrounds, and the well-extracted boundary features provide complementary information that effectively suppresses erroneous segmentation.

Table 3. Lane segmentation results on CULane dataset

Lane Segmentation Results on RVD Dataset. Furthermore, we evaluate lane segmentation on the RVD dataset. As mentioned in Sect. 4.1, this dataset contains three different scenes: Highway, Night and Rainy & Snowy Day. Besides SegNet, U-Net and Up-Conv-Poly, we also evaluate the performance of the CMA method [8].

Table 4. Lane segmentation results on RVD dataset

Overall results are presented in Table 4. Note that CMA only extracts the endpoints of two lane boundaries to segment the lane area. It enforces a rigid geometric assumption and thus fails to segment curved lanes. In contrast, the geometric priors introduced in our network are applicable to a wider range of scenes, achieving a significant improvement over all the metrics. Although the performance on Highway is similar due to the clear background, we dramatically improve the performance in the other scenes, especially in night scenarios. With a better representation of boundary information and the geometric constraints, illumination variation and image degradation are well handled by our approach.

Lane Boundary Detection Results. In addition to lane segmentation, we also evaluate the effectiveness of lane boundary detection on the KITTI and CULane datasets. We test several approaches against manually labeled lane boundary ground truth and present the performance of SegNet [5], SegNet-Ego-Lane [16] and SCNN [34]. We also report our approach with only the cross-entropy loss to emphasize the effectiveness of the structure loss function on the KITTI dataset. Some lane boundary detection results are also shown in Fig. 7.

Fig. 7. Lane area segmentation and lane boundary detection results on KITTI dataset. Green corresponds to true positives, blue to false positives and red to false negatives. (Color figure online)

Table 5. Lane boundary results on KITTI dataset

The results on the KITTI dataset are provided in Table 5. In terms of precision, SegNet performs slightly better than our approach. However, SegNet has an extremely low recall rate, which indicates that it misses plenty of true positives. Our approach achieves a higher recall rate than all the other approaches, as well as the highest F1-measure. The ablation analysis of our approaches indicates that the area-aware loss function dramatically improves the performance of lane boundary detection: we gain 3.2% in precision, 5.3% in recall and 4.3% in F1-measure.

Table 6. Lane boundary results on CULane dataset (F1-measure)

Fig. 8. Parameters \(\lambda _1\) and \(\lambda _2\) validation experiments on KITTI

The results on the CULane dataset are shown in Table 6. Note that our method outperforms state-of-the-art methods on 7 subsets. State-of-the-art methods perform worse mainly due to image degradation and missing lane boundaries. For the image degradation problem, our subnetwork is able to extract better boundary features with the area-aware loss, so we dramatically improve performance on the Night, Dazzle light and Shadow subsets, where image quality is severely affected by illumination conditions. For the unseen lane boundary problem, it is extremely difficult to extract enough boundary features for boundary detection. Although SCNN introduces context information for boundary detection, different scenes contain extremely different context, resulting in inaccurate lane detection results. In contrast, our inter-link structure utilizes the more robust geometric relationship between lane areas and boundaries, which constrain each other for better performance.

4.3 Parameter Study

To choose the optimal parameters \(\lambda _1\) and \(\lambda _2\), a parameter study is performed with 10-fold cross-validation. The performance of different \(\lambda _1\) values is compared by the IoU score of lane area segmentation, while \(\lambda _2\) is evaluated by the F1-measure of lane boundary detection. The final results are shown in Fig. 8, where both parameters are sampled at regular intervals. Although \(\lambda _1\) values larger than 0.5 achieve a similar IoU score, the experiments show that a large \(\lambda _1\) is sensitive to the hyper-parameters, so \(\lambda _1\) is set to 0.5. \(\lambda _2\) is set to 1.0 for the best performance.

5 Conclusion

We propose a multi-task learning framework to jointly address the problems of lane segmentation and lane boundary detection. In this framework, a shared encoder and an inter-link encoder structure are proposed, whose benefits in boosting detection precision have been demonstrated by experiments. In addition, we devise two novel loss functions that are shown to be applicable to more general traffic scenes. The proposed method is compared with state-of-the-art ones on the KITTI, CULane and RVD datasets, and shows leading performance.