Abstract:
Machine learning components are becoming popular in the automotive industry, and more and more data sets are becoming available for training them. All of these data sets provide ground truth labels for images. The labeling process is expensive and potentially error-prone. At the same time, label correctness defines the business value of a data set. In this paper, we use an N-version approach to assess the label quality of a data set. The approach combines N state-of-the-art neural networks and aggregates their results into a single verdict using majority voting. We compare this majority vote against the ground truth label and compute the percentage of disagreeing pixels along with other metrics, enabling automated and detailed analysis of label quality in data sets. We evaluate our methodology by classifying the BDD100K drivable area data set. The evaluation shows that the approach identifies misclassified scenes and inconsistencies in label semantics between similar scenes.
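The voting and comparison steps described above can be illustrated with a minimal sketch. It assumes that each of the N networks outputs a per-pixel map of integer class labels of the same size as the ground truth; the function names `majority_vote` and `disagreement_percentage`, the tie-breaking rule, and the toy 2x2 label maps are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Pixel-wise majority vote over N equally sized 2-D label maps."""
    height, width = len(predictions[0]), len(predictions[0][0])
    voted = []
    for r in range(height):
        row = []
        for c in range(width):
            votes = Counter(p[r][c] for p in predictions)
            # Assumption of this sketch: on a tie, the label that was
            # encountered first among the tied labels wins.
            row.append(votes.most_common(1)[0][0])
        voted.append(row)
    return voted


def disagreement_percentage(voted, ground_truth):
    """Percentage of pixels where the vote differs from the ground truth."""
    total = sum(len(row) for row in ground_truth)
    differing = sum(
        v != g
        for v_row, g_row in zip(voted, ground_truth)
        for v, g in zip(v_row, g_row)
    )
    return 100.0 * differing / total


# Toy example: three hypothetical network outputs for one 2x2 image.
net_a = [[0, 1], [2, 2]]
net_b = [[0, 1], [2, 0]]
net_c = [[0, 0], [1, 0]]
ground_truth = [[0, 1], [2, 2]]

voted = majority_vote([net_a, net_b, net_c])
disagreement = disagreement_percentage(voted, ground_truth)
```

In this toy case the vote disagrees with the ground truth on one of four pixels. A high disagreement percentage on a real image would flag it for manual review; it would not by itself prove that the label is wrong, since the networks can also agree on a mistake.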
Date of Conference: 20-23 September 2020
Date Added to IEEE Xplore: 24 December 2020