Noisy Label Detection and Counterfactual Correction


Impact Statement:
The accuracy of machine learning models depends on training data quality. Unsurprisingly, it drops dramatically (up to 53% in our experiments) as the percentage of noisy labels increases. The method presented here is shown to maintain high performance even in the presence of highly corrupted data (i.e., 80% noisy labels) by jointly detecting and correcting noisy labels. Specifically, the proposed method increases the accuracy rate of noisy label detection (up to 25%), while achieving a high noisy label correction rate (up to 72%). When presented with severe label noise (i.e., 80% noisy labels), the proposed method lowers the noise rate to 52.5%. Beyond improving the accuracy of machine learning models trained on data with noisy labels, this research highlights the need to treat (as opposed to discard) noisy label instances during the training process.

Abstract:

Data quality is of paramount importance to the training of any machine learning model. Recently proposed approaches for noisy learning focus on detecting noisy labeled data instances by using a fixed loss value threshold and excluding the detected instances in subsequent training steps. However, a predefined, fixed loss value threshold may not be optimal for detecting noisy labeled data, and excluding the detected noisy data instances can shrink the training set to such an extent that accuracy suffers. In this article, we propose Noisy label Detection and Counterfactual Correction (NDCC), a new approach that automatically selects a loss value threshold to identify noisy labeled data instances, and uses counterfactual learning to correct the noisy labels. To the best of our knowledge, NDCC is the first work to explore the use of counterfactual learning in the noisy learning domain. We demonstrate the performance of NDCC on several datasets under...
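
The contrast the abstract draws between a fixed loss threshold and an automatically selected one can be illustrated with a small sketch. This is not the NDCC procedure: the abstract does not say how NDCC chooses its threshold, so Otsu's method (maximizing between-class variance over a loss histogram) is used here purely as a stand-in for "some data-driven rule". The synthetic losses, the value of `fixed_thr`, and the helper names `otsu_threshold` and `recall` are all assumptions made for illustration, and the counterfactual correction step is not covered.

```python
import numpy as np


def otsu_threshold(losses, bins=64):
    """Return the loss cut-off that maximizes between-class variance.

    Otsu's method is an illustrative stand-in for a data-driven threshold;
    it is NOT claimed to be the selection rule used by NDCC.
    """
    hist, edges = np.histogram(losses, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weights = hist / hist.sum()

    w_low = np.cumsum(weights)               # mass at or below each bin
    w_high = 1.0 - w_low                     # mass above each bin
    cum_mean = np.cumsum(weights * centers)  # partial first moment
    total_mean = cum_mean[-1]

    # Skip splits that leave one side (numerically) empty.
    valid = (w_low > 1e-12) & (w_high > 1e-12)
    between = np.zeros_like(w_low)
    between[valid] = (total_mean * w_low[valid] - cum_mean[valid]) ** 2 / (
        w_low[valid] * w_high[valid]
    )
    # Upper edge of the best bin: losses above it are flagged as noisy.
    return edges[np.argmax(between) + 1]


def recall(flags, truth):
    """Fraction of truly noisy samples that were flagged."""
    return (flags & truth).sum() / truth.sum()


# Synthetic per-sample losses (assumption): clean labels tend to have low
# loss, mislabeled samples tend to have high loss.
rng = np.random.default_rng(0)
clean_losses = np.clip(rng.normal(0.3, 0.1, size=600), 0.0, None)
noisy_losses = np.clip(rng.normal(2.0, 0.3, size=400), 0.0, None)
losses = np.concatenate([clean_losses, noisy_losses])
is_noisy = np.concatenate([np.zeros(600, dtype=bool), np.ones(400, dtype=bool)])

fixed_thr = 2.5                      # hypothetical hand-picked threshold
auto_thr = otsu_threshold(losses)    # threshold derived from the loss values

fixed_flags = losses > fixed_thr
auto_flags = losses > auto_thr

print("fixed threshold recall:", recall(fixed_flags, is_noisy))
print("data-driven threshold recall:", recall(auto_flags, is_noisy))
```

In this toy setting the hand-picked threshold sits inside the high-loss mode and misses most mislabeled samples, whereas the data-driven cut-off adapts to where the two loss modes separate. Real training losses are rarely this cleanly bimodal, so the sketch only shows why a fixed value can be a poor fit.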
Published in: IEEE Transactions on Artificial Intelligence (Volume: 5, Issue: 2, February 2024)
Page(s): 763 - 775
Date of Publication: 01 May 2023
Electronic ISSN: 2691-4581
