On the Impact of Noisy Labels on Supervised Classification Models

Dubel, Rafał; Wijata, Agata M.; Nalepa, Jakub

doi:10.1007/978-3-031-36021-3_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14074)

Included in the following conference series:

International Conference on Computational Science

Abstract

The amount of data generated daily is growing rapidly in virtually all domains of science and industry, and its efficient storage, processing and analysis pose significant practical challenges. To automate the extraction of useful insights from raw data, numerous supervised machine learning algorithms have been investigated. They rely on annotated training sets, which are fed to a training routine that produces a model later deployed for a specific task. Capturing real-world data may, however, yield noisy observations, which ultimately affect the models trained from such data. The impact of label noise is under-researched, and the robustness of classic learners against such noise remains unclear. We tackle this research gap: we not only thoroughly investigate the classification capabilities of an array of widely adopted machine learning models over a variety of contamination scenarios, but also suggest new metrics that could be used to quantify the robustness of such models. Our extensive computational experiments shed more light on the impact of training set contamination on the operational behavior of supervised learners.
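
To make the notion of training set contamination concrete, the following is a minimal sketch of one simple scenario, uniform random label flipping, together with the Matthews correlation coefficient (MCC) as a way to measure how far the contaminated labels drift from the clean ones. The abstract does not describe the authors' actual contamination scenarios or their proposed robustness metrics, so the function names `flip_labels` and `mcc`, the flipping scheme, and the choice of MCC are all illustrative assumptions.

```python
import math
import random


def flip_labels(labels, noise_rate, classes, seed=0):
    """Return a copy of `labels` in which a fraction `noise_rate` of the
    entries is reassigned to a different, randomly chosen class.
    (Hypothetical helper; not the contamination procedure from the paper.)"""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    # Sample distinct positions so exactly n_flip labels change.
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy


def mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


clean = [0, 1] * 50                      # toy binary label vector
noisy = flip_labels(clean, 0.2, [0, 1])  # contaminate 20% of the labels
print(sum(a != b for a, b in zip(clean, noisy)), mcc(clean, noisy))
```

In an actual robustness study, the contaminated labels would be used to train a classifier, and the metric would be computed on predictions over a clean test set for several noise rates; the sketch only shows the contamination step and the metric in isolation.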

AMW was supported by the Silesian University of Technology, Faculty of Biomedical Engineering grant (07/010/BK_23/1023). JN was supported by the Silesian University of Technology Rector’s grant (02/080/RGJ22/0026).

Author information

Authors and Affiliations

Faculty of Automatic Control, Electronics and Computer Science, Department of Algorithmics and Software, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Rafał Dubel & Jakub Nalepa
Faculty of Biomedical Engineering, Silesian University of Technology, Roosevelta 40, 41-800, Zabrze, Poland
Agata M. Wijata
KP Labs, Konarskiego 18C, 44-100, Gliwice, Poland
Agata M. Wijata & Jakub Nalepa

Authors

Rafał Dubel
Agata M. Wijata
Jakub Nalepa

Corresponding authors

Correspondence to Agata M. Wijata or Jakub Nalepa.

Editor information

Editors and Affiliations

Czech Technical University in Prague, Prague, Czech Republic
Jiří Mikyška
University of Amsterdam, Amsterdam, The Netherlands
Clélia de Mulatier
AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M.A. Sloot

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 39 KB)

About this paper

Cite this paper

Dubel, R., Wijata, A.M., Nalepa, J. (2023). On the Impact of Noisy Labels on Supervised Classification Models. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 14074. Springer, Cham. https://doi.org/10.1007/978-3-031-36021-3_8

DOI: https://doi.org/10.1007/978-3-031-36021-3_8
Published: 26 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36020-6
Online ISBN: 978-3-031-36021-3
eBook Packages: Computer Science, Computer Science (R0)
