Maximum a posteriori estimation and filtering algorithm for numerical label noise

Applied Intelligence

Abstract

Data quality, and label quality in particular, can strongly affect prediction accuracy in supervised learning: training on datasets with label noise degrades model performance. To address numerical label noise in regression, we estimate the posterior distribution of the true label with a Gaussian mixture model (GMM). We then propose a label noise estimator that applies maximum a posteriori (MAP) estimation to this posterior distribution. In addition, we design a noise filtering algorithm with MAP estimation (MAPNF) by combining the optimal sample selection framework with the estimator. Extensive experiments on benchmark datasets and an age estimation dataset verify the effectiveness of MAPNF. On the benchmark datasets, MAPNF outperforms other recent filtering algorithms in improving the generalization performance of different regression models, both noise-sensitive and noise-robust, and reduces model error by 29.7% to 69.6%. The proposed approach also identifies erroneous labels in an age estimation dataset (18424 in total). The model trained on the filtered dataset (19% of the data removed) reduces the test error on that dataset by at least 2.68%. These results demonstrate a less-is-better effect: lower prediction errors are achieved with fewer, higher-quality samples. We conclude that MAPNF can effectively identify label noise and improve data quality.
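The idea of estimating label noise from a mixture-shaped posterior can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the actual formulas. The sketch assumes that the posterior of each true label is already available as a one-dimensional Gaussian mixture (the `weights`, `means` and `stds` arguments are hypothetical inputs), approximates the MAP estimate by a dense grid search, and replaces the optimal sample selection framework with a fixed, hypothetical `threshold`.

```python
import math


def mixture_pdf(y, weights, means, stds):
    """Density of a 1-D Gaussian mixture evaluated at y."""
    return sum(
        w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def map_label(weights, means, stds, grid_size=2001):
    """Approximate the posterior mode (MAP estimate) by grid search.

    Assumption: the grid spans 4 standard deviations around every component.
    """
    lo = min(m - 4.0 * s for m, s in zip(means, stds))
    hi = max(m + 4.0 * s for m, s in zip(means, stds))
    step = (hi - lo) / (grid_size - 1)
    candidates = [lo + i * step for i in range(grid_size)]
    return max(candidates, key=lambda y: mixture_pdf(y, weights, means, stds))


def estimate_noise(observed, weights, means, stds):
    """Noise estimate: observed label minus the MAP estimate of the true label."""
    return observed - map_label(weights, means, stds)


def filter_samples(samples, threshold):
    """Return indices of samples whose estimated noise magnitude is small.

    samples: list of (observed_label, (weights, means, stds)) pairs.
    threshold: hypothetical fixed cut-off standing in for a learned criterion.
    """
    kept = []
    for idx, (y_obs, (w, m, s)) in enumerate(samples):
        if abs(estimate_noise(y_obs, w, m, s)) <= threshold:
            kept.append(idx)
    return kept


# Toy usage: the same hypothetical posterior for three observed labels.
posterior = ([0.7, 0.3], [30.0, 50.0], [2.0, 2.0])
samples = [(30.5, posterior), (47.0, posterior), (29.0, posterior)]
kept = filter_samples(samples, threshold=3.0)
```

In this toy example the dominant mixture component is centred at 30, so the label 47.0 receives a large noise estimate and is filtered out, while the other two samples are kept.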

Data Availability

Data will be made available on request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62276161, U21A20513, 62076154, 61906113), and the Fundamental Research Program of Shanxi Province (202303021221055).

Author information

Corresponding author

Correspondence to Wenjian Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiang, G., Li, Z. & Wang, W. Maximum a posteriori estimation and filtering algorithm for numerical label noise. Appl Intell 54, 8841–8855 (2024). https://doi.org/10.1007/s10489-024-05648-y

  • DOI: https://doi.org/10.1007/s10489-024-05648-y
