Abstract
Data quality, and label quality in particular, has a significant impact on prediction accuracy in supervised learning: training on datasets with label noise degrades the generalization performance of the learned model. To address numerical label noise in regression, we estimate the posterior distribution of the true label with a Gaussian mixture model (GMM) and derive a label noise estimator by combining maximum a posteriori (MAP) estimation with this posterior distribution. A noise filtering algorithm with MAP estimation (MAPNF) is then designed by integrating the estimator into an optimal sample selection framework. Extensive experiments on benchmark datasets and an age estimation dataset verify the effectiveness of MAPNF. On the benchmark datasets, MAPNF outperforms other recent filtering algorithms in improving the generalization performance of different regression models, including both noise-sensitive and noise-robust models, reducing the model error by 29.7% to 69.6%. The proposed approach also identifies erroneous labels in an age estimation dataset of 18,424 samples: a model trained on the filtered dataset (with 19% of the samples removed) reduces the test error by at least 2.68%. These results demonstrate a less-is-better effect, achieving lower prediction errors with fewer but higher-quality samples. We conclude that MAPNF can effectively identify label noise and improve data quality.
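The core idea can be illustrated with a short sketch: fit a Gaussian mixture to a signal of label noise and keep only the samples whose MAP assignment falls in the clean component. The sketch below is not the authors' MAPNF algorithm; using regression residuals as the noise signal, the two-component mixture, and the Ridge base regressor are assumptions made purely for illustration.

```python
# Illustrative sketch of GMM + MAP-style label-noise filtering for regression.
# NOT the authors' MAPNF algorithm: the residual-based noise signal, the
# two-component mixture, and the Ridge base model are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.mixture import GaussianMixture

def map_noise_filter(X, y, n_components=2):
    """Return a boolean mask marking samples kept as (likely) clean."""
    # 1. Fit a preliminary regressor and compute absolute residuals.
    residuals = np.abs(y - Ridge().fit(X, y).predict(X)).reshape(-1, 1)

    # 2. Model the residual distribution with a Gaussian mixture:
    #    the component with the smallest mean is treated as "clean".
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(residuals)
    clean_component = int(np.argmin(gmm.means_.ravel()))

    # 3. MAP decision: keep a sample if the clean component has the highest
    #    posterior probability given its residual.
    posterior = gmm.predict_proba(residuals)
    return posterior.argmax(axis=1) == clean_component

# Usage:
#   keep_mask = map_noise_filter(X_train, y_train)
#   X_clean, y_clean = X_train[keep_mask], y_train[keep_mask]
```

In MAPNF itself the posterior is built over the true label and the retained subset is chosen through an optimal sample selection framework, so this sketch should be read only as an analogy for the GMM-plus-MAP decision step.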
Data Availability
Data will be made available on request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62276161, U21A20513, 62076154, 61906113), and the Fundamental Research Program of Shanxi Province (202303021221055).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jiang, G., Li, Z. & Wang, W. Maximum a posteriori estimation and filtering algorithm for numerical label noise. Appl Intell 54, 8841–8855 (2024). https://doi.org/10.1007/s10489-024-05648-y