Towards an improved label noise proportion estimation in small data: a Bayesian approach

Bootkrajang, Jakramate; Chaijaruwanich, Jeerayut

doi:10.1007/s13042-021-01423-4

Towards an improved label noise proportion estimation in small data: a Bayesian approach

Original Article
Published: 14 September 2021

Volume 13, pages 851–867, (2022)
Cite this article

International Journal of Machine Learning and Cybernetics Aims and scope Submit manuscript

498 Accesses
1 Altmetric
Explore all metrics

Abstract

Today’s classification task is getting more and more complex. This inevitably renders unanticipated compromises on the quality of data labels. In this paper, we consider learning label noise robust classifiers with focus on the tasks with limited training examples relative to the number of data classes and data dimensionality. In such cases, the existing label noise models tend to inaccurately estimate the noise proportions leading to suboptimal performance. To alleviate the problem, we formulated a regularised label noise model capable of expressing preference on the noise parameters. In addition, we treated the regularisation from a Bayesian perspective so that the regularisation parameters can be inferred from the data through the noise model, thereby facilitating model selection in the presence of label noise. This results in a more data and computationally efficient Bayesian label noise model which could be incorporated into any probabilistic classifier, including those that are known to be data intensive such as deep neural networks. We demonstrated the generality of the proposed method through its integrations with logistic regression, multinomial logistic regression and convolutional neural networks. Extensive empirical evaluations demonstrate that the proposed regularised label noise model can significantly improve, in terms of both the quality of noise parameters estimation and the classification accuracy, upon the existing ones when data is scarce, and is no worse than the existing approaches in the abundance of training data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Revisiting Sample Weights Based Method for Noisy-Label Detection and Classification

Foster Adaptivity and Balance in Learning with Noisy Labels

Instance-Dependent Noisy-Label Learning with Graphical Model Based Noise-Rate Estimation

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Data Availibility Statement

Code which generates the synthetic data is available at https://github.com/jakramate/brNoiseModel.

Notes

We used the default implementation of VGG16, DenseNet121 and MobileNet from Keras library.

References

Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96(12):6745–6750
Article Google Scholar
Barandela R, Gasca E (2000) Decontamination of training samples for supervised pattern recognition methods. In: Proceedings of the joint IAPR international workshops on statistical techniques in pattern recognition and structural and syntactic pattern recognition, pp 621–630 (2000)
Biggio B, Nelson B, Laskov P (2011) Support vector machines under adversarial label noise. In: Proceedings of the Asian conference on machine learning, pp 97–112 (2011)
Bootkrajang J, Chaijaruwanich J (2018) Towards instance-dependent label noise-tolerant classification: a probabilistic approach. Pattern Anal Appl 1–17 (2018)
Bootkrajang J, Kabán A (2012) Label-noise robust logistic regression and its applications. In: Proceedings of the Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 143–158
Bootkrajang J, Kabán A (2013) Classification of mislabelled microarrays using robust sparse logistic regression. Bioinformatics 29(7):870–877
Article Google Scholar
Bootkrajang J, Chaijaruwanich J (2020) Towards instance-dependent label noise-tolerant classification: a probabilistic approach. Pattern Anal Appl 23:95–111. https://doi.org/10.1007/s10044-018-0750-z
Brodley CE, Friedl MA (1996) Identifying and eliminating mislabeled training instances. In: Proceedings of the thirteenth national conference on artificial intelligence, vol 1, pp 799–805
Buntine WL (1991) Bayesian backpropagation. Complex Syst 5:603–643
MATH Google Scholar
Cawley GC, Talbot NL (2010) On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res 11:2079–2107
MathSciNet MATH Google Scholar
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
MathSciNet MATH Google Scholar
Frénay B, Kabán A (2014) A comprehensive introduction to label noise. In: Proceedings of the European symposium on artificial neural networks, computational intelligence and machine learning
Ghosh A, Kumar H, Sastry P (2017) Robust loss functions under label noise for deep neural networks. In: Proceedings of the AAAI conference on artificial intelligence
Ghosh A, Manwani N, Sastry PS (2015) Making risk minimization tolerant to label noise. Neurocomputing 160:93–107
Article Google Scholar
Goldberger J, Ben-Reuven E (2017) Training deep neural-networks using a noise adaptation layer. In: Proceedings of the 5th international conference on learning representation
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: IEEE conference on computer vision and pattern recognition, pp 2261–2269
Jindal I, Nokleby M, Chen X (2016) Learning deep networks from noisy labels with dropout regularization. In: Proceedings of the 16th international conference on data mining, pp 967–972
Karimi D, Dou H, Warfield SK, Gholipour A (2019) Deep learning with noisy labels: exploring techniques and remedies in medical image analysis. arXiv:1912.02911
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto
Lawrence ND, Schölkopf B (2001) Estimating a kernel fisher discriminant in the presence of label noise. In: Proceedings of the international conference on machine learning, pp 306–313
Lee KH, He X, Zhang L, Yang L (2018) Cleannet: transfer learning for scalable image classifier training with label noise. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5447–5456
Li D, Liu Y, Huang D (2020) Development of semi-supervised multiple-output soft-sensors with co-training and tri-training MPLS and MRVM. Chemom Intell Lab Syst 199:103970
Article Google Scholar
Li M, Soltanolkotabi M, Oymak S (2020) Gradient descent with early stopping is provably robust to label noise for overparameterized neural networks. In: Proceedings of the international conference on artificial intelligence and statistics. PMLR, pp 4313–4324
Liu Y, Pan Y, Huang D (2015) Development of a novel adaptive soft-sensor using variational Bayesian pls with accounting for online identification of key variables. Ind Eng Chem Res 54(1):338–350
Article Google Scholar
Long PM, Servedio RA (2010) Random classification noise defeats all convex potential boosters. Mach Learn 78(3):287–304
Article MathSciNet Google Scholar
Lugosi G (1992) Learning with an unreliable teacher. Pattern Recognit 25:79–87
Article MathSciNet Google Scholar
Manwani N, Sastry PS (2013) Noise tolerance under risk minimization. IEEE Trans Cybern 43(3):1146–1151
Article Google Scholar
Martín-Merino M (2013) A kernel SVM algorithm to detect mislabeled microarrays in human cancer samples. In: 13th IEEE international conference on bioinformatics and bioengineering. IEEE, pp 1–4
Matsuda Y, Hoashi H, Yanai K (2012) Recognition of multiple-food images by detecting candidate regions. In: Proceedings of the IEEE international conference on multimedia and expo, pp 25–30
Menon A, Van Rooyen B, Ong CS, Williamson B (2015) Learning from corrupted binary labels via class-probability estimation. In: Proceedings of the international conference on machine learning, pp 125–134
Natarajan N, Dhillon IS, Ravikumar PK, Tewari A (2013) Learning with noisy labels. In: Proceedings of the advances in neural information processing systems, pp 1196–1204
Patrini G, Rozza A, Krishna Menon A, Nock R, Qu L (2017) Making deep neural networks robust to label noise: a loss correction approach. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1944–1952
Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L (2010) Learning from crowds. J Mach Learn Res 11:1297–1322
MathSciNet Google Scholar
Reeve H, Kabán A (2019) Fast rates for a KNN classifier robust to unknown asymmetric label noise. In: Proceedings of the international conference on machine learning, pp 5401–5409
Rusiecki A (2020) Standard dropout as remedy for training deep neural networks with label noise. In: Theory and applications of dependable computer systems, pp 534–542
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations
Song H, Kim M, Park D, Lee JG (2019) Prestopping: How does early stopping help generalization against label noise? arXiv:1911.08059
Sukhbaatar S, Bruna J, Paluri M, Bourdev L, Fergus R (2014) Training convolutional networks with noisy labels. arXiv:1406.2080
Tanno R, Saeedi A, Sankaranarayanan S, Alexander DC, Silberman N (2019) Learning from noisy labels by regularized estimation of annotator confusion. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11244–11253
Wager S, Wang S, Liang PS (2013) Dropout training as adaptive regularization. In: Proceedings of the advances in neural information processing systems, pp 351–359
West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H Jr, Marks JAO, Nevins JR (2001) Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA 98(20):11462–11467
Zhang R, Chen Z, Zhang S, Song F, Zhang G, Zhou Q, Lei T (2020) Remote sensing image scene classification with noisy label distillation. Remote Sens 12(15). https://doi.org/10.3390/rs12152376
Zhang Z, Sabuncu M (2018) Generalized cross entropy loss for training deep neural networks with noisy labels. In: Proceedings of the advances in neural information processing systems, pp 8778–8788

Download references

Acknowledgements

This research is supported by Thailand Research Fund (TRF) and Office of the Higher Education Commission (grant number MRG6280252). Data Science Research Centre, the Department of Computer Science, Chiang Mai University provides research and computing facilities. Finally, the authors would like to express their gratitude for the constructive feedback from the editor and anonymous reviewers.

Funding

This research is supported by Thailand Research Fund (TRF) and Office of the Higher Education Commission (Grant number MRG6280252).

Author information

Authors and Affiliations

The Department of Computer Science, Chiang Mai University, Chiang Mai, 50200, Thailand
Jakramate Bootkrajang & Jeerayut Chaijaruwanich

Authors

Jakramate Bootkrajang
View author publications
You can also search for this author in PubMed Google Scholar
Jeerayut Chaijaruwanich
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jakramate Bootkrajang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Code availability

Codes are available at https://github.com/jakramate/brNoiseModel.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In this section we show how to derive the multiplicative fixed point update equations for label noise parameters. For $\omega _{00}$ and $\omega _{01}$, we first construct a Lagrangian of Eq. (14) imposing a constraint that $\omega _{00} + \omega _{01} = 1$,

$$\begin{aligned} \mathcal {R}(\mathbf{w} , \Omega ) = \sum _{n=1}^N (1-\tilde{y}_n)\log \tilde{P}_n^0 + \tilde{y}_n\log \tilde{P}_n^1 - \sum _{j=0}^1 \sum _{k=0}^1 \alpha _{jk} \omega _{jk} + \lambda (1-\sum _{l=0}^1 \omega _{0l}) \end{aligned}$$

(25)

Taking the derivative of the above w.r.t. $\omega _{00}$ we can arrive at,

$$\begin{aligned} \lambda&= \sum _{n=1}^N \frac{(1-\tilde{y}_n)}{\tilde{P}_n^0}P_n^0 - \alpha _{00} \end{aligned}$$

(26)

Multiplying the above by $\omega _{00}$ yields,

$$\begin{aligned} \omega _{00} \lambda&= \omega _{00} \left( \sum _{n=1}^N (1-\tilde{y}_n)\frac{P_n^0}{\tilde{P}_n^0} - \alpha _{00} \right) \end{aligned}$$

(27)

We can derive a similar expression for $\omega _{01}$ which turns out to be,

$$\begin{aligned} \omega _{01} \lambda&= \omega _{01} \left( \sum _{n=1}^N \tilde{y}_n\frac{P_n^0}{\tilde{P}_n^1} - \alpha _{01} \right) \end{aligned}$$

(28)

Summing Eqs. (27) and (28) and using the fact that $\omega _{00} + \omega _{01} = 1$ we can work out the expression for the Lagrange multiplier.

$$\begin{aligned} \lambda&= \omega _{00} \left( \sum _{n=1}^N (1-\tilde{y}_n)\frac{P_n^0}{\tilde{P}_n^0} - \alpha _{00} \right) + \omega _{01} \left( \sum _{n=1}^N \tilde{y}_n\frac{P_n^0}{\tilde{P}_n^1} - \alpha _{01} \right) \end{aligned}$$

(29)

Substitute $\lambda$ in Eq. (29) back into Eqs. (27) and (28) gives us the multiplicative update equations for $\omega _{00}$ and $\omega _{01}$

$$\begin{aligned} \omega _{00}&= \frac{\omega _{00} \left( \sum _{n=1}^N (1-\tilde{y}_n)\frac{P_n^0}{\tilde{P}_n^0} - \alpha _{00} \right) }{\omega _{00} \left( \sum _{n=1}^N (1-\tilde{y}_n)\frac{P_n^0}{\tilde{P}_n^0} - \alpha _{00} \right) + \omega _{01} \left( \sum _{n=1}^N \tilde{y}_n\frac{P_n^0}{\tilde{P}_n^1} - \alpha _{01} \right) } \end{aligned}$$

(30)

$$\begin{aligned} \omega _{01}&= \frac{\omega _{01} \left( \sum _{n=1}^N \tilde{y}_n\frac{P_n^0}{\tilde{P}_n^1} - \alpha _{01} \right) }{\omega _{00} \left( \sum _{n=1}^N (1-\tilde{y}_n)\frac{P_n^0}{\tilde{P}_n^0} - \alpha _{00} \right) + \omega _{01} \left( \sum _{n=1}^N (\tilde{y}_n)\frac{P_n^0}{\tilde{P}_n^1} - \alpha _{01} \right) } \end{aligned}$$

(31)

The update equations for $\omega _{10}$ and $\omega _{11}$, which can be derived similarly, turned out to be.

$$\begin{aligned} \omega _{10}&= \frac{\omega _{10} \Big (\sum _{n=1}^N (1-\tilde{y}_n)\frac{P_n^1}{\tilde{P}_n^0} - \alpha _{10} \Big )}{\omega _{10} \Big (\sum _{n=1}^N (1-\tilde{y}_n)\frac{P_n^1}{\tilde{P}_n^0} - \alpha _{10} \Big ) + \omega _{11} \Big (\sum _{n=1}^N \tilde{y}_n\frac{P_n^1}{\tilde{P}_n^1} - \alpha _{11} \Big )} \end{aligned}$$

(32)

$$\begin{aligned} \omega _{11}&= \frac{\omega _{11} \Big (\sum _{n=1}^N \tilde{y}_n \frac{P_n^1}{\tilde{P}_n^1} - \alpha _{11} \Big )}{\omega _{10} \Big (\sum _{n=1}^N (1-\tilde{y}_n)\frac{P_n^1}{\tilde{P}_n^0} - \alpha _{10} \Big ) + \omega _{11} \Big (\sum _{n=1}^N \tilde{y}_n\frac{P_n^1}{\tilde{P}_n^1} - \alpha _{11} \Big )} \end{aligned}$$

(33)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bootkrajang, J., Chaijaruwanich, J. Towards an improved label noise proportion estimation in small data: a Bayesian approach. Int. J. Mach. Learn. & Cyber. 13, 851–867 (2022). https://doi.org/10.1007/s13042-021-01423-4

Download citation

Received: 03 March 2021
Accepted: 31 August 2021
Published: 14 September 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s13042-021-01423-4

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Towards an improved label noise proportion estimation in small data: a Bayesian approach

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Revisiting Sample Weights Based Method for Noisy-Label Detection and Classification

Foster Adaptivity and Balance in Learning with Noisy Labels

Instance-Dependent Noisy-Label Learning with Graphical Model Based Noise-Rate Estimation

Explore related subjects

Data Availibility Statement

Notes

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Code availability

Ethical approval

Consent to participate

Consent for publication

Additional information

Publisher's Note

Appendix

Appendix

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now