Abstract
The success of deep learning depends heavily on large-scale, accurately labeled datasets. However, real-world datasets often contain substantial label noise, and training directly on such data can lead to overfitting. Recent research has therefore focused on algorithms that learn robust models from noisy datasets, typically by designing the loss function or by integrating ideas from semi-supervised learning (SSL). This paper proposes a robust algorithm for learning with label noise that requires neither additional clean data nor an auxiliary model. On the one hand, the Jensen–Shannon (JS) divergence is introduced as a component of the loss function to measure the distance between the predicted distribution and the noisy label distribution; we show theoretically and experimentally that it alleviates the overfitting caused by the traditional cross-entropy loss. On the other hand, a dynamic sample selection mechanism is proposed: the dataset is divided into a pseudo-clean labeled subset and a pseudo-noisy labeled subset, the two subsets are treated differently to exploit prior information about the data, and the model is then trained with SSL. Unlike conventional training, the sample selection is dynamic, alternating between updating the two subsets and updating the model parameters. Since the labels in the pseudo-clean subset are not entirely correct, they are further refined by linear interpolation. We also show experimentally that integrating SSL helps the model divide the two subsets more precisely and build more explicit decision boundaries. Extensive experiments on corrupted benchmark datasets and a real-world dataset, including CIFAR-10, CIFAR-100, and Clothing1M, demonstrate that our method outperforms many state-of-the-art approaches for learning with label noise.
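As an illustration of the JS loss component described above, the following is a minimal PyTorch sketch of a Jensen–Shannon divergence loss between the softmax prediction and the one-hot (possibly noisy) label distribution. It is a sketch under the assumption of an unweighted JS divergence, not necessarily the paper's exact formulation; the function name js_loss and all implementation details are illustrative.

```python
import torch
import torch.nn.functional as F

def js_loss(logits: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Mean Jensen-Shannon divergence between softmax predictions and one-hot labels."""
    p = F.softmax(logits, dim=1)                # predicted distribution, shape (N, C)
    q = F.one_hot(labels, num_classes).float()  # (possibly noisy) label distribution
    m = 0.5 * (p + q)                           # mixture distribution
    eps = 1e-7                                  # numerical stability for log(0)
    kl_pm = (p * (torch.log(p + eps) - torch.log(m + eps))).sum(dim=1)
    kl_qm = (q * (torch.log(q + eps) - torch.log(m + eps))).sum(dim=1)
    return (0.5 * kl_pm + 0.5 * kl_qm).mean()   # each per-sample term is bounded by log 2
```

Because the JS divergence is bounded by \(\log 2\), such a loss saturates on confidently mislabeled samples instead of growing without bound like cross entropy, which is the intuition behind its robustness.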




Ethics declarations
Conflicts of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Additional information
This work was supported by the National Key R&D Program of China (No. 2021YFA1003004) and the National Natural Science Foundation of China (No. 11971296).
Appendices
1.1 Proof of the robustness of JS loss
Lemma 1
For any x with ground-truth label \(y_x\) and any \(i \not = y_x\), we have \( {\mathcal {L}}_{JS}(f(x), e_i) \le \log 2 \), where the classifier \(f(\cdot )\) contains a softmax layer.
Proof
Since \(f(\cdot )\) is a classifier with a softmax layer, we have \(\sum \nolimits _{c=1}^C f_c(x) = 1\).
This completes the proof. \(\square \)
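For completeness, the key inequality behind Lemma 1 can be sketched as follows, assuming \({\mathcal {L}}_{JS}\) denotes the (unweighted) JS divergence; this reconstruction uses only the standard bound on the JS divergence and may differ from the authors' original steps. For probability vectors \(p\) and \(q\) with mixture \(m = (p+q)/2\),

```latex
% Reconstructed sketch (standard JS bound); may differ from the authors' original derivation.
\begin{aligned}
\mathrm{KL}(p \,\Vert\, m) &= \sum_{c=1}^{C} p_c \log \frac{2 p_c}{p_c + q_c}
  \;\le\; \sum_{c=1}^{C} p_c \log \frac{2 p_c}{p_c} \;=\; \log 2, \\
\mathrm{JS}(p \,\Vert\, q) &= \tfrac{1}{2}\,\mathrm{KL}(p \,\Vert\, m)
  + \tfrac{1}{2}\,\mathrm{KL}(q \,\Vert\, m) \;\le\; \log 2 .
\end{aligned}
```

Taking \(p = f(x)\) and \(q = e_i\) then yields \({\mathcal {L}}_{JS}(f(x), e_i) \le \log 2\).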
Theorem 1
In a C-class classification task, under symmetric noise with noise rate \(\eta < 1 - 1/C\), for any softmax output f we have
where \(f^*\) is the global minimizer of \({\mathcal {R}}_{{\mathcal {L}}_{JS}}(f)\).
Proof
For any f,
Since \(f^*\) is the global minimizer of \({\mathcal {R}}_{{\mathcal {L}}_{JS}}(f)\) and \(\eta < 1 - 1/C\),
This completes the proof. \(\square \)
Theorem 2
In a C-class classification task under asymmetric noise, where the noise rates satisfy \(\eta _{y_xc}<1-\eta _{y_x}\) with \(\sum _{c \not = y_x} \eta _{y_xc}= \eta _{y_x}\), for any softmax output f, if \({\mathcal {R}}_{{\mathcal {L}}_{JS}}(f^*) =0\) then we have
where \(B=C{\mathbb {E}}(1-\eta _{y_x}) \ge 0\), and \(f^*\) is the global minimizer of \({\mathcal {R}}_{{\mathcal {L}}_{JS}}(f)\).
Proof
Thus,
From the assumption that \({\mathcal {R}}_{{\mathcal {L}}_{JS}}(f^*) =0\), we have \({\mathcal {L}}_{JS}(f^*(x),y_x)=0\), and hence \({\mathcal {L}}_{JS}(f^*(x),e_i)= \log 2\) for any \(i \not = y_x\). Moreover, from Lemma 1, \({\mathcal {L}}_{JS}(f(x), e_i) \le \log 2\) for any \(i \not = y_x\). Therefore:
where \(B=C{\mathbb {E}}(1-\eta _{y_x}) \ge 0\). This completes the proof. \(\square \)
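For readers less familiar with the two noise models referred to in Theorems 1 and 2, the sketch below shows how symmetric and asymmetric label noise are commonly simulated on benchmark labels. It follows the usual convention in the literature (symmetric: flip with probability \(\eta\) to a uniformly chosen different class; asymmetric: flip with probability \(\eta\) to a fixed, class-dependent target) and is not necessarily the authors' exact corruption protocol; all names are illustrative.

```python
import numpy as np

def add_symmetric_noise(labels: np.ndarray, num_classes: int, eta: float,
                        rng: np.random.Generator) -> np.ndarray:
    """With probability eta, replace a label by a different class chosen uniformly."""
    noisy = labels.copy()
    flip = rng.random(len(labels)) < eta
    for i in np.where(flip)[0]:
        candidates = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(candidates)
    return noisy

def add_asymmetric_noise(labels: np.ndarray, transition: dict, eta: float,
                         rng: np.random.Generator) -> np.ndarray:
    """With probability eta, map a label to a fixed class-dependent target,
    e.g. {truck: automobile, bird: airplane, ...} for CIFAR-10."""
    noisy = labels.copy()
    flip = rng.random(len(labels)) < eta
    for i in np.where(flip)[0]:
        noisy[i] = transition.get(int(labels[i]), int(labels[i]))
    return noisy
```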
1.2 The value of hyperparameter \(\beta \)
| Noise type | Symmetric | Symmetric | Symmetric | Symmetric | Asymmetric |
|---|---|---|---|---|---|
| Dataset | CIFAR-10 | CIFAR-10 | CIFAR-100 | CIFAR-100 | CIFAR-10 / CIFAR-100 / Clothing1M |
| Noise rate | 0.2/0.4 | 0.6/0.8 | 0.2/0.4 | 0.6/0.8 | 0.1/0.2/0.3/0.4 |
| \(\beta \) | 0 | 25 | 25 | 125 | 0 |
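Read as a configuration, the table corresponds to a lookup such as the following; this is a hedged transcription, and the dictionary names and rate buckets are illustrative rather than taken from the authors' code.

```python
# Hedged transcription of the table above; names and buckets are illustrative.
BETA_SYMMETRIC = {
    "CIFAR-10":  {(0.2, 0.4): 0,  (0.6, 0.8): 25},
    "CIFAR-100": {(0.2, 0.4): 25, (0.6, 0.8): 125},
}
# Asymmetric noise (rates 0.1-0.4) on CIFAR-10, CIFAR-100 and Clothing1M uses beta = 0.
BETA_ASYMMETRIC = 0
```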