Binary cross-entropy with dynamical clipping

  • Original Article
  • Published in Neural Computing and Applications

Abstract

We investigate the adverse effect of noisy labels in a training dataset on a neural network’s precision in an image classification task. This research is important because most datasets contain noisy labels. To reduce their impact, we propose to extend binary cross-entropy with dynamical clipping, which clips the loss values of all samples in a mini-batch at a clipping constant. The constant is determined dynamically for each mini-batch from its own statistics. The advantage is dynamic adaptation to any amount of noisy labels in a training dataset. As a result, the proposed binary cross-entropy with dynamical clipping can be used in any model that employs cross-entropy or focal loss, including pre-trained models. We prove that the proposed loss function is an \(\alpha\)-calibrated classification loss, implying consistency and robustness to misclassification noise in more general asymmetric problems. We demonstrate the loss function’s usefulness on the Fashion MNIST, CIFAR-10, and CIFAR-100 datasets, where we artificially create noisy labels in the training data and achieve a clear performance boost over standard binary cross-entropy. These results are confirmed in a second experiment, where a model trained on Google Images classifies the ImageWoof dataset, and in a third experiment on the WebVision and ANIMAL-10N datasets. We also show that the proposed technique yields significantly better performance than gradient clipping. Code: gitlab.com/irafm-ai/clipping_cross_entropy
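To make the mechanism concrete, the sketch below shows one way mini-batch-dependent loss clipping could look in PyTorch. It is an illustration only: the abstract does not specify which mini-batch statistic defines the clipping constant, so the choice of mean plus one standard deviation of the per-sample losses, and the helper name clipped_bce, are assumptions rather than the authors' exact formulation (see the linked repository for the reference implementation).

import torch
import torch.nn.functional as F

def clipped_bce(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Per-sample binary cross-entropy, no reduction so each sample can be clipped separately.
    per_sample = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    if per_sample.dim() > 1:
        # Multi-label / one-hot targets: average the loss over classes first.
        per_sample = per_sample.mean(dim=1)
    # Clipping constant derived from the current mini-batch's statistics
    # (mean + one standard deviation is an assumed choice, not necessarily the paper's).
    clip_value = (per_sample.mean() + per_sample.std()).detach()
    # Losses above the constant, which typically correspond to noisy labels,
    # are clipped so they cannot dominate the gradient.
    return torch.minimum(per_sample, clip_value).mean()

# Example usage with a hypothetical model and one-hot targets:
#   loss = clipped_bce(model(images), one_hot_labels.float())
#   loss.backward()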

Notes

  1. https://paperswithcode.com/sota/image-classification-on-imagenet.

  2. https://https-deeplearning-ai.github.io/data-centric-comp.

  3. https://gitlab.com/irafm-ai/clipping_cross_entropy.

  4. https://keras.io/api/preprocessing/image/.

  5. https://github.com/fastai/imagenette.

  6. https://paperswithcode.com/sota/learning-with-noisy-labels-on-animal.

Acknowledgements

Stefania Tomasiello acknowledges support from the European Social Fund through the IT Academy program. The work is co-supported by ERDF/ESF “Centre for the development of Artificial Intelligence Methods for the Automotive Industry of the region” (No. CZ.02.1.01/0.0/0.0/17049/0008414). For more supplementary materials and an overview of our lab’s work, see graphicwg.irafm.osu.cz/storage/pr/links.html.

Author information

Corresponding author

Correspondence to Petr Hurtik.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hurtik, P., Tomasiello, S., Hula, J. et al. Binary cross-entropy with dynamical clipping. Neural Comput & Applic 34, 12029–12041 (2022). https://doi.org/10.1007/s00521-022-07091-x
