Abstract
We investigate the adverse effect of noisy labels in a training dataset on a neural network’s accuracy in an image classification task. This research matters because most real-world datasets contain noisy labels. To reduce their impact, we propose extending the binary cross-entropy with dynamical clipping, which clips the loss values of all samples in a mini-batch by a clipping constant. This constant is determined dynamically for every mini-batch from its statistics, so the method adapts to any proportion of noisy labels in a training dataset. As a result, binary cross-entropy with dynamical clipping can be used in any model that employs cross-entropy or focal loss, including pre-trained models. We prove that the proposed loss function is an \(\alpha \)-calibrated classification loss, implying consistency and robustness to label noise in more general asymmetric problems. We demonstrate our loss function’s usefulness on the Fashion MNIST, CIFAR-10, and CIFAR-100 datasets, where we heuristically inject noisy labels into the training data and achieve a clear performance boost over the standard binary cross-entropy. These results are confirmed in a second experiment, where we use a model trained on Google Images to classify the ImageWoof dataset, and in a third experiment on the WebVision and ANIMAL-10N datasets. We also show that the proposed technique yields significantly better performance than gradient clipping. Code: gitlab.com/irafm-ai/clipping_cross_entropy
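The mechanism the abstract describes, clipping every sample's loss in a mini-batch by a constant derived from that mini-batch's statistics, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the clipping statistic used here (mean plus one standard deviation of the per-sample losses) and the helper name `clipped_bce` are assumptions for the sake of the example.

```python
import numpy as np

def clipped_bce(probs, targets, eps=1e-7):
    """Binary cross-entropy with per-mini-batch dynamical clipping (sketch).

    The clipping constant is recomputed for every mini-batch from the
    statistics of its per-sample losses. Mean + one standard deviation
    is an illustrative choice of statistic, not necessarily the one
    derived in the paper.
    """
    probs = np.clip(probs, eps, 1 - eps)
    # per-sample binary cross-entropy
    losses = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
    # dynamical clipping constant from this mini-batch's statistics
    c = losses.mean() + losses.std()
    # clip each sample's loss, limiting the influence of noisy labels
    return np.minimum(losses, c).mean()
```

Because the constant comes from each mini-batch rather than being a fixed hyperparameter, a batch dominated by clean samples is barely affected, while an outlier loss from a mislabeled sample is capped before it is averaged into the gradient.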
Notes
https://paperswithcode.com/sota/image-classification-on-imagenet.
https://https-deeplearning-ai.github.io/data-centric-comp.
https://gitlab.com/irafm-ai/clipping_cross_entropy.
https://paperswithcode.com/sota/learning-with-noisy-labels-on-animal.
Acknowledgements
Stefania Tomasiello acknowledges support from the European Social Fund through the IT Academy program. The work is co-supported by ERDF/ESF “Centre for the development of Artificial Intelligence Methods for the Automotive Industry of the region” (No. CZ.02.1.01/0.0/0.0/17049/0008414). For supplementary materials and an overview of our lab’s work, see graphicwg.irafm.osu.cz/storage/pr/links.html.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hurtik, P., Tomasiello, S., Hula, J. et al. Binary cross-entropy with dynamical clipping. Neural Comput & Applic 34, 12029–12041 (2022). https://doi.org/10.1007/s00521-022-07091-x