An Automated Method of Identifying Incorrectly Labelled Images Based on the Sequences of Loss Functions of Deep Learning Networks

Zhang, Zhipeng; Shou, Wenhui; Ma, Wengting; Xing, Dongjia; Xu, Qingqing; Xu, Li-Qun; Fan, Qingxia; Xu, Ling

doi:10.1007/978-3-030-67514-1_21

Conference paper
First Online: 31 January 2021

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 346)

Abstract

Deep learning has been widely applied to medical image analysis. Because labelled medical images underpin the training, validation, and testing of deep learning classification models, the quality of the labelling process directly affects model performance. However, it has been estimated that up to ten percent of manually labelled medical images may be incorrectly labelled. This paper proposes an automated method of identifying incorrectly labelled medical images that uses the sequences of loss-function values produced by a deep learning classification network over multiple training epochs. The labels of the identified images can then be reviewed and updated by senior, experienced physicians, ultimately improving both the quality of labelled medical image datasets and the performance of the deep learning models trained on them.
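The abstract does not say how the loss sequences are analysed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes that each sample's loss is recorded once per epoch, that a sample is scored by its mean loss, and that suspects are separated from the rest by a one-dimensional two-cluster k-means split. The function names and the toy loss values are hypothetical.

```python
import math
from statistics import mean


def binary_cross_entropy(p_positive, label):
    """Cross-entropy loss of one sample, given the predicted probability
    of class 1 and the sample's (possibly wrong) 0/1 label."""
    eps = 1e-12
    p = min(max(p_positive, eps), 1.0 - eps)  # avoid log(0)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


def flag_suspects(loss_sequences, n_iter=20):
    """Return one bool per sample; True marks a suspected wrong label.

    Assumption (not from the paper): samples are scored by mean loss over
    epochs and split into a low-loss and a high-loss group with 1-D 2-means.
    """
    scores = [mean(seq) for seq in loss_sequences]
    low_centre, high_centre = min(scores), max(scores)
    if low_centre == high_centre:
        return [False] * len(scores)  # nothing to separate
    for _ in range(n_iter):
        high = [s for s in scores if abs(s - high_centre) < abs(s - low_centre)]
        low = [s for s in scores if abs(s - high_centre) >= abs(s - low_centre)]
        low_centre, high_centre = mean(low), mean(high)
    return [abs(s - high_centre) < abs(s - low_centre) for s in scores]


# Toy loss sequences (made up for illustration): one row per sample,
# one column per training epoch.
loss_sequences = [
    [0.70, 0.40, 0.20, 0.10],  # loss falls quickly
    [0.65, 0.35, 0.15, 0.08],
    [0.90, 1.10, 1.30, 1.20],  # loss stays high across epochs
    [0.72, 0.38, 0.18, 0.09],
]
print(flag_suspects(loss_sequences))  # [False, False, True, False]
```

The intuition behind this sketch is that a network tends to fit consistently labelled samples first, so a sample whose loss stays high across epochs is a candidate for manual review rather than a confirmed error.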

Two experiments were carried out to validate the effectiveness of the proposed method, using a fundus image dataset for referable diabetic retinopathy screening.

a) The first experiment verified that the method can accurately identify incorrectly labelled samples within a labelled dataset. Starting from a fundus image dataset of 10788 samples with gold-standard labels (5394 non-referable diabetic retinopathy samples and 5384 referable diabetic retinopathy samples), the labels of a small part of the images (6%, 648) were intentionally flipped to the opposite class to simulate real-world labelling errors. The proposed method successfully identified 75.31% (488) of the incorrectly labelled samples, while only 4.85% (492) of the correctly labelled samples were wrongly flagged as incorrectly labelled.

b) In the second experiment, the 980 samples flagged as incorrectly labelled (only 9.1% of the whole dataset) were reviewed and their labels corrected, and the deep learning classification model for referable diabetic retinopathy screening was then retrained. On an independent test dataset with completely correct labels (700 non-referable diabetic retinopathy samples and 700 referable diabetic retinopathy samples), the best accuracy of the model increased from 95.93% (trained on the dataset with 6% incorrectly labelled samples) to 96.50% (trained on the revised dataset with 1.5% incorrectly labelled samples). This approaches the ideal value of 96.57% (trained on the original dataset with 0% incorrectly labelled samples) and demonstrates that the proposed method can improve the performance of deep learning models.
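The percentages reported for the two experiments follow from the stated counts, which the short calculation below reproduces. The variable names are illustrative, and the precision figure is derived here from the reported counts rather than stated in the abstract.

```python
# Counts taken from the abstract.
total = 10788         # samples in the labelled dataset
flipped = 648         # labels intentionally changed to the opposite class
found = 488           # flipped samples that the method identified
false_alarms = 492    # correctly labelled samples that were flagged anyway

correct = total - flipped        # 10140 correctly labelled samples
flagged = found + false_alarms   # 980 samples sent for review
remaining = flipped - found      # 160 wrong labels left after review

recall = found / flipped                      # share of wrong labels caught
false_positive_rate = false_alarms / correct  # share of good labels flagged
review_fraction = flagged / total             # share of dataset to re-check
residual_noise = remaining / total            # label noise after correction
precision = found / flagged                   # derived, not in the abstract

print(f"recall              {recall:.2%}")               # 75.31%
print(f"false positive rate {false_positive_rate:.2%}")  # 4.85%
print(f"review fraction     {review_fraction:.2%}")      # 9.08%
print(f"residual noise      {residual_noise:.2%}")       # 1.48%
print(f"precision           {precision:.2%}")            # 49.80%
```

Roughly half of the flagged samples are false alarms, but reviewing about 9% of the dataset is enough to cut the label noise from 6% to about 1.5%.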



Author information

Authors and Affiliations

China Mobile Research Institute, Beijing, 100032, China
Zhipeng Zhang, Wenhui Shou, Wengting Ma, Dongjia Xing, Qingqing Xu & Li-Qun Xu
Shenyang He Eye Hospital, Shenyang, 110034, China
Qingxia Fan & Ling Xu


Corresponding author

Correspondence to Zhipeng Zhang.

Editor information

Editors and Affiliations

Northwestern Polytechnical University, Xi’an, China
Bo Li
State Key Laboratory of ISN, Xidian University, Xi'an, China
Changle Li
Northwestern Polytechnical University, Xi'an, China
Mao Yang
School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China
Zhongjiang Yan
Northwest University, Xi'an, China
Jie Zheng

About this paper

Cite this paper

Zhang, Z. et al. (2021). An Automated Method of Identifying Incorrectly Labelled Images Based on the Sequences of Loss Functions of Deep Learning Networks. In: Li, B., Li, C., Yang, M., Yan, Z., Zheng, J. (eds) IoT as a Service. IoTaaS 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 346. Springer, Cham. https://doi.org/10.1007/978-3-030-67514-1_21

DOI: https://doi.org/10.1007/978-3-030-67514-1_21
Published: 31 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67513-4
Online ISBN: 978-3-030-67514-1
eBook Packages: Computer Science, Computer Science (R0)
