Elsevier

Knowledge-Based Systems

Volume 175, 1 July 2019, Pages 12-25

Automated identification and grading system of diabetic retinopathy using deep neural networks

https://doi.org/10.1016/j.knosys.2019.03.016

Highlights

  • We established a high-quality labelled dataset of DR medical images.

  • We developed a novel and well-performing DR recognition and classification system.

  • The optimal combination of ensemble model was explored experimentally.

Abstract

Diabetic retinopathy (DR) is a major cause of vision loss worldwide, and slowing its progression requires early screening. However, the clinical diagnosis of DR is a considerable challenge in low-resource settings, where too few ophthalmologists are available to care for all patients with diabetes. In this study, an automated DR identification and grading system called DeepDR is proposed. DeepDR directly detects the presence and severity of DR from fundus images via transfer learning and ensemble learning. It comprises a set of state-of-the-art neural networks based on combinations of popular convolutional neural networks and customised standard deep neural networks. To develop DeepDR, we constructed a high-quality dataset of DR medical images labelled by clinical ophthalmologists. We further explore the relationship between the ideal number of component classifiers and the number of class labels, as well as the effects of different combinations of component classifiers on integration performance, to construct an optimal model. We evaluate the models for validity and reliability using nine metrics. Results show that the identification model performs best with a sensitivity of 97.5%, a specificity of 97.7% and an area under the curve of 97.7%. Meanwhile, the grading model achieves a sensitivity of 98.1% and a specificity of 98.9%. These results show that DeepDR can detect DR satisfactorily. The experiments also indicate that the number and combination of component classifiers are important to model performance. DeepDR provides reproducible and consistent detection results with high sensitivity and specificity instantaneously. Hence, this work provides ophthalmologists with insights into the diagnostic process.

Introduction

Diabetic retinopathy (DR) is a chronic complication of diabetes that damages the retina. Notably, the risk of blindness in patients with DR is 25 times that in healthy people; thus, DR is a leading cause of blindness amongst people aged 20–65 years worldwide [1]. The blindness caused by DR can be prevented through regular fundus examinations [2]. A widespread consensus on the benefits and cost-effectiveness of DR screening has formed amongst western nations [3], [4], [5]. Most DR studies classify DR using the international clinical disease severity scale (Table 1), in accordance with the Early Treatment Diabetic Retinopathy Study (ETDRS). Other details are available in the latest American Academy of Ophthalmology Clinical Guidelines: Diabetic Retinopathy (2016 Edition, 2017 Updated) [6]. Nowadays, diabetes screening is common in developed countries; patients with diabetes are identified in the general population and referred to DR specialists. These specialists perform follow-up examinations and intervene medically when necessary; therefore, the incidence of severe DR in developed countries is low.

However, the situation in China is not as promising. (1) Currently, the ophthalmologist-to-patient ratio is nearly 1:1000 in China. (2) The rate of DR screening is less than 10%. (3) The vast majority of patients with DR in China do not know the risks associated with DR; hence, they may not realise they have the disease. Furthermore, a large proportion of diabetic patients ignore this serious complication. Patients with DR in China often present with advanced disease and undergo late invasive treatment, resulting in poor prognosis and high medical expenses. Therefore, the incidence of severe proliferative DR is much higher in China than in developed countries. Moreover, the eventual blindness from DR is irreversible, which places a heavy burden on Chinese families and society. The automatic screening and grading of DR is therefore in pressing demand because it can help to solve the abovementioned problems.

Deep neural networks (DNNs), brain-inspired systems also referred to as deep learning [7], [8], can automatically learn abstract high-level features or representations directly from large volumes of raw data, yielding a distributed representation of the data. A widely used type of DNN is the recurrent neural network, which has shown unprecedented success in academia and industry, including in the areas of speech recognition and machine translation [9], [10], [11]. Given the spatial coherence of images, convolutional neural networks (CNNs) are preferred because they are highly specialised for image recognition, analysis and classification [12]. In recent years, CNNs have also provided insights into various medical studies. Furthermore, they have abilities rivalling those of medical experts [13], especially in the classification of skin cancer [14] and breast cancer [15] and in the detection of lung cancer [16] and retinopathy of prematurity [17], [18].

Nevertheless, challenges remain in the use of CNNs in medical studies. First, sufficient real-world medical images, especially those for some specialised diseases, are difficult to obtain. Furthermore, the availability of labelled medical data is typically limited. Second, DR features are so complex that they are easily confused with various other lesions, and the minute lesions of DR cannot be detected if image quality is poor. According to the medical literature, fundus photographs are labelled manually, a process that is prone to subjectivity. Third, high disease-detection accuracy is difficult to attain by training a single model on a limited quantity of medical image data with inevitable image noise. Therefore, two important strategies are used in deep learning: transfer learning [19], [20] and ensemble learning [21], [22], [23]. The primary concept of transfer learning is knowledge reuse: knowledge learned from big data is transferred to small-data fields to address their scarcity of data and knowledge. The central idea of classifier ensemble learning is to combine a series of component classifiers with different learning preferences to solve the same predictive problem. Such ensembles can generalise better than any individual component.
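The ensemble idea described above can be illustrated with a minimal sketch. It assumes a soft-voting rule (averaging the class probabilities of the component classifiers and taking the most probable class); the combination rule actually used in DeepDR is not stated in this excerpt, and the function name `soft_vote` and the probability values are hypothetical.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-class probabilities across classifiers; return the argmax per sample.

    prob_sets[m][i][c] is the probability that classifier m assigns class c to sample i.
    Soft voting is an assumed combination rule, not necessarily the one used in DeepDR.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    labels = []
    for i in range(n_samples):
        # Mean probability of each class over all component classifiers.
        mean = [sum(model[i][c] for model in prob_sets) / n_models
                for c in range(n_classes)]
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels


# Hypothetical outputs of three component classifiers on two images
# (columns: no DR, DR).
model_a = [[0.6, 0.4], [0.2, 0.8]]
model_b = [[0.4, 0.6], [0.3, 0.7]]
model_c = [[0.7, 0.3], [0.6, 0.4]]
predictions = soft_vote([model_a, model_b, model_c])
```

In this toy case, model_b alone would label the first image as DR, whereas the averaged vote labels it as no DR, which shows how disagreement between components is resolved.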

In the current work, we developed an automated system called DeepDR for DR screening via deep learning. DeepDR is a complex process composed of three steps: judgment of the existence of retinal lesion characteristics via screening of fundus photographs, evaluation of the severity of DR if lesion features are detected and reporting of the detection of clinical DR. Thus far, DeepDR has been used in some local hospitals to aid primary hospitals in remote areas or clinical communities that lack retinal specialists or appropriate equipment.

The contributions of this work are as follows.

(1) We establish a high-quality labelled dataset of DR medical images.

(2) We develop a novel DR identification and grading system. The system performs well in comparison with human evaluation metrics.

(3) We explore the relationship between the number of ideal component classifiers and the number of class labels. Furthermore, the effects of different combination methods on the best integration performance are discussed.

In Section 2, we analyse the related works. In Section 3, we detail the dataset. In Section 4, we describe two novel ensemble models for the two respective tasks. In Section 5, we show the experiments on the two tasks. In Section 6, we provide a discussion of the entire study and future work. Finally, in Section 7, we draw the conclusions.

Section snippets

Related works

In the past few decades, the development of automated DR pathology screening has made encouraging progress. From an application perspective, computer-aided detection (CADe) algorithms and computer-aided diagnosis (CADx) algorithms can be viewed as typical representatives in the field. CADe detects lesions at the pixel level with manual segmentations [24]. On the basis of the detected lesions, CADx detects pathologies at the image level [25].

Materials

In our study, macula-centred retinal fundus images were taken from the Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital between September 2017 and May 2018. The original data comprising 13,767 images of 1872 patients were collected from three sources: ophthalmology, endocrinology and physical examination centres (Fig. 1). In general, almost all patients from the ophthalmology department were diagnosed with DR, and nearly two-thirds of the patients from the

Aim and objective

The following two aims motivated this study and were realised with corresponding ensemble models that were designed in this work.

(1) To build an early DR automatic screening system. This aim is a binary classification task to identify the presence of DR. Currently, the manual DR screening method is labour intensive and suffers from inconsistencies across sites [40]; moreover, the number of people who can master DR treatment skills is still small amongst most grassroots health workers.

Configuration

The algorithms were implemented using Keras (http://keras.io/). All experiments were performed on a high-end workstation with an Intel Xeon E5-2620 CPU and NVIDIA Tesla K40 GPU with 64 GB of RAM. The dataset was split into 70% for training, 10% for validation and 20% for testing. The hyper-parameter configuration and the class distributions of the two tasks are shown in Tables 4, 5 and 6.
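A 70/10/20 split of this kind can be sketched as follows. This is an illustration only: the excerpt does not say whether the split was made per image or per patient, nor whether it was stratified by class, so the item-level shuffle, the fixed seed and the helper name `split_70_10_20` are all assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_10_20(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items reproducibly and split them into train/validation/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the subset sizes.
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, roughly 20%
    return train, val, test


# Hypothetical image identifiers standing in for the fundus images.
image_ids = list(range(1000))
train_ids, val_ids, test_ids = split_70_10_20(image_ids)
```

If several images belong to the same patient, splitting by patient identifier rather than by image would avoid the same patient appearing in both training and test sets; that variant is not shown here.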

Strategy

The experiment process consisted of six steps: input data, data pre-processing,

Discussion

During the design of the two ensemble models, we made several considerations in the following aspects.

(1) Combined strategy of components: The frameworks of the two classification tasks searched for the ideal combination of component classifiers on the basis of the number of class tags in the dataset as a guide. We assumed that the basic components used in our experiments were all independent. In the experiments, we found that arbitrarily increasing or decreasing the number of component

Conclusion

In conclusion, a high-quality labelled medical imaging DR dataset was built, and an identification and grading system of DR called DeepDR was proposed. The relationship between the number of ideal component classifiers and the number of class labels was verified and explored. Using nine medical metrics, we evaluated the models in terms of validity and reliability. The results demonstrated that DeepDR worked satisfactorily.
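Two of the metrics reported in the abstract, sensitivity and specificity, follow directly from the confusion counts of a binary classifier. The sketch below uses their standard definitions; the function name and the label vectors are hypothetical and do not reproduce the study's data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 means DR present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ground truth and predictions for ten images.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
```

Here three of the four DR images are detected and five of the six healthy images are correctly cleared, so sensitivity is 3/4 and specificity is 5/6.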

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 61432012 and U1435213.

References (54)

  • Singer D.E. Screening for diabetic retinopathy. J. Intern. Med. (1996)
  • Kristinsson J.K. et al. Systematic screening for diabetic eye disease in insulin dependent diabetes. Acta Ophthalmol. (2010)
  • Diabetic Retinopathy, American Academy of Ophthalmology...
  • Bengio Y. et al. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • LeCun Y. et al. Deep learning. Nature (2015)
  • Sze V. et al. Efficient processing of deep neural networks: A tutorial and survey. Proc. IEEE (2017)
  • Yi Z. Foundations of implementing the competitive layer model by Lotka-Volterra recurrent neural networks. IEEE Trans. Neural Netw. (2010)
  • Zhang L. et al. Theoretical study of oscillator neurons in recurrent neural networks. IEEE Trans. Neural Netw. Learn. Syst. (2018)
  • Krizhevsky A. et al. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. (2012)
  • Fakoor R. et al. Using deep learning to enhance cancer diagnosis and classification. Proceedings of the International Conference on Machine Learning, vol. 28 (2013)
  • Esteva A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature (2017)
  • Wang D. et al. Deep learning for identifying metastatic breast cancer (2016)
  • Rossetto A.M. et al. Deep learning for categorization of lung cancer CT images. 2017 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies, CHASE (2017)
  • Wang J. et al. Automated retinopathy of prematurity screening using deep neural networks. EBioMedicine (2018)
  • Hu J. et al. Automated analysis for retinopathy of prematurity by deep neural networks. IEEE Trans. Med. Imaging (2018)
  • Pan S.J. et al. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. (2010)
  • Rokach L. Ensemble-based classifiers. Artif. Intell. Rev. (2010)