Abstract
In order to develop a deep neural network able to differentiate glaucoma from non-glaucoma patients based on visual field (VF) test results, we collected VF tests from 3 different ophthalmic centers in mainland China. Visual fields (VFs) obtained by both Humphrey 30-2 and 24-2 tests were collected. Reliability criteria were fixation losses of less than 2/13 and false positive and false negative rates of less than 15%. All VFs from both eyes of a single patient were assigned to either the training or the validation set to avoid data leakage. We split a total of 4012 pattern deviation (PD) images from 1352 patients into two sets: 3712 for training and the remaining 300 for validation. On the validation set of 300 VFs, the CNN achieved an accuracy of 0.876, with a specificity of 0.826 and a sensitivity of 0.932. For comparison, the average accuracies of human graders were 0.607, 0.585 and 0.626 for resident ophthalmologists, attending ophthalmologists and glaucoma experts, respectively, while the AGIS and GSS2 criteria achieved accuracies of 0.459 and 0.523. Three traditional machine learning algorithms, namely support vector machine (SVM), random forest (RF) and k-nearest neighbor (k-NN), were also implemented and evaluated, achieving accuracies of 0.670, 0.644 and 0.591, respectively. In glaucoma diagnosis based on VF, our CNN-based algorithm therefore achieved higher accuracy than human ophthalmologists and traditional rules (AGIS and GSS2). It can be a powerful tool to distinguish glaucomatous from non-glaucomatous VFs and may help the screening and diagnosis of glaucoma in the future.
1 Introduction
Glaucoma is currently the second leading cause of irreversible blindness in the world [1]. It is commonly characterized by sustained or temporary elevation of intraocular pressure (IOP) and by visual field defects. The diagnosis of glaucoma relies on information from various clinical examinations, including the visual field (VF), optical coherence tomography (OCT) and fundus photographs [1, 2]. Fundus photographs are easy to capture and are frequently used in glaucoma screening, where localization of the optic cup and disc is the main cue for automated diagnosis [3, 4]. In clinical practice, the VF is widely used as the gold standard for judging whether a patient has typical glaucomatous damage: specific defect patterns such as a nasal step or an arcuate scotoma indicate the presence of glaucoma [5, 6].
Researchers have developed several algorithms based on data from clinical studies, such as the Advanced Glaucoma Intervention Study (AGIS) criteria and the Glaucoma Staging System (GSS) criteria, to grade glaucomatous VFs [5, 7,8,9]. However, it is difficult to diagnose glaucoma from the VF alone; in early-stage glaucoma, the VF may show no obvious defect even when the retinal nerve fiber layer (RNFL) has already been damaged. It is therefore necessary to develop new algorithms for glaucoma diagnosis. We designed this study to investigate the performance of a deep neural network in distinguishing glaucomatous from non-glaucomatous VFs and to compare its performance against that of human ophthalmologists.
2 Methods
2.1 Data Preparation
The study was approved by the Ethical Review Committee of the Zhongshan Ophthalmic Center and was conducted in accordance with the Declaration of Helsinki for research involving human subjects. The study has been registered on clinicaltrials.gov (NCT: 03268031). All visual fields (VFs) were obtained with either the Humphrey Field Analyzer 30-2 or 24-2 test. To guarantee reliability, only VFs with fixation losses of less than 2/13 and false positive and false negative rates of less than 15% were included in the experiments. Representative examples of non-glaucoma and glaucoma pattern deviation (PD) plots are shown in Fig. 1.
The probability map of pattern deviation (the PD image) was cropped from the VF report and resized to \( 224 \times 224 \) as the input to a deep CNN. All VFs from both eyes of a single patient were assigned to either the training or the validation set to avoid data leakage. In this way, we split a total of 4012 PD images into two sets: 3712 for training and the remaining 300 for validation. For data augmentation, each training PD image was also added in horizontally flipped form, yielding a final 7424 training samples. Cross validation was performed by randomly re-splitting the training and validation sets 3 times, and no significant difference was observed. The validation set consists of 150 glaucomatous and 150 non-glaucomatous PD images; the non-glaucomatous PD images include 50 images with only cataract and 150 images with no ocular disease, retinal diseases or neuro-ophthalmic diseases.
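As a rough illustration of this patient-level split and the flip-based augmentation, a minimal Python sketch might look as follows (not the authors' released code; the record fields `patient_id`, `path` and `label` are assumptions made for illustration):

```python
# Sketch of the patient-level train/validation split and flip-based augmentation.
import random
from collections import defaultdict

from PIL import Image, ImageOps


def split_by_patient(records, val_size=300, seed=0):
    """Assign all PD images of a patient to either training or validation,
    so that no patient contributes images to both sets (no data leakage)."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    patients = list(by_patient)
    random.Random(seed).shuffle(patients)

    train, val = [], []
    for pid in patients:
        target = val if len(val) < val_size else train
        target.extend(by_patient[pid])
    return train, val


def flip_and_double(train_records, out_dir):
    """Add a horizontally mirrored copy of every training PD image,
    doubling the training set (here, 3712 -> 7424 samples)."""
    augmented = list(train_records)
    for rec in train_records:
        mirrored = ImageOps.mirror(Image.open(rec["path"]))
        flipped_path = f"{out_dir}/flipped_{len(augmented)}.png"
        mirrored.save(flipped_path)
        augmented.append({**rec, "path": flipped_path})
    return augmented
```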
2.2 Diagnostic Criteria of Glaucoma
Glaucoma was diagnosed with criteria similar to those of the UKGTS study [10]. VFs of patients who had glaucomatous damage to the optic nerve head (ONH) and reproducible glaucomatous VF defects were included. A glaucomatous VF defect was defined as a reproducible reduction of sensitivity compared with the normative database, in reliable tests, at (1) two or more contiguous locations with loss at P < 0.01 or worse, or (2) three or more contiguous locations with loss at P < 0.05 or worse. ONH damage was defined as a cup-to-disc (C/D) ratio ≥ 0.7, thinning of the RNFL, or both, without a retinal or neurological cause of VF loss.
2.3 Deep CNN for Glaucoma Diagnosis
We adopted the widely used VGG network [11, 12] as our backbone. The network consists of 13 convolutional layers and 3 fully connected layers. We reduced the output dimension of the penultimate fully connected layer (fc7) from 4096 to 200 and modified the last layer to output a two-dimensional vector corresponding to the prediction scores for healthy and glaucomatous VFs. The network was first pre-trained on the large-scale natural image classification dataset ImageNet [13] to initialize its parameters; the two modified layers were then re-initialized by drawing from a Gaussian distribution. All parameters were updated by stochastic gradient descent with a softmax cross-entropy loss. The network structure is shown in Fig. 2.
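A minimal PyTorch sketch of this modification is shown below; the learning rate, momentum and Gaussian initialization scale are illustrative assumptions, as they are not reported in the paper:

```python
# Sketch of the modified VGG-16 (13 conv + 3 FC layers) used for PD images.
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(pretrained=True)      # ImageNet-pretrained weights

# Shrink fc7 from 4096 to 200 outputs and make the final layer emit two scores
# (non-glaucoma vs. glaucoma); re-initialize both from a Gaussian distribution.
model.classifier[3] = nn.Linear(4096, 200)
model.classifier[6] = nn.Linear(200, 2)
for layer in (model.classifier[3], model.classifier[6]):
    nn.init.normal_(layer.weight, mean=0.0, std=0.01)
    nn.init.zeros_(layer.bias)

criterion = nn.CrossEntropyLoss()          # softmax cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(images, labels):
    """One SGD update on a batch of 224 x 224 PD images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```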
2.4 Comparison Between CNN-Based Algorithm and Human Ophthalmologists in Glaucoma Diagnosis
We compared the diagnostic accuracy of our deep neural network against that of ophthalmologists. We chose 9 ophthalmologists at 3 different levels (glaucoma experts: Professors YL-L, XC-D and SJ-F; attending ophthalmologists: Drs. T-S, WY-L and WY-Y; resident ophthalmologists: Drs. X-G, WJ-Z and YY-W) from 4 eye institutes (see details in the acknowledgements). None of them participated in the current research. Attending ophthalmologists are doctors with at least 5 years of clinical training in ophthalmology, while resident ophthalmologists have 1–3 years of clinical training. The ophthalmologists were shown the PD images alone and asked to assign one of five labels to each PD image: non-glaucoma, likely non-glaucoma, uncertain, likely glaucoma and glaucoma.
2.5 Traditional Methods for Glaucoma Diagnosis
As a comparison, we also evaluated several rule-based methods and traditional machine learning methods for glaucoma diagnosis.
Rule-based methods included the AGIS and GSS methods. For AGIS, a VF is considered abnormal if three or more contiguous points in the TD plot are outside normal limits [8]. GSS2 uses both MD and PSD values to classify VFs into 6 stages [9]; only stage 0 is considered healthy, and all other stages are treated as glaucoma. A sketch of the contiguity check is given below.
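The following sketch illustrates the contiguity check behind the simplified AGIS rule quoted above; the full AGIS protocol also weights hemifields and defect depth, which this illustration omits:

```python
# Flag a field as abnormal if >= 3 contiguous (4-connected) test points
# are outside normal limits. `outside_limits` is a 2D boolean mask on the
# VF grid, with non-test positions set to False.
import numpy as np
from scipy.ndimage import label


def agis_like_abnormal(outside_limits, min_cluster=3):
    clusters, n_clusters = label(outside_limits)     # connected components
    return any(np.count_nonzero(clusters == k) >= min_cluster
               for k in range(1, n_clusters + 1))
```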
We also compared our method with three non-deep machine learning algorithms. A Support Vector Machine (SVM) [14] maps training samples into a high-dimensional space in which they can be separated by a hyperplane with as wide a margin as possible. A Random Forest (RF) [15] constructs a set of decision trees; each sample is classified by aggregating, over the trees, the class distribution of the training samples that fall into the same leaf nodes. The k-Nearest-Neighbor (k-NN) [16] method classifies a sample as healthy or glaucomatous by majority voting among its k nearest training samples. Throughout these experiments, we used the 52 PD values of VFs obtained with the 24-2 test; for the 30-2 test, the 22 outermost values were discarded so that both test patterns could be treated identically. We tuned all the algorithms to improve their performance, e.g., we experimented with whether to use Principal Component Analysis (PCA) for preprocessing, with different kernel types for the SVM, different numbers of trees for the RF and various values of k for k-NN, as sketched below.
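A hedged scikit-learn sketch of these baselines is shown below; the 52 PD values form the feature vector, and the search grids (kernels, C, tree counts, k, optional PCA) are illustrative assumptions rather than the settings actually used:

```python
# Traditional baselines on 52 PD values, with optional PCA and a small grid search.
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_baselines():
    svm = GridSearchCV(
        Pipeline([("pca", PCA()), ("clf", SVC())]),
        {"pca": ["passthrough", PCA(n_components=20)],
         "clf__kernel": ["linear", "rbf"],
         "clf__C": [0.1, 1, 10]},
        cv=3)
    rf = GridSearchCV(RandomForestClassifier(),
                      {"n_estimators": [50, 100, 200]}, cv=3)
    knn = GridSearchCV(KNeighborsClassifier(),
                       {"n_neighbors": [3, 5, 7, 11]}, cv=3)
    return {"SVM": svm, "RF": rf, "k-NN": knn}

# Usage: X_train has shape (n_samples, 52), y_train holds 0/1 labels.
# for name, clf in build_baselines().items():
#     clf.fit(X_train, y_train)
#     print(name, clf.score(X_val, y_val))
```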
3 Results
Baseline characteristics are shown in Table 1. In total, we collected 4012 VF reports, including both glaucoma and non-glaucoma reports. To compare the non-glaucoma and glaucoma groups, we ran an unpaired t test for numerical data and a chi-square test for categorical data. There was no significant difference in the ratio of left to right eyes (P = 0.6211, chi-square test), while age (P = 0.0022), VFI (P = 0.0001), MD (P = 0.0039) and PSD (P = 0.0001) showed clear statistical differences (unpaired t tests).
To evaluate the effectiveness of the algorithm for automatic diagnosis of glaucoma, we summarized the performance of the proposed algorithm in Table 2.
On the validation set of 300 VFs, our CNN-based algorithm achieved an accuracy of 0.876, with a specificity of 0.826 and a sensitivity of 0.932. To compare the ophthalmologists with the machine, we developed software to collect their evaluations. As described in Sect. 2.4, the ophthalmologists were shown the PD images alone and assigned one of five labels to each image: non-glaucoma, likely non-glaucoma, uncertain, likely glaucoma and glaucoma; they were strongly advised not to choose the uncertain label. For the final evaluation, the non-glaucoma and likely non-glaucoma labels were counted as normal, the likely glaucoma and glaucoma labels were counted as glaucoma, and the uncertain label was counted as a wrong answer. Although the graders included three resident ophthalmologists, three attending ophthalmologists and three glaucoma experts, we did not observe significant differences among the three groups: the average accuracies were 0.607, 0.585 and 0.626, respectively. There remains a large performance gap between the ophthalmologists and the CNN, which suggests that the CNN can identify complex patterns in the PD images that are relevant for glaucoma diagnosis. The two rule-based methods, AGIS and GSS2, were also evaluated; neither achieved satisfactory results. Interestingly, all the ophthalmologists performed better than GSS2 and AGIS, indicating the importance of human experience in the decision-making process. The three traditional machine learning algorithms were also included in the experiments; the SVM performed best among them, but was still much worse than the CNN.
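The scoring rule described above can be sketched as follows (an illustration of how the five-level answers collapse to a binary decision, not the evaluation software itself):

```python
# Collapse five-level answers to binary and compute accuracy/sensitivity/specificity;
# "uncertain" always counts as a wrong answer.
import numpy as np

TO_BINARY = {"non-glaucoma": 0, "likely non-glaucoma": 0,
             "likely glaucoma": 1, "glaucoma": 1}   # "uncertain" maps to -1 below


def score(answers, truth):
    """answers: five-level labels from a grader; truth: 0 (normal) / 1 (glaucoma)."""
    pred = np.array([TO_BINARY.get(a, -1) for a in answers])
    truth = np.asarray(truth)
    accuracy = np.mean(pred == truth)               # uncertain (-1) never matches
    sensitivity = np.mean(pred[truth == 1] == 1)    # glaucoma called glaucoma
    specificity = np.mean(pred[truth == 0] == 0)    # normal called normal
    return accuracy, sensitivity, specificity
```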
As shown in Fig. 3, we examined the receiver operating characteristic (ROC) curves of the CNN and the compared methods. Our algorithm achieved an AUC of 0.966 (95% CI, 0.948–0.985), outperforming all the ophthalmologists, the rule-based methods and the traditional machine learning methods by a large margin.
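The paper does not state how the confidence interval was computed; a bootstrap over the validation set, sketched below, is one common choice and is an assumption on our part:

```python
# AUC with a bootstrap 95% confidence interval over the validation predictions.
import numpy as np
from sklearn.metrics import roc_auc_score


def auc_with_ci(y_true, y_score, n_boot=2000, seed=0):
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    auc = roc_auc_score(y_true, y_score)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        if len(np.unique(y_true[idx])) < 2:              # need both classes present
            continue
        boots.append(roc_auc_score(y_true[idx], y_score[idx]))
    low, high = np.percentile(boots, [2.5, 97.5])
    return auc, (low, high)
```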
We also studied validation accuracy as a function of the number of images in the training set. Subsets of the original training set were randomly drawn at rates of 5%, 10%, ..., 100%, with each subset containing all the images of the smaller subsets. As shown in Fig. 4, performance does not improve much once the training set includes more than 3612 images.
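The nested-subset protocol can be sketched as follows; `train_records` and `evaluate` are placeholders for the data and the train-then-validate routine described above:

```python
# Draw nested training subsets as prefixes of one random permutation, so each
# smaller subset is contained in every larger one.
import numpy as np


def nested_subsets(train_records, rates=np.arange(0.05, 1.001, 0.05), seed=0):
    order = np.random.default_rng(seed).permutation(len(train_records))
    for rate in rates:
        n = int(round(rate * len(train_records)))
        yield rate, [train_records[i] for i in order[:n]]

# for rate, subset in nested_subsets(train_records):
#     print(f"{rate:.0%}", evaluate(subset))   # validation accuracy vs. subset size
```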
4 Discussion
In our study, we made two main contributions: (1) we designed a pipeline for developing a glaucoma diagnosis algorithm, consisting of four steps: data collection, model design, training strategy design and model validation; and (2) we developed a deep learning-based method that differentiates glaucomatous from non-glaucomatous VFs and verified both its efficacy and its advantage over human ophthalmologists. Our CNN-based approach achieved higher sensitivity and specificity than the traditional machine learning methods and the criteria derived from clinical trials such as AGIS [8]. Applied to the interpretation of VFs, the method is both sensitive and reliable. Although the ophthalmologists performed better than AGIS and GSS2, the CNN-based algorithm was even better at recognizing the patterns present in the PD images. Our results demonstrate the potential of CNNs to assist in the screening and diagnosis of glaucoma.
We compared the performance of our CNN-based algorithm against human ophthalmologists of different levels. As expected, glaucoma experts achieved the highest accuracy in VF interpretation, although the margin was only about 4% over attending doctors and 2% over resident doctors. With the accumulation of clinical experience, doctors tend to have higher specificity but lower sensitivity. Because the doctors had only the VF as an ancillary examination on which to base a diagnosis, their diagnostic ability was restricted and they tended to be more cautious in their decisions. The machine, however, obtained the highest score in the test, achieving the highest sensitivity while maintaining high specificity. In the second step, we compared our algorithm against two criteria summarized from clinical trials, AGIS and GSS2 [8, 9]. These criteria were built to evaluate the severity and stage of glaucoma based on the VF, which is divided into areas with different weights. Because they were derived from regression analysis, they are essentially linear and do not perform well on complex VFs. In the last step, we compared our CNN-based algorithm with traditional machine learning methods, including RF, SVM and k-NN. A previous study used a feed-forward neural network (FNN) to detect preperimetric glaucoma and showed a clear advantage over traditional machine learning methods [17]; we obtained similar results. These traditional algorithms are shallow models that cannot extract representative features from the PD images.
It should be noted that this study has several limitations. First, we used only pattern deviation images as the input to the machine learning algorithms, so preperimetric glaucoma may not be effectively detected. We do not consider a single cross-sectional VF sufficient to diagnose early-stage disease, which is why we did not attempt to differentiate preperimetric glaucoma in this study. In future work, we plan to combine VFs with OCT scans; with input from different imaging modalities, deep networks may be able to make more accurate diagnoses. Second, at the current stage, the program can only tell glaucoma from non-glaucoma, whereas various diseases, such as neuro-ophthalmic diseases and cataract, can also influence the VF. We hope to extend our deep models to diagnose more ocular diseases.
5 Conclusion
In glaucoma diagnosis based on VF, our CNN-based algorithm achieved higher accuracy than human ophthalmologists and traditional rules (AGIS and GSS2): an accuracy of 0.876, with a specificity of 0.826 and a sensitivity of 0.932, indicating the advantage of CNN-based algorithms over humans in the diagnosis of glaucoma. It can be a powerful tool to distinguish glaucomatous from non-glaucomatous VFs and may help the screening and diagnosis of glaucoma in the future.
References
Quigley, H.A.: Glaucoma. Lancet 377(9774), 1367–1377 (2011)
Jonas, J.B., Aung, T., Bourne, R.R., Bron, A.M., Ritch, R., Panda-Jonas, S.: Glaucoma. Lancet 390, 2183–2193 (2017)
Fu, H., Cheng, J., Xu, Y., Wong, D.W.K., Liu, J., Cao, X.: Joint optic disc and cup segmentation based on multi-label deep network and polar transformation. arXiv preprint arXiv:1801.00926 (2018)
Fu, H., Cheng, J., Xu, Y., et al.: Disc-aware ensemble network for glaucoma screening from fundus image. IEEE Trans. Med. Imaging (2018)
The Advanced Glaucoma Intervention Study (AGIS): 7. The relationship between control of intraocular pressure and visual field deterioration. The AGIS Investigators. Am. J. Ophthalmol. 130(4), 429–440 (2000)
Musch, D.C., Gillespie, B.W., Lichter, P.R., Niziol, L.M., Janz, N.K., CIGTS Study Investigators: Visual field progression in the Collaborative Initial Glaucoma Treatment Study: the impact of treatment and other baseline factors. Ophthalmology 116(2), 200–207 (2009)
Nouri-Mahdavi, K., Hoffman, D., Gaasterland, D., Caprioli, J.: Prediction of visual field progression in glaucoma. Invest. Ophthalmol. Vis. Sci. 45(12), 4346–4351 (2004)
Advanced Glaucoma Intervention Study. 2. Visual field test scoring and reliability. Ophthalmology 101(8), 1445–1455 (1994)
Brusini, P., Filacorda, S.: Enhanced Glaucoma Staging System (GSS 2) for classifying functional damage in glaucoma. J. Glaucoma 15(1), 40–46 (2006)
Garway-Heath, D.F., Crabb, D.P., Bunce, C., et al.: Latanoprost for open-angle glaucoma (UKGTS): a randomised, multicentre, placebo-controlled trial. Lancet 385(9975), 1295–1304 (2015)
Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. arXiv preprint arXiv:1405.3531 (2014)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Russakovsky, O., Deng, J., Su, H., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Ho, T.K.: Random decision forests. In: Proceedings of the Third International Conference on Document Analysis and Recognition (1995)
Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), 175–185 (1992)
Asaoka, R., Murata, H., Iwase, A., Araie, M.: Detecting preperimetric glaucoma with standard automated perimetry using a deep learning classifier. Ophthalmology 123(9), 1974–1980 (2016)