Visual Field Based Automatic Diagnosis of Glaucoma Using Deep Convolutional Neural Network

Li, Fei; Wang, Zhe; Qu, Guoxiang; Qiao, Yu; Zhang, Xiulan

doi:10.1007/978-3-030-00949-6_34

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11039)

Abstract

To develop a deep neural network able to differentiate glaucoma from non-glaucoma patients based on visual field (VF) test results, we collected VF tests from 3 different ophthalmic centers in mainland China. VFs obtained by both Humphrey 30-2 and 24-2 tests were collected. The reliability criteria were fixation losses of less than 2/13 and false positive and false negative rates of less than 15%. All the VFs from both eyes of a single patient were assigned to either the training or the validation set to avoid data leakage. We split a total of 4012 pattern deviation (PD) images from 1352 patients into two sets, 3712 for training and another 300 for validation. On the validation set of 300 VFs, the convolutional neural network (CNN) achieved an accuracy of 0.876, with specificity and sensitivity of 0.826 and 0.932, respectively. For ophthalmologists, the average accuracies were 0.607, 0.585 and 0.626 for resident ophthalmologists, attending ophthalmologists and glaucoma experts, respectively. The rule-based criteria AGIS and GSS2 achieved accuracies of 0.459 and 0.523, respectively. Three traditional machine learning algorithms, namely support vector machine (SVM), random forest (RF), and k-nearest neighbor (k-NN), were also implemented and evaluated, achieving accuracies of 0.670, 0.644, and 0.591, respectively. In glaucoma diagnosis based on VF, our CNN-based algorithm achieved higher accuracy than human ophthalmologists and traditional rules (AGIS and GSS2). It has the potential to be a powerful tool to distinguish glaucoma from non-glaucoma VFs, and may help screening and diagnosis of glaucoma in the future.

1 Introduction

Glaucoma is currently the second leading cause of irreversible blindness in the world [1]. It is commonly characterized by sustained or temporary elevation of intraocular pressure (IOP) and defects in the visual field. Diagnosis of glaucoma depends on information from various clinical examinations, including visual field (VF), optical coherence tomography (OCT) and fundus photography [1, 2]. Fundus photos are easy to capture and frequently used in glaucoma screening, where localization of the optic cup and disc is the main cue that machines use to make a diagnosis [3, 4]. In clinical practice, VF is widely used as the gold standard to judge whether patients have typical glaucomatous damage. Specific patterns of defects shown in the visual field, such as nasal step and arcuate scotoma, indicate the existence of glaucoma [5, 6].

Researchers have developed several algorithms based on data from clinical studies, such as the Advanced Glaucoma Intervention Study (AGIS) criteria and the Glaucoma Staging System (GSS) criteria, to grade glaucomatous VFs [5, 7, 8, 9]. However, it is hard to diagnose glaucoma from VF alone, and in early-stage glaucoma there may be no obvious VF defect even if the retinal nerve fiber layer (RNFL) has already been damaged. New algorithms for glaucoma diagnosis are therefore needed. We designed this study to investigate the performance of a deep neural network in distinguishing glaucomatous from non-glaucomatous VFs and to compare the performance of the machine against human ophthalmologists.

2 Methods

2.1 Data Preparation

The study was approved by the Ethical Review Committee of the Zhongshan Ophthalmic Center and was conducted in accordance with the Declaration of Helsinki for research involving human subjects. The study has been registered at clinicaltrials.gov (NCT03268031). All the visual fields (VFs) were obtained by either Humphrey Field Analyzer 30-2 or 24-2 tests. To guarantee reliability, only VFs with fixation losses of less than 2/13 and false positive and false negative rates of less than 15% were selected for the experiments. Representative examples of non-glaucoma and glaucoma pattern deviation (PD) plots are shown in Fig. 1.

The probability map of pattern deviation (PD image) was cropped from the VF report and resized to \( 224 \times 224 \) as the input of a deep CNN. All the VFs of both eyes of a single patient were assigned to either the training or the validation set to avoid data leakage. In this way, we split a total of 4012 PD images into two sets, 3712 for training and another 300 for validation. For data augmentation, we flipped the PD images in the training set horizontally to obtain a final 7424 training samples. Cross validation was performed by randomly splitting the training and validation sets 3 times, and no significant difference was observed. The validation set consists of 150 glaucomatous PD images and 150 non-glaucomatous PD images. The non-glaucomatous PD images include 50 images with only cataract and 150 images with no ocular disease, retinal diseases or neuro-ophthalmic diseases.
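The patient-level split and the flip augmentation can be illustrated with a minimal Python sketch. It is not the study's code: the record layout, the function names, the toy data and the choice to size the validation set by patient count are all assumptions made for illustration. The sketch reads the augmentation as adding one mirrored copy of each training image, which is consistent with 3712 × 2 = 7424.

```python
import random
from collections import defaultdict


def split_by_patient(records, n_val_patients, seed=0):
    """Split (patient_id, image) records so no patient spans both sets."""
    by_patient = defaultdict(list)
    for patient_id, image in records:
        by_patient[patient_id].append(image)

    # Shuffle patients, not images, so both eyes stay together.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)
    val_ids = set(patient_ids[:n_val_patients])

    train = [(p, img) for p in patient_ids if p not in val_ids
             for img in by_patient[p]]
    val = [(p, img) for p in patient_ids if p in val_ids
           for img in by_patient[p]]
    return train, val


def flip_horizontal(image):
    """Mirror a 2-D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def augment_with_flips(train):
    """Return the originals plus one mirrored copy of each image."""
    return train + [(p, flip_horizontal(img)) for p, img in train]


# Toy data (hypothetical): 6 patients, two eyes each, tiny 2 x 3 "images".
records = [(f"patient{i}", [[i, 0, 1], [2, 3, eye]])
           for i in range(6) for eye in (0, 1)]
train, val = split_by_patient(records, n_val_patients=2)
augmented = augment_with_flips(train)
```

Shuffling patient identifiers rather than images is what prevents the two eyes of one person from landing on opposite sides of the split.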

2.2 Diagnostic Criteria of Glaucoma

Glaucoma was diagnosed with criteria similar to those of the UKGTS study [10]. VFs of patients who had glaucomatous damage to the optic nerve head (ONH) and reproducible glaucomatous VF defects were included. A glaucomatous VF defect was defined as a reproducible reduction of sensitivity compared to the normative database in reliable tests at either (1) two or more contiguous locations with P < 0.01 loss or more, or (2) three or more contiguous locations with P < 0.05 loss or more. ONH damage was defined as a cup-to-disc (C/D) ratio ≥ 0.7, thinning of the RNFL, or both, without a retinal or neurological cause of VF loss.
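The contiguity rule in this definition can be sketched as a connected-cluster check on a grid of probability levels. This is an illustrative reading only: the grid layout, the use of `None` for untested locations, and the choice of 8-neighbour adjacency as the meaning of "contiguous" are assumptions, and the sketch inspects a single test, so it does not cover the reproducibility or reliability requirements.

```python
def largest_cluster(flags):
    """Size of the largest group of contiguous True cells (8-neighbour)."""
    rows, cols = len(flags), len(flags[0])
    seen = set()
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not flags[r][c] or (r, c) in seen:
                continue
            # Depth-first flood fill from an unvisited flagged cell.
            stack, size = [(r, c)], 0
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and flags[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
            best = max(best, size)
    return best


def has_glaucomatous_defect(p_grid):
    """p_grid holds a probability level per location, or None if untested."""
    below_1 = [[p is not None and p < 0.01 for p in row] for row in p_grid]
    below_5 = [[p is not None and p < 0.05 for p in row] for row in p_grid]
    # Criterion (1): >= 2 contiguous points at P < 0.01;
    # criterion (2): >= 3 contiguous points at P < 0.05.
    return largest_cluster(below_1) >= 2 or largest_cluster(below_5) >= 3


# Hypothetical example grids.
normal = [[0.5, 0.5, 0.5], [0.5, None, 0.5]]
two_deep = [[0.005, 0.005, 0.5], [0.5, 0.5, 0.5]]
three_mild = [[0.02, 0.5, 0.5], [0.5, 0.02, 0.5], [0.5, 0.5, 0.02]]
two_mild = [[0.02, 0.02, 0.5], [0.5, 0.5, 0.5]]
```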

2.3 Deep CNN for Glaucoma Diagnosis

We adopted VGG [11, 12] as our network structure. The VGG network consists of 13 convolution layers and 3 fully connected layers. We modified the output dimension of the penultimate layer fc7 from 4096 to 200, and modified the last layer to output a two-dimensional vector that corresponds to the prediction scores of healthy VF and glaucoma VF. The network was first pre-trained on ImageNet [13], a large-scale natural image classification dataset, to initialize its parameters. We then modified the last two layers as described above and initialized their parameters by drawing from a Gaussian distribution. All the parameters of the network were updated by the stochastic gradient descent algorithm with the softmax cross-entropy loss. The network structure is shown in Fig. 2.
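To make the effect of the modified layers concrete, the following sketch counts the parameters of the fully connected head before and after the change and evaluates the softmax cross-entropy loss for one sample. It assumes the standard VGG-16 layout (13 convolution and 3 fully connected layers, with a 7 × 7 × 512 feature map feeding fc6 for a 224 × 224 input); the helper names and the class index order are illustrative choices, not taken from the paper.

```python
import math


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Assumed original VGG-16 head: fc6, fc7, fc8 (1000 ImageNet classes).
# 25088 = 7 * 7 * 512 is the flattened feature map for a 224 x 224 input.
original_head = [(25088, 4096), (4096, 4096), (4096, 1000)]
# Modified head as described in the text: fc7 outputs 200 values, fc8 outputs 2.
modified_head = [(25088, 4096), (4096, 200), (200, 2)]

original_count = sum(dense_params(i, o) for i, o in original_head)
modified_count = sum(dense_params(i, o) for i, o in modified_head)


def softmax_cross_entropy(scores, target):
    """Loss for one sample: -log softmax(scores)[target], computed stably."""
    peak = max(scores)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_sum - scores[target]


# Assumed class order: index 0 = healthy VF, index 1 = glaucoma VF.
loss_when_undecided = softmax_cross_entropy([0.0, 0.0], target=1)
```

Under these assumptions the two re-initialized layers hold 819,400 and 402 parameters, compared with 16,781,312 and 4,097,000 in the corresponding original layers, and equal scores for both classes give a loss of ln 2.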

2.4 Comparison Between CNN-Based Algorithm and Human Ophthalmologists in Glaucoma Diagnosis

We compared the diagnostic accuracy of our deep neural network algorithm with that of ophthalmologists. We chose 9 ophthalmologists at 3 different levels (glaucoma experts: Professor YL-L, XC-D and SJ-F; attending ophthalmologists: Dr. T-S, WY-L and WY-Y; resident ophthalmologists: Dr. X-G, WJ-Z and YY-W), from 4 eye institutes (see details in acknowledgements). None of them participated in the current research. Attending ophthalmologists are doctors who have had clinical training in ophthalmology for at least 5 years, while resident ophthalmologists are doctors who have had clinical training in ophthalmology for 1–3 years. Ophthalmologists were shown the PD images alone and asked to assign one of five labels to each PD image, i.e., non-glaucoma, likely non-glaucoma, uncertain, likely glaucoma and glaucoma.

2.5 Traditional Methods for Glaucoma Diagnosis

As a comparison, we also evaluated several rule-based methods and traditional machine learning methods for glaucoma diagnosis.

Rule-based methods included the AGIS and GSS2 methods. For AGIS, a VF is considered abnormal if three or more contiguous points in the total deviation (TD) plot are outside normal limits [8]. GSS2 uses both mean deviation (MD) and pattern standard deviation (PSD) values to classify VFs into 6 stages [9]. Only stage 0 is considered healthy; all other stages are treated as glaucoma.

We also compared our method with three non-deep machine learning algorithms. Support Vector Machine (SVM) [14] maps training samples to points in a high-dimensional space where the classes can be separated by a hyperplane with a margin as wide as possible. Random Forest (RF) [15] constructs a set of decision trees, and each sample is classified according to the number of training samples of different categories falling into the same leaf node. In the k-Nearest-Neighbor (k-NN) [16] method, a sample is classified as healthy or glaucoma by majority voting among its k nearest training samples. Throughout these experiments, we used the 52 PD values in VFs obtained with the 24-2 test. For the 30-2 test, the 22 outermost values were discarded so that both test types could be treated equally. We tuned all the algorithms to improve their performance, e.g., we tested whether to use Principal Component Analysis (PCA) for preprocessing, different kernel types in SVM, different numbers of trees in RF and various k values in k-NN.
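Of the three baselines, k-NN is the simplest to show in full. The sketch below is a generic majority-vote implementation over 52-value vectors, not the configuration evaluated in the study; the Euclidean distance, the value of k and the toy PD values are assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbours."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: each VF is a vector of 52 PD values.
train_x = [[0.0] * 52, [-1.0] * 52, [-20.0] * 52, [-25.0] * 52, [-30.0] * 52]
train_y = ["healthy", "healthy", "glaucoma", "glaucoma", "glaucoma"]

deep_loss = knn_predict(train_x, train_y, [-22.0] * 52)
near_normal = knn_predict(train_x, train_y, [-0.5] * 52)
```

Because every test point contributes equally to the distance, a classifier of this kind has no notion of where on the field a defect sits, which is one way to see why spatial patterns are hard for it to exploit.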

3 Results

Baseline characteristics are shown in Table 1. In total, we collected 4012 VF reports, including glaucoma and non-glaucoma reports. To compare the non-glaucoma group with the glaucoma group, we ran an unpaired t test for numerical data and a chi-square test for categorical data. There was no significant difference in the left-eye to right-eye ratio between the groups (P = 0.6211, chi-square test), while age (P = 0.0022, unpaired t test), visual field index (VFI) (P = 0.0001, unpaired t test), MD (P = 0.0039, unpaired t test) and PSD (P = 0.0001, unpaired t test) differed significantly.

Table 1. Baseline characteristics of participants

To evaluate the effectiveness of the algorithm for automatic diagnosis of glaucoma, we summarized the performance of the proposed algorithm in Table 2.

Table 2. Performance of the algorithm and the compared methods.

On the validation set of 300 VFs, our CNN-based algorithm achieved an accuracy of 0.876, while the specificity and sensitivity were 0.826 and 0.932, respectively. To compare the results of ophthalmologists with those of machines, we also developed software to collect evaluation results from ophthalmologists. Ophthalmologists were shown the PD images alone and asked to assign one of five labels to each image, i.e., non-glaucoma, likely non-glaucoma, uncertain, likely glaucoma and glaucoma. They were strongly advised not to choose the uncertain label. For the final evaluation, the non-glaucoma and likely non-glaucoma labels were counted as normal, the likely glaucoma and glaucoma labels were counted as glaucoma, and the uncertain label was counted as a wrong answer.

Although the ophthalmologists included three resident ophthalmologists, three attending ophthalmologists and three glaucoma experts, we did not observe significant differences among these three groups. The average accuracies were 0.607, 0.585 and 0.626 for resident ophthalmologists, attending ophthalmologists and glaucoma experts, respectively. However, there was a large performance gap between the ophthalmologists and the CNN, which indicates that the CNN may have a strong ability to identify the complex patterns presented in the PD images for glaucoma diagnosis.

Two rule-based methods, AGIS and GSS2, were also compared in the experiment. Neither method achieved satisfactory results. Interestingly, all the ophthalmologists performed better than GSS2 and AGIS, indicating the importance of human experience in the decision-making process. Three traditional machine learning algorithms were also included in the experiments. SVM performed best among them, but still much worse than the CNN.
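The scoring rule for the five grader labels, and the three reported metrics, can be written down in a few lines. The sketch below is illustrative only: the function names, the 0/1 class encoding and the toy answers are assumptions, and it does not reproduce the study's numbers.

```python
LABEL_TO_CLASS = {
    "non-glaucoma": 0,
    "likely non-glaucoma": 0,
    "likely glaucoma": 1,
    "glaucoma": 1,
    "uncertain": None,  # never equals 0 or 1, so it is always scored as wrong
}


def grader_accuracy(labels, truth):
    """Fraction of images graded correctly under the five-label rule."""
    correct = sum(LABEL_TO_CLASS[lab] == t for lab, t in zip(labels, truth))
    return correct / len(truth)


def binary_metrics(predicted, truth):
    """Accuracy, sensitivity and specificity with glaucoma (1) as positive."""
    pairs = list(zip(predicted, truth))
    tp = sum(p == 1 and t == 1 for p, t in pairs)
    tn = sum(p == 0 and t == 0 for p, t in pairs)
    fp = sum(p == 1 and t == 0 for p, t in pairs)
    fn = sum(p == 0 and t == 1 for p, t in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical answers for six images (1 = glaucoma, 0 = non-glaucoma).
truth = [1, 1, 1, 0, 0, 0]
labels = ["glaucoma", "uncertain", "likely non-glaucoma",
          "non-glaucoma", "likely glaucoma", "likely non-glaucoma"]
human_accuracy = grader_accuracy(labels, truth)
model_metrics = binary_metrics([1, 1, 0, 0, 1, 0], truth)
```

Counting "uncertain" as wrong means a grader cannot raise accuracy by abstaining, which keeps the human scores comparable with a classifier that always commits to one class.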

As shown in Fig. 3, we examined the receiver operating characteristic (ROC) curves of the CNN and the compared methods. Our algorithm achieved an area under the curve (AUC) of 0.966 (95% CI, 0.948–0.985). It outperformed all the ophthalmologists, rule-based methods and traditional machine learning methods by a large margin.
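For readers unfamiliar with the AUC, it equals the probability that a randomly chosen glaucomatous VF receives a higher score than a randomly chosen non-glaucomatous one. The sketch below computes it directly from that definition; it is a generic illustration with made-up scores, and it does not show how the confidence interval above was obtained, which the text does not specify.

```python
def roc_auc(scores, labels):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical glaucoma scores for five VFs (label 1 = glaucoma).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)  # 5 of the 6 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a validation set of a few hundred images; a rank-based formula gives the same value more efficiently.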

We also studied the relative validation set accuracy as a function of the number of images in the training set. Training subsets were randomly drawn from the original training set at rates of 5%, 10%, …, 100%, with each subset including all the images of the smaller subsets. As shown in Fig. 4, performance does not improve much once the training set includes more than 3612 images.
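One simple way to obtain nested subsets of this kind is to shuffle the training set once and take prefixes of increasing length. The sketch below assumes that construction; the function name, the seed and the rounding of subset sizes are illustrative choices rather than details from the study.

```python
import random


def nested_subsets(items, rates, seed=0):
    """Return one subset per rate; each subset contains all smaller ones."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Prefixes of a single shuffled order are nested by construction.
    return [shuffled[: round(len(shuffled) * rate)] for rate in rates]


rates = [step / 20 for step in range(1, 21)]  # 0.05, 0.10, ..., 1.00
subsets = nested_subsets(range(3712), rates)
sizes = [len(s) for s in subsets]
```

Because every subset is a prefix of the same shuffled order, differences in accuracy between two sizes reflect the added images rather than a different random draw.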

4 Discussion

Our study makes two contributions: (1) we designed a project to develop our algorithm for the diagnosis of glaucoma, consisting of 4 steps: data collection, model design, training strategy design and model validation; (2) we developed a deep learning-based method that can differentiate glaucoma from non-glaucoma based on VFs, and verified its efficacy in differentiating VFs and its advantage over human ophthalmologists. Our CNN-based approach achieved both higher sensitivity and higher specificity than traditional machine learning methods and the algorithms derived from clinical trials such as AGIS [8]. Applying a CNN to the interpretation of VFs, we found that the method is both sensitive and reliable. Although ophthalmologists performed better than AGIS and GSS2, the CNN-based algorithm was better still at recognizing the patterns presented in the PD images. Our results demonstrate the possibility of applying CNNs to assist screening and diagnosis of glaucoma.

We compared the performance of our CNN-based algorithm against human ophthalmologists of different levels. As expected, glaucoma experts achieved the highest accuracy in VF interpretation, although their advantage was small: about 4 and 2 percentage points over attending and resident ophthalmologists, respectively (0.626 versus 0.585 and 0.607). With accumulation of clinical experience, doctors tended to have higher specificity but lower sensitivity. Because the doctors had only VFs available as an accessory examination, their diagnostic ability was restricted, and they tended to be more cautious in their decisions. The CNN, however, achieved the highest score in the test, with the highest sensitivity while maintaining high specificity. In our second step, we compared the performance of our algorithm against 2 criteria summarized from clinical trials, AGIS and GSS2 [8, 9]. These criteria were built to evaluate the severity and staging of glaucoma based on VF, with the VF divided into different areas carrying different weights. Because they were derived by regression analysis, they are typically linear and do not perform well on complex VFs. In the last step, we compared the performance of our CNN-based algorithm with traditional machine learning methods, including RF, SVM and k-NN. A previous study used a feed-forward neural network (FNN) to detect preperimetric glaucoma and reported a clear advantage over traditional machine learning methods [17]. We obtained similar results in our study. We attribute this to the fact that the traditional algorithms are all shallow models, which cannot extract representative features of the PD images.

This study had several limitations. First, we used only pattern deviation images as the input of the machine learning algorithms, so preperimetric glaucoma may not be effectively detected by the machine. We do not consider a VF from a single cross-sectional test sufficient to help diagnose early-stage disease, which is why we did not try to differentiate preperimetric glaucoma in our study. In future studies, we plan to combine VF with OCT scans. With input from different imaging modalities, deep networks may be able to make more accurate diagnoses. Second, at the current stage, the program we developed can only tell glaucoma from non-glaucoma. Various diseases, such as neuro-ophthalmic diseases and cataract, may influence VFs. We hope to extend our deep models to diagnose more ocular diseases.

5 Conclusion

In glaucoma diagnosis based on VF, our CNN-based algorithm achieved higher accuracy than human ophthalmologists and traditional rules (AGIS and GSS2). The accuracy was 0.876, while the specificity and sensitivity were 0.826 and 0.932, respectively, indicating advantages of CNN-based algorithms over humans in the diagnosis of glaucoma. It has the potential to be a powerful tool to distinguish glaucoma from non-glaucoma VFs, and may help screening and diagnosis of glaucoma in the future.

References

1. Quigley, H.A.: Glaucoma. Lancet 377(9774), 1367–1377 (2011)
2. Jonas, J.B., Aung, T., Bourne, R.R., Bron, A.M., Ritch, R., Panda-Jonas, S.: Glaucoma. Lancet 390, 2183–2193 (2017)
3. Fu, H., Cheng, J., Xu, Y., Wong, D.W.K., Liu, J., Cao, X.: Joint optic disc and cup segmentation based on multi-label deep network and polar transformation. arXiv preprint arXiv:1801.00926 (2018)
4. Fu, H., Cheng, J., Xu, Y., et al.: Disc-aware ensemble network for glaucoma screening from fundus image. IEEE Trans. Med. Imaging (2018)
5. The AGIS Investigators: The Advanced Glaucoma Intervention Study (AGIS): 7. The relationship between control of intraocular pressure and visual field deterioration. Am. J. Ophthalmol. 130(4), 429–440 (2000)
6. Musch, D.C., Gillespie, B.W., Lichter, P.R., Niziol, L.M., Janz, N.K., CIGTS Study Investigators: Visual field progression in the Collaborative Initial Glaucoma Treatment Study: the impact of treatment and other baseline factors. Ophthalmology 116(2), 200–207 (2009)
7. Nouri-Mahdavi, K., Hoffman, D., Gaasterland, D., Caprioli, J.: Prediction of visual field progression in glaucoma. Invest. Ophthalmol. Vis. Sci. 45(12), 4346–4351 (2004)
8. Advanced Glaucoma Intervention Study. 2. Visual field test scoring and reliability. Ophthalmology 101(8), 1445–1455 (1994)
9. Brusini, P., Filacorda, S.: Enhanced Glaucoma Staging System (GSS 2) for classifying functional damage in glaucoma. J. Glaucoma 15(1), 40–46 (2006)
10. Garway-Heath, D.F., Crabb, D.P., Bunce, C., et al.: Latanoprost for open-angle glaucoma (UKGTS): a randomised, multicentre, placebo-controlled trial. Lancet 385(9975), 1295–1304 (2015)
11. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. arXiv preprint arXiv:1405.3531 (2014)
12. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
13. Russakovsky, O., Deng, J., Su, H., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
14. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
15. Ho, T.K.: Random decision forests. In: Proceedings of the Third International Conference on Document Analysis and Recognition (1995)
16. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), 175–185 (1992)
17. Asaoka, R., Murata, H., Iwase, A., Araie, M.: Detecting preperimetric glaucoma with standard automated perimetry using a deep learning classifier. Ophthalmology 123(9), 1974–1980 (2016)

Author information

Authors and Affiliations

Zhongshan Ophthalmic Center, State Key Laboratory of Ophthalmology, Sun Yat-Sen University, Guangzhou, China
Fei Li & Xiulan Zhang
Guangdong Key Lab of Computer Vision and Virtual Reality, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China
Guoxiang Qu & Yu Qiao
SenseTime Group Limited, Hong Kong, China
Zhe Wang

Corresponding author

Correspondence to Xiulan Zhang.

Editor information

Editors and Affiliations

University College London, London, UK
Danail Stoyanov
University of Leeds, Leeds, UK
Zeike Taylor
Radboud University Medical Center, Nijmegen, The Netherlands
Francesco Ciompi
Baidu, Beijing, China
Yanwu Xu
Sunnybrook Health Science Centre, Toronto, ON, Canada
Anne Martel
Deutsches Krebsforschungszentrum (DKFZ), Heidelberg, Germany
Lena Maier-Hein
University of Warwick, Coventry, UK
Nasir Rajpoot
Radboud University Medical Centre, Nijmegen, The Netherlands
Jeroen van der Laak
Eindhoven University of Technology, Eindhoven, The Netherlands
Mitko Veta
University of Dundee, Dundee, UK
Stephen McKenna
University Hospital Coventry, Coventry, UK
David Snead
University of Dundee, Dundee, UK
Emanuele Trucco
University of Iowa, Iowa City, IA, USA
Mona K. Garvin
Soochow University, Suzhou, China
Xin Jan Chen
Medical University of Vienna, Vienna, Austria
Hrvoje Bogunovic

About this paper

Cite this paper

Li, F., Wang, Z., Qu, G., Qiao, Y., Zhang, X. (2018). Visual Field Based Automatic Diagnosis of Glaucoma Using Deep Convolutional Neural Network. In: Stoyanov, D., et al. Computational Pathology and Ophthalmic Medical Image Analysis. OMIA 2018, COMPAY 2018. Lecture Notes in Computer Science, vol 11039. Springer, Cham. https://doi.org/10.1007/978-3-030-00949-6_34

DOI: https://doi.org/10.1007/978-3-030-00949-6_34
Published: 14 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00948-9
Online ISBN: 978-3-030-00949-6
eBook Packages: Computer Science, Computer Science (R0)
