1 Introduction

For many years, color fundus photographs have been a common imaging modality for ophthalmologists examining the back of the eye in cases of optic disc swelling (Fig. 1(a)) [5]. Traditionally, assessing swelling from fundus photographs has been very challenging because it requires specialized expertise and does not allow volumetric measures of the swelling to be computed [8]. With the introduction of spectral-domain optical coherence tomography (SD-OCT) (Fig. 1(b)) and the development of associated retinal-layer-segmentation algorithms [3, 10] (Fig. 1(c)), SD-OCT-based volumetric measures have recently shown great promise in overcoming many of the limitations of traditional fundus-based assessment.

Fig. 1.

(a) An example registered and cropped fundus photograph of the optic-nerve-head (ONH) region with the nasal (N), temporal (T), superior (S), and inferior (I) sides marked. (b) A 3D rendering of the corresponding OCT image depicting a swollen optic disc. (c) A surface rendering of the internal limiting membrane (ILM), shown in red, and the retinal pigment epithelium (RPE) complex, shown in blue. (d) The ONH thickness map with regional volumes displayed within the grid and the peripapillary volume (PRV) and total retinal volume (TRV) shown in the bottom left-hand corner (all in mm\(^3\)). (Color figure online)

However, SD-OCT is not always available, as its use is largely limited to specialized eye clinics rather than primary care or telemedical settings. Thus, there is still a need for better approaches to assessing optic disc swelling from fundus photographs alone. In this area, Echegaray et al. [2] proposed a decision-tree system that automatically grades the optic disc swelling using image features extracted directly from monocular fundus photographs. However, this approach provides only ordinal severity grades rather than continuous volumetric measures. Tang et al. [9] demonstrated that stereoscopic color fundus photographs can be used to reconstruct depth information and estimate the volume of the optic disc swelling, but the requirement for carefully acquired stereo images rather than monocular images limits the applicability of this approach. More recently, Agne et al. [1] proposed a regression approach that directly estimates the total retinal volume (TRV) at the optic-nerve-head (ONH) region from a single fundus photograph; however, this approach does not predict local thickness values, which may be needed to compute regional volumes.

Thus, to overcome the limitations above, we propose a machine-learning method that estimates local volumetric information from a single monocular color fundus photograph (Fig. 1(a)). The proposed method outputs a thickness map of \(200\times 200\) pixels covering \(6\times 6\) mm\(^2\) at the ONH region. From the resulting thickness map, the volumes of the peripapillary region and of the nasal, temporal, inferior, and superior quadrants, as well as the TRV, can be computed, as shown in Fig. 1(d). Results are quantitatively assessed using the root-mean-square errors between the model’s outputs and the OCT ground truths, as well as Spearman’s rank correlation coefficients. Visualizations of the predicted thickness maps are also provided for qualitative assessment.

2 Methods

2.1 Overview

For a fair comparison between the two image modalities, the input fundus photographs were registered to the SD-OCT image domain and centered and cropped at the ONH (Fig. 1(a)). Next, the blood vessels were inpainted, and features pertaining to textural and color information were extracted from the resulting fundus image. Based on these features, random forest regression models, which are discussed more thoroughly in Sect. 3, were trained to estimate the depth information of the retina at the pixel level and to output an ONH thickness map from which regional volumetric measurements can be computed (Fig. 1(d)).

2.2 Preprocessing

We first registered the input fundus photograph with the ONH-centered SD-OCT en-face image. In particular, the SD-OCT images were segmented using 2D/3D graph-theoretical algorithms [3, 10], and the en-face image was created by averaging the pixel intensities along each A-scan within the retinal pigment epithelium (RPE) complex. We then applied blood vessel inpainting to the ONH-centered, registered fundus photograph (Fig. 2(a)) to suppress the negative effects of blood vessels on the predicted thickness map. For the vessel inpainting, a blood vessel probability map was first computed using a deep-learning approach based on U-Net [6] (Fig. 2(b)). This vessel probability map was then thresholded (\(p=0.5\)) into a blood vessel mask. Next, a binary morphological dilation (spherical filter size: \(r=1\) pixel) was applied to ensure that the vessels were completely encompassed by the mask. Finally, the dilated blood vessel mask was overlaid on the cropped fundus photograph (Fig. 2(c)), and the masked pixels were filled using second-order interpolation to create the vessel-inpainted image (Fig. 2(d)).
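The thresholding, dilation, and fill steps can be sketched as follows. This is a minimal NumPy illustration, not the implementation used in this work: the function names are hypothetical, the radius-1 structuring element is assumed to be a 4-connected cross, and the fill uses a simple neighbor-averaging scheme as a stand-in for the second-order interpolation.

```python
import numpy as np

def dilate_radius1(mask):
    """Binary dilation with a radius-1 structuring element (assumed 4-connected cross)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    return (padded[1:-1, 1:-1]
            | padded[:-2, 1:-1] | padded[2:, 1:-1]    # pixel above / below
            | padded[1:-1, :-2] | padded[1:-1, 2:])   # pixel left / right

def inpaint_by_neighbor_average(channel, mask, max_iters=500):
    """Fill masked pixels layer by layer with the mean of their known 4-neighbors.

    Stand-in for the second-order interpolation described in the text.
    """
    filled = channel.astype(float).copy()
    known = ~mask
    filled[mask] = 0.0
    for _ in range(max_iters):
        if known.all():
            break
        vals = np.pad(filled * known, 1)
        cnt = np.pad(known.astype(float), 1)
        nsum = vals[:-2, 1:-1] + vals[2:, 1:-1] + vals[1:-1, :-2] + vals[1:-1, 2:]
        ncnt = cnt[:-2, 1:-1] + cnt[2:, 1:-1] + cnt[1:-1, :-2] + cnt[1:-1, 2:]
        # Unknown pixels that touch at least one known pixel are filled this pass.
        frontier = (~known) & (ncnt > 0)
        filled[frontier] = nsum[frontier] / ncnt[frontier]
        known = known | frontier
    return filled

def inpaint_vessels(rgb, vessel_prob, threshold=0.5):
    """Threshold the probability map, dilate the mask, and fill each color channel."""
    mask = dilate_radius1(vessel_prob > threshold)
    channels = [inpaint_by_neighbor_average(rgb[..., k], mask) for k in range(rgb.shape[-1])]
    return np.stack(channels, axis=-1), mask
```

Here `rgb` is assumed to be an `(H, W, 3)` array and `vessel_prob` an `(H, W)` array of probabilities in the range 0 to 1, such as the output of a vessel-segmentation network.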

Fig. 2.

(a) An example cropped ONH-centered fundus image. (b) The corresponding blood vessel probability map obtained from a deep-learning neural network. (c) The dilated vessel mask overlaid on the cropped fundus photograph. (d) The blood-vessel-inpainted image.

2.3 Feature Extraction

When assessing optic disc swelling via fundus photographs, neuro-ophthalmologists grade the swelling severity by inspecting key observable features of the image. Similarly, in this work, several feature sets (image intensity, color representations, gradient, and texture information) were extracted from the inpainted image to help the proposed regression model estimate the thickness information at the pixel level; a total of 58 features were used, as listed in Table 1.

Table 1. Complete list of features for regression analysis.

To quantify the textural information, Gabor filters [7] are commonly used to analyze image objects with specific combinations of frequencies, directions, and regions of interest. Here, Gabor magnitude responses were computed at 0\(^\circ \), 45\(^\circ \), 90\(^\circ \), and 135\(^\circ \) with wavelengths of two and four pixels at each orientation (Fig. 3).
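As an illustration, the eight Gabor magnitude responses (four orientations at two wavelengths) could be computed along the following lines. This is a sketch under stated assumptions rather than the implementation used here: the isotropic Gaussian envelope, its width (`sigma` set to half the wavelength), the kernel support, and the function names are all hypothetical choices.

```python
import numpy as np

def gabor_kernel(wavelength, theta_deg, sigma=None):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoidal carrier."""
    sigma = 0.5 * wavelength if sigma is None else sigma  # assumed envelope width
    half = int(np.ceil(3 * sigma))                        # assumed kernel support
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    theta = np.deg2rad(theta_deg)
    x_rot = x * np.cos(theta) + y * np.sin(theta)         # coordinate along the carrier
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.exp(1j * 2 * np.pi * x_rot / wavelength)
    return envelope * carrier

def gabor_magnitude(image, wavelength, theta_deg):
    """Magnitude of the image convolved with the complex Gabor kernel (zero-padded FFT)."""
    kernel = gabor_kernel(wavelength, theta_deg)
    shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)
    full = np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape))
    r0, c0 = kernel.shape[0] // 2, kernel.shape[1] // 2
    # Crop the full linear convolution back to the input size.
    return np.abs(full[r0:r0 + image.shape[0], c0:c0 + image.shape[1]])

def gabor_feature_stack(image, wavelengths=(2, 4), orientations=(0, 45, 90, 135)):
    """One magnitude image per (wavelength, orientation) pair, wavelength-major order."""
    return np.stack([gabor_magnitude(image, w, t)
                     for w in wavelengths for t in orientations])
```

Applied to a single-channel image, `gabor_feature_stack` returns an array of shape `(8, H, W)`, one feature image per filter.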

Textural features were also obtained using gray-level co-occurrence matrices (GLCMs), which statistically describe the spatial relationship of pixels [4]. A GLCM was computed for each pixel in the inpainted image at an offset of one pixel to the right using three different neighborhood sizes: \(10\times 10\), \(15\times 15\), and \(20\times 20\). For each GLCM, statistical properties, including variance, contrast, entropy (Fig. 4(a)), homogeneity (Fig. 4(b)), and inertia, were used to create different feature images.
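The per-pixel GLCM computation can be sketched as follows for three of the listed statistics. This is an illustrative NumPy version under assumptions not stated in the text: intensities are taken to lie in [0, 1], the number of gray levels (eight) is arbitrary, the homogeneity weighting \(1/(1+(i-j)^2)\) is one of several common definitions, and image borders are handled by reflection.

```python
import numpy as np

def quantize(image, levels=8):
    """Map intensities in [0, 1] to integer gray levels 0..levels-1 (assumed binning)."""
    return np.clip((image * levels).astype(int), 0, levels - 1)

def glcm_right_neighbor(patch, levels=8):
    """Normalized co-occurrence matrix pairing each pixel with its right-hand neighbor."""
    left = patch[:, :-1].ravel()
    right = patch[:, 1:].ravel()
    glcm = np.zeros((levels, levels), dtype=float)
    np.add.at(glcm, (left, right), 1.0)   # count each (left, right) gray-level pair
    return glcm / glcm.sum()

def glcm_statistics(glcm):
    """Contrast, homogeneity, and entropy (in bits) of a normalized GLCM."""
    i, j = np.indices(glcm.shape)
    nonzero = glcm[glcm > 0]
    return {
        "contrast": float(np.sum(glcm * (i - j) ** 2)),
        "homogeneity": float(np.sum(glcm / (1.0 + (i - j) ** 2))),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
    }

def glcm_feature_images(image, window=15, levels=8):
    """Compute the GLCM statistics in a window around every pixel (slow reference loop)."""
    q = quantize(image, levels)
    before = window // 2
    after = window - before - 1
    padded = np.pad(q, ((before, after), (before, after)), mode="reflect")
    out = {name: np.zeros(image.shape) for name in ("contrast", "homogeneity", "entropy")}
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            patch = padded[r:r + window, c:c + window]
            for name, value in glcm_statistics(glcm_right_neighbor(patch, levels)).items():
                out[name][r, c] = value
    return out
```

Calling `glcm_feature_images` with `window` set to 10, 15, and 20 would give one set of feature images per neighborhood size.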

Fig. 3.

Example Gabor responses with a wavelength of two pixels with directions (a) 0\(^\circ \), (b) 45\(^\circ \), (c) 90\(^\circ \), (d) 135\(^\circ \).

Fig. 4.

Example feature images. (a) GLCM entropy (\(15\times 15\)). (b) GLCM homogeneity (\(15\times 15\)). (c) Saturation image. (d) Histogram equalized fundus image.

In addition, entropy can be used to evaluate the image information in a particular region of interest. In this work, sets of both small and large sliding windows were applied to the inpainted fundus photographs to compute regional image entropy. For the smaller windows, the computed entropy values were interpreted as quantitative indicators of the presence of homogeneous textures; for the larger windows, the computed entropy values indicated the presence of heterogeneous textures. The image gradient direction and magnitude were also included as features to enhance the borders between regions with texture and/or color changes. Along with the red, green, and blue intensity values, the inpainted fundus image was converted to the hue-saturation-value (HSV) color space, and the separate channel values were used as features. Differences in color contrast can be accentuated in the hue and saturation (Fig. 4(c)) channels, which are highly indicative of texture, since an opaque texture is more associated with swelling. In addition, the histogram-equalized image (Fig. 4(d)) and an intensity-mapped image with the top and bottom one percent of pixels saturated were used as features. Both images emphasize the contrast between regions of differing colors or intensities.
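Two of these features, the sliding-window entropy and the intensity mapping with one percent saturation at each end, can be sketched as follows. This is a hedged illustration only: the function names, the number of histogram bins, the reflective border handling, and any window sizes passed in are assumptions, since the actual window sizes are not given in the text above.

```python
import numpy as np

def local_entropy(image, window, bins=16):
    """Shannon entropy (bits) of the intensity histogram in a window around each pixel.

    Intensities are assumed to lie in [0, 1]; `bins` is an assumed quantization.
    """
    q = np.clip((image * bins).astype(int), 0, bins - 1)
    before = window // 2
    after = window - before - 1
    padded = np.pad(q, ((before, after), (before, after)), mode="reflect")
    out = np.zeros(image.shape)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            counts = np.bincount(padded[r:r + window, c:c + window].ravel(), minlength=bins)
            p = counts[counts > 0] / counts.sum()
            out[r, c] = -np.sum(p * np.log2(p))
    return out

def stretch_with_saturation(image, percent=1.0):
    """Linearly map intensities to [0, 1], saturating the top and bottom `percent` of pixels."""
    lo, hi = np.percentile(image, [percent, 100.0 - percent])
    if hi <= lo:
        return np.zeros_like(image, dtype=float)
    return np.clip((image - lo) / (hi - lo), 0.0, 1.0)
```

A uniform region yields zero entropy at every window size, whereas a window that straddles two equally represented intensity levels yields one bit.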

3 Experimental Methods

A total of 88 subjects with optic disc swelling who had both volumetric SD-OCT and color fundus images were used for experimental analysis. The true thickness information at the ONH (i.e., the ground truth) for each subject was calculated based on the segmented internal limiting membrane (ILM) and the lower bounding surface of the RPE complex in the SD-OCT image. With a total of 58 input features from the registered fundus image, a random forest bagged ensemble regression model was trained using 500 learning cycles, and feature importance was calculated as part of the training process. To reduce computational complexity, the model was trained on a randomly chosen 2.5% of the pixels in each image. Evaluation was performed using a leave-one-subject-out approach, so that the model for each subject was obtained by training on the images from the remaining 87 subjects. After predictions were made for individual pixel locations, volumes were calculated for the peripapillary, nasal, temporal, inferior, and superior regions. The peripapillary region was defined as the region inside a central circle with a radius of 1.73 mm. The nasal, temporal, inferior, and superior regions were defined as the four interior quadrants of the peripapillary circle using the 135\(^\circ \) and 45\(^\circ \) lines as boundaries. The overall TRV was calculated as well. Errors and correlations were then calculated for each individual region.

4 Results

When the total retinal volume (TRV) calculated from the model’s retinal thickness predictions was compared with the ground truth from the OCT images, a root-mean-square error of 1.66 mm\(^3\) was achieved, and Spearman’s correlation coefficient was \(R=0.73\). For the regional volumes, the nasal, temporal, inferior, superior, and peripapillary regions had root-mean-square errors of 0.64 mm\(^3\), 0.61 mm\(^3\), 0.74 mm\(^3\), 0.71 mm\(^3\), and 1.30 mm\(^3\), respectively. The correlations (R) were 0.71 (nasal), 0.72 (temporal), 0.61 (inferior), 0.65 (superior), and 0.75 (peripapillary). Examples of comparisons between the total retinal thickness maps from the SD-OCTs (i.e., ground truths) and from the monocular fundus photographs are shown in Fig. 5.
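For reference, the two evaluation measures can be computed as in the following minimal sketch, in which Spearman’s coefficient is obtained as the Pearson correlation of average ranks. The function names are hypothetical, and any inputs passed to them in an example are illustrative rather than data from this study.

```python
import numpy as np

def rmse(predicted, truth):
    """Root-mean-square error between predicted and ground-truth values."""
    predicted = np.asarray(predicted, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((predicted - truth) ** 2)))

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tie = values == v
        ranks[tie] = ranks[tie].mean()
    return ranks

def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])
```

Because only ranks enter the correlation, any strictly increasing relationship between predicted and true volumes gives a coefficient of 1, regardless of the absolute error captured by the RMSE.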

Fig. 5.

Cropped and registered input fundus photographs, the corresponding thickness maps generated from SD-OCT (used as the ground truth), and the thickness maps predicted by the model, shown in order of increasing swelling. Regional volumes are shown in the grid, and the peripapillary and total volumes are shown in the lower left-hand corner (all in mm\(^3\)). Each column represents a different subject.

Average feature importance across all models was calculated by permuting the features and measuring the change in the model error. The top features were found to be entropy in large neighborhoods as well as features that accentuate color change, such as hue, saturation (Fig. 4(c)), and the histogram-equalized image (Fig. 4(d)). All features that had distinctly different values for the optic disc compared to the peripheral area were helpful in distinguishing swollen regions from non-swollen regions.
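The idea behind permutation-based importance can be sketched generically as follows: each feature column is shuffled in turn, and the resulting increase in prediction error is recorded. This is an assumption-laden illustration, not the procedure used in this work; `predict` stands for any fitted model’s prediction function, and the error measure (RMSE), number of repeats, and function name are hypothetical choices.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in RMSE when each feature column is randomly permuted.

    predict: callable mapping an (n_samples, n_features) array to predictions.
    """
    rng = np.random.default_rng(seed)

    def error(features):
        return np.sqrt(np.mean((predict(features) - y) ** 2))

    baseline = error(X)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its relationship with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(error(X_perm) - baseline)
        importance[j] = np.mean(increases)
    return importance
```

Features whose permutation leaves the error nearly unchanged contribute little to the predictions, whereas a large increase marks a feature on which the model relies.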

5 Discussion and Conclusion

In this preliminary study, we have shown that monocular fundus photographs are a potentially lower-cost and more widely available alternative to SD-OCT for assessing optic disc swelling in primary care or telemedical settings. Using features pertaining to textural and color information obtained directly from the fundus photographs, akin to the properties neuro-ophthalmologists use, local thickness measurements can be estimated using a regression model trained on subjects with varying degrees of swelling. The ability to assess depth information in the swollen regions can help clinicians identify the onset of diseases at earlier stages; for example, papilledema (a particular type of optic disc swelling due to increased intracranial pressure) often presents with swelling in the nasal quadrant relatively early [11]. In addition, the predicted thickness maps provide a future opportunity to create 3D retinal shape models directly from 2D monocular fundus photographs. Such retinal shape information may help distinguish between the different causes of optic disc swelling.

The limitations of this work include the following: (1) the lack of subjects with severe optic disc swelling causes the trained models to slightly underestimate the retinal thickness in extremely swollen regions, and (2) the thickness predictions in regions with imperfect vessel inpainting are less accurate because of the inconsistent texture information between the swollen retinal tissue and the vessel residual. Potential future work includes collecting more SD-OCT and fundus image pairs with varying severity of optic disc swelling and creating 3D retinal shape models directly from 2D fundus photographs to help identify the causes of optic disc swelling.