1 Introduction

Clinical practice shows that changes in the retinal arteriovenous diameter ratio, vessel shape and related properties are the basis for the early diagnosis of many diseases, such as cardiovascular disease [1], diabetic retinopathy [2] and hypertension [3]. The region 1DD to 1.5DD (DD: disc diameter) from the disc center is an important area for extracting biological information from the retina [4, 5], and the AVR (Arteriole to Venule Ratio) in this area is a commonly used indicator for disease prediction and assessment. For example, [6] reports that middle-aged people with a smaller AVR are more likely to suffer a stroke. An important prerequisite for obtaining AVR values is to distinguish arteries from veins correctly. Automatic artery/vein classification is therefore the key step before the AVR can be used to predict and assess disease, and it is of great practical significance. However, fundus images have the following characteristics: illumination is not uniform, the blood vessels cross each other in complex ways, and the difference between arteries and veins is small. These characteristics make arteries and veins difficult to distinguish.

In recent years, there have been many research results on retinal vessel segmentation, but only a few studies focus on the automatic classification of arteries and veins. Vázquez proposed a localized arteriovenous classification method based on the K-means clustering algorithm [7]. Relan automatically classified retinal vessels as arteries or veins based on color features, using a GMM-EM (Gaussian Mixture Model, Expectation-Maximization) unsupervised classifier and a quadrant-pairwise approach [8]. Niemeijer proposed a global artery/vein classification method [9]. The works [10, 11] proposed an approach for artery/vein classification based on the analysis of a graph extracted from the retinal vasculature. That method classifies the entire vascular tree by deciding on the type of each intersection point (graph nodes) and assigning one of two labels to each vessel segment (graph links). Vijayakumar proposed a classification method based on Random Forest and SVM (Support Vector Machines) [12]. All of the above methods rely heavily on color information, which leads to a relatively low classification accuracy for fundus images with complex backgrounds and uneven brightness. Mirsharif divided the vessel tree into several subsets, and then integrated vascular tracking techniques and color information to classify the global vessels [13]. Vázquez used the minimal path approach [7, 14] to revise the classification results mentioned above [7]. Estrada proposed a novel graph-theoretic framework for distinguishing arteries from veins in a fundus image, and made use of the underlying vessel topology to better classify small and midsized vessels [15]. These three methods take the vascular structural properties into account, but they do not make sufficient use of vascular color information.

In summary, most current studies on artery/vein classification are based mainly on color information or on part of the vascular structural information, and only a few take color and structural information into account simultaneously. This paper makes comprehensive use of prior knowledge about fundus color and vessel structure, and proposes a retinal artery/vein classification algorithm to further improve the classification accuracy. As shown in Fig. 1, we first integrate the FSFDP (Clustering by fast search and find of density peaks) algorithm [16] into the CRCFV (Classification by Rotary Cutting on Fundus Vessels) focus point classification algorithm designed in this paper. Then we use vascular connectivity, together with the structural fact that arteries and veins usually accompany each other, to classify the blood vessels. The result is a local retinal artery/vein classification method based on the color and structure of retinal vessels.

Fig. 1. The overall flow chart of the proposed algorithm.

2 FSFDP

The FSFDP algorithm is a clustering method proposed by Alex Rodriguez and Alessandro Laio and published in the journal Science in 2014 [16]. It can quickly determine the clustering centers and handle nonlinear classification problems, and it is much more efficient than traditional clustering algorithms.

The basic principle of FSFDP is as follows. For a set of data points \( Q = \left\{ {q_{1} ,q_{2} , \cdots ,q_{n} } \right\} \), where n is the number of data points, we calculate the local density \( \rho_{i} \) of each data point \( q_{i} \) and the minimum distance \( \delta^{\prime}_{i} \) from this point to any point with a higher local density.

$$ \rho_{i} = \sum\nolimits_{j = 1}^{n} {\varphi \left( {d_{ij} - d_{c} } \right)} ,\;\varphi \left( m \right) = \left\{ {\begin{array}{*{20}c} {1,\;m < 0} \\ {0,\;m \ge 0} \\ \end{array} } \right. $$
(1)
$$ \delta_{i} = \frac{{\delta^{\prime}_{i} }}{{\mathop {\hbox{max} }\nolimits_{i = 1 \cdots n} \left( {\delta^{\prime}_{i} } \right)}},\;\delta^{\prime}_{i} = \left\{ {\begin{array}{*{20}c} {\mathop {\hbox{min} }\nolimits_{{j:\rho_{j} > \rho_{i} }} \left( {d_{ij} } \right),\;\rho_{i} < \rho_{max} } \\ {\mathop {\hbox{max} }\nolimits_{j = 1 \cdots n} \left( {d_{ij} } \right),\;\rho_{i} = \rho_{max} } \\ \end{array} } \right. $$
(2)

The sets \( \rho = \left\{ {\rho_{1} ,\rho_{2} , \cdots ,\rho_{n} } \right\} \) and \( \delta = \left\{ {\delta_{1} ,\delta_{2} , \cdots ,\delta_{n} } \right\} \) are then obtained. In the above formulas, \( d_{ij} \) is the distance between the two points \( q_{i} \) and \( q_{j} \), \( d_{c} \) is the threshold for calculating the local density, \( \rho_{max} \) is the maximum of the local density set \( \rho \), and \( \delta_{i} \) is the normalized value of \( \delta^{\prime}_{i} \). According to formulas (1) and (2), only the points with both a high \( \delta \) and a high \( \rho \) are clustering centers. As can be seen from Fig. 2, points 26, 27 and 28 have a relatively high \( \delta \) and a low \( \rho \), because they are isolated; they can be considered as clusters composed of a single point. Points 1 and 10 have both a high \( \delta \) and a high \( \rho \), so they are judged to be clustering centers. The other data points have a high \( \rho \) and a low \( \delta \), so they are judged to be points near the clustering centers.

Fig. 2. The algorithm in two dimensions. (A) Point distribution. Data points are ranked in order of decreasing density. (B) Decision graph for the data in (A). Different colors correspond to different clusters.

The FSFDP algorithm can be used to solve the nonlinear classification problem. Because there is no iterative and complex computation in the algorithm, the clustering centers can be found quickly. The keys of the algorithm are to reasonably describe the data set \( Q \) according to the actual problem and to define the correct and proper distance \( d_{ij} \).
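As a rough illustration of formulas (1) and (2), the following Python sketch computes \( \rho \) and \( \delta \) from a precomputed distance matrix. The function names, the toy data and the rule of taking the points with the largest product \( \rho_{i} \delta_{i} \) as centers are assumptions made for this example; the text above only states that centers have both a high \( \rho \) and a high \( \delta \).

```python
import math

def density_peaks(dist, d_c):
    """Compute (rho, delta) of Eqs. (1)-(2) from a symmetric distance matrix."""
    n = len(dist)
    # Eq. (1): number of points closer than d_c. The j = i term adds 1 to
    # every point, which does not change the density ranking.
    rho = [sum(1 for j in range(n) if dist[i][j] < d_c) for i in range(n)]
    delta_raw = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Eq. (2): distance to the nearest denser point, or to the farthest
        # point when q_i itself has the maximum density.
        delta_raw.append(min(higher) if higher else max(dist[i]))
    top = max(delta_raw) or 1.0  # avoid dividing by zero for identical points
    return rho, [d / top for d in delta_raw]

def pick_centers(rho, delta, k):
    """Assumed selection rule: the k points with the largest rho * delta."""
    order = sorted(range(len(rho)), key=lambda i: rho[i] * delta[i], reverse=True)
    return order[:k]

# Toy data (hypothetical): two square blobs, each with one point in the middle.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
dist = [[math.dist(p, q) for q in points] for p in points]
rho, delta = density_peaks(dist, d_c=1.0)
centers = pick_centers(rho, delta, k=2)  # indices of the two middle points
```

In this toy set the two middle points have the highest density and the largest \( \delta \), so they are selected, while the corner points have a small \( \delta \) because a denser point lies close by.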

3 CRCFV

In this paper, we consider the non-uniformity of the light in the fundus images and the difference in the color of the adjacent arterial and venous blood vessels, and propose a CRCFV algorithm for the classification of arteries and veins in the local area. We extract the focus points within the region of 1DD – 1.5DD, which is of particular interest in medicine, and define the data set \( Q \) and the distance \( d_{ij} \) based on the color features.

3.1 Focus Points and Color Features

In Fig. 3(a), we draw three concentric circles \( R_{1} ,R_{2} ,R_{3} \) with radii \( r_{{R_{1} }} = 1DD,\;r_{{R_{2} }} = 1.25DD,\;r_{{R_{3} }} = 1.5DD \). We then define the points where the circles intersect the center lines of the blood vessels as the focus points, denoted as \( Q = \left\{ {q_{1} ,q_{2} , \cdots ,q_{n} } \right\} \). The position of each point \( q_{i} \left( {i = 1,2, \cdots ,n} \right) \) in the image is represented as \( \left( {x_{{q_{i} }} ,y_{{q_{i} }} } \right) \), as shown in Fig. 3(b).

Fig. 3. Extraction of the focus points

In fundus images, arteries are often brighter than veins. In this paper, we use formula (3) to obtain the color feature vector of each focus point.

$$ I_{{q_{i} }} = \left[ {R_{{q_{i} }} ,G_{{q_{i} }} ,B_{{q_{i} }} } \right]_{1*3} $$
(3)

where \( R_{{q_{i} }} = \frac{{\sum\nolimits_{m = - 1}^{1} {\sum\nolimits_{n = - 1}^{1} {I_{r} \left( {x_{{q_{i} }} - n,y_{{q_{i} }} - m} \right)} } }}{9} \), \( G_{{q_{i} }} = \frac{{\sum\nolimits_{m = - 1}^{1} {\sum\nolimits_{n = - 1}^{1} {I_{g} \left( {x_{{q_{i} }} - n,y_{{q_{i} }} - m} \right)} } }}{9} \), \( B_{{q_{i} }} = \frac{{\sum\nolimits_{m = - 1}^{1} {\sum\nolimits_{n = - 1}^{1} {I_{b} \left( {x_{{q_{i} }} - n,y_{{q_{i} }} - m} \right)} } }}{9} \), and \( I_{k} \left( {x,y} \right)\left( {k = r,g,b} \right) \) is the gray value of each color channel. Thus, the feature matrix of the focus points can be obtained.

$$ I = \left[ {I_{{q_{1} }} ,I_{{q_{2} }} , \cdots ,I_{{q_{i} }} , \cdots I_{{q_{n} }} } \right]_{n*3}^{T} $$
(4)
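Formulas (3) and (4) amount to averaging each color channel over the 3 * 3 window centered on a focus point. The sketch below shows this in plain Python. The image layout (`image[row][col]` holding an (R, G, B) tuple, with row = y and col = x) and the function names are assumptions of this example, and border pixels are not handled.

```python
def color_feature(image, x, y):
    """Mean R, G, B over the 3x3 window centred on (x, y), as in Eq. (3).

    Assumes image[row][col] is an (R, G, B) tuple and that (x, y) does not
    lie on the image border.
    """
    sums = [0.0, 0.0, 0.0]
    for m in (-1, 0, 1):
        for n in (-1, 0, 1):
            pixel = image[y - m][x - n]
            for c in range(3):
                sums[c] += pixel[c]
    return [s / 9 for s in sums]

def feature_matrix(image, focus_points):
    """Stack the per-point feature vectors into the n x 3 matrix of Eq. (4)."""
    return [color_feature(image, x, y) for x, y in focus_points]

# Synthetic 5 x 5 image (hypothetical): R grows with x, G with y, B is constant.
image = [[(col, 2 * row, 7) for col in range(5)] for row in range(5)]
features = feature_matrix(image, [(2, 2), (1, 3)])
```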

3.2 CRCFV Algorithm

There is a general problem that the color and brightness are uneven in the fundus images. The color and the brightness of arteries and veins vary widely in different regions. In this section, we propose a CRCFV algorithm to achieve the artery/vein classification of the focus points. The specific steps are as follows:

  1. Divide the image into four parts by creating a coordinate system with the center of the disc as the origin, so that the set of focus points Q is separated into four subsets \( Q_{1}^{\left( k \right)} ,Q_{2}^{\left( k \right)} ,Q_{3}^{\left( k \right)} ,Q_{4}^{\left( k \right)} \left( {k = 1,2, \cdots ,9} \right) \), where k denotes the k th division of Q.

  2. For each subset \( Q_{i}^{\left( k \right)} \left( {i = 1 \cdots 4,\;k = 1 \cdots 9} \right) \), the color difference between the points \( q_{i} \) and \( q_{j} \) is defined as:

    $$ d_{ij} = \sqrt {\left( {R_{{q_{i} }} - R_{{q_{j} }} } \right)^{2} + \gamma \left( {G_{{q_{i} }} - G_{{q_{j} }} } \right)^{2} + \left( {B_{{q_{i} }} - B_{{q_{j} }} } \right)^{2} } $$
    (5)

    where \( \gamma > 1 \), because the green channel carries stronger color information [17].

  3. Use the FSFDP algorithm to calculate the two clustering centers of the focus points in each subset \( Q_{i}^{\left( k \right)} \). The other focus points are then classified as arterial or venous points based on the color difference, which is recorded as the first classification result.

  4. Rotate the coordinate axes counter-clockwise by 20° and repeat step 3, until the axes have been rotated nine times and return to the initial state. This yields nine divisions of the focus points.

  5. Step 4 yields nine classification results. For each point, the final classification result is determined by voting:

    $$ R_{e}^{\left( 1 \right)} = \left[ {r_{{q_{1} }}^{\left( 1 \right)} ,r_{{q_{2} }}^{\left( 1 \right)} , \cdots ,r_{{q_{i} }}^{\left( 1 \right)} , \cdots ,r_{{q_{n} }}^{\left( 1 \right)} } \right]_{1*n} $$
    (6)
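The rotate-and-vote structure of the steps above can be sketched as follows. This is only an illustration under several assumptions: all function names are hypothetical, the value \( \gamma = 2 \) is arbitrary, angles follow the mathematical convention (in image coordinates with y pointing down the sense of rotation is mirrored), and the per-quadrant classifier is a simple stand-in that takes the two most distant colors as centers instead of running FSFDP. The stand-in labels the brighter center as the artery, following the observation that arteries are often brighter than veins.

```python
import math

def color_distance(f, g, gamma=2.0):
    """Eq. (5): color distance with the green term weighted by gamma > 1."""
    return math.sqrt((f[0] - g[0]) ** 2 + gamma * (f[1] - g[1]) ** 2
                     + (f[2] - g[2]) ** 2)

def quadrant(point, center, angle_deg):
    """Quadrant 0..3 of a point in axes rotated counter-clockwise by angle_deg."""
    theta = math.atan2(point[1] - center[1], point[0] - center[0])
    theta = (theta - math.radians(angle_deg)) % (2 * math.pi)
    return min(int(theta // (math.pi / 2)), 3)  # min() guards rounding at 2*pi

def farthest_pair_classifier(feats, gamma=2.0):
    """Stand-in for step 3 (NOT FSFDP): use the two most distant colors as
    centers, call the brighter one the artery (+1), the other the vein (-1)."""
    n = len(feats)
    i, j = max(((a, b) for a in range(n) for b in range(n)),
               key=lambda ab: color_distance(feats[ab[0]], feats[ab[1]], gamma))
    if sum(feats[i]) < sum(feats[j]):
        i, j = j, i  # i is now the brighter center
    return [1 if color_distance(f, feats[i], gamma)
            <= color_distance(f, feats[j], gamma) else -1 for f in feats]

def crcfv_vote(points, feats, center, classify, step_deg=20.0, divisions=9):
    """Steps 1, 4 and 5: classify per quadrant for each division, then vote."""
    votes = [0] * len(points)
    for k in range(divisions):
        quads = [quadrant(p, center, k * step_deg) for p in points]
        for q in range(4):
            idx = [i for i, qi in enumerate(quads) if qi == q]
            if len(idx) < 2:
                continue  # two centers need at least two points
            for i, label in zip(idx, classify([feats[i] for i in idx])):
                votes[i] += label
    # +1 artery, -1 vein, 0 undecided (tie or never classified)
    return [(v > 0) - (v < 0) for v in votes]

# Hypothetical data: eight points on a circle, alternating bright and dark.
pts = [(10 * math.cos(math.radians(7 + 45 * m)),
        10 * math.sin(math.radians(7 + 45 * m))) for m in range(8)]
cols = [(200, 150, 100) if m % 2 == 0 else (120, 80, 60) for m in range(8)]
result = crcfv_vote(pts, cols, (0.0, 0.0), farthest_pair_classifier)
```

Passing the classifier as a parameter keeps the partition-and-vote logic separate from the clustering step, so an FSFDP-based classifier returning +1 or -1 per point could be substituted without changing `crcfv_vote`.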

4 The Improvement of the CRCFV Algorithm

The CRCFV algorithm uses only the color information of the arterial and venous blood vessels. As with other methods that rely only on color information, its accuracy is limited: in a simulation on the 40 images of the public database DRIVE, the average accuracy is only 77.05%. According to medical knowledge, blood vessels are connected, so the classification results of the focus points on the same blood vessel should be consistent. In addition, arteries and veins usually accompany each other, so a region is unlikely to contain only arteries or only veins. This section summarizes and quantifies this medical prior knowledge. First, we determine the correspondence between the focus points and the blood vessels. Then, on the basis of the CRCFV algorithm, we propose an artery/vein classification method that integrates arteriovenous vascular structure information.

4.1 Vascular Connectivity

Vascular connectivity means that all focus points on the same vessel belong either to a vein or to an artery. The classification of focus points only divides the set of focus points Q into two categories according to color and brightness information; it does not yet classify the blood vessels. To make the classification results of the focus points consistent with vascular connectivity, and then to classify the blood vessels, we must first obtain the correspondence between the set of focus points Q and the set of blood vessels \( {\text{l}} = \left\{ {l_{1} \cdots l_{i} \cdots l_{k} } \right\} \), where \( l_{i} \) represents a vessel and k represents the number of blood vessels. This paper uses the connectivity determination method to solve this problem.

The connectivity determination method: Fig. 4(b) shows the basic process of connectivity determination.

Fig. 4. The relationship between the focus points and the blood vessels

Suppose that \( q_{i} \, \in \,R_{n} \left( {n = 1,2,3} \right) \) and \( q_{j} \, \in \,R_{m} \left( {m \ne n} \right) \), which means that \( q_{i} \) and \( q_{j} \) are not on the same circle. In Fig. 4(b), define:

$$ D_{f} = r_{3} - r_{1} + c $$
(7)

Determine a local image with \( q_{i} \) as the center and \( D_{f} \) as the radius. In the local image, the connected domain set \( L = \left\{ {L_{1} \cdots L_{i} \cdots L_{k} } \right\} \) is determined by the 8-neighborhood rule, where \( L_{i} \) denotes the i th connected domain in the region and k denotes the number of connected domains. If \( q_{i} \) and \( q_{j} \) belong to the same connected domain \( L_{i} \), then \( q_{i} \) and \( q_{j} \) belong to the same vessel \( l_{i} \).
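The connectivity test can be illustrated with a breadth-first search over vessel pixels under the 8-neighborhood rule, restricted to the disc of radius \( D_{f} \) around \( q_{i} \). The sketch below makes several assumptions that are not stated in the text: the vessel segmentation is available as a binary mask indexed as `mask[row][col]`, points are given as (x, y) tuples, and the function name is hypothetical.

```python
from collections import deque

def same_vessel(mask, p, q, radius):
    """True if q can be reached from p through vessel pixels (8-neighbourhood)
    without leaving the disc of the given radius centred on p."""
    height, width = len(mask), len(mask[0])

    def inside(x, y):
        # Pixel must be in the image, on a vessel, and within the local disc.
        return (0 <= x < width and 0 <= y < height and bool(mask[y][x])
                and (x - p[0]) ** 2 + (y - p[1]) ** 2 <= radius ** 2)

    if not (inside(*p) and inside(*q)):
        return False
    seen = {p}
    queue = deque([p])
    while queue:
        x, y = queue.popleft()
        if (x, y) == q:
            return True
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nxt = (x + dx, y + dy)
                if nxt not in seen and inside(*nxt):
                    seen.add(nxt)
                    queue.append(nxt)
    return False

# Hypothetical mask: a diagonal vessel and a separate vertical vessel at x = 6.
mask = [
    [1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 1],
]
on_same = same_vessel(mask, (0, 0), (4, 4), radius=10)
on_other = same_vessel(mask, (0, 0), (6, 0), radius=10)
```

The diagonal pixels touch only at their corners, so they are connected under the 8-neighborhood rule but would not be under a 4-neighborhood rule.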

4.2 Artery/Vein Classification with Vascular Structural Information

After the analysis of Sect. 4.1, the corresponding relationship between the set of blood vessels l and the set of focus points Q was determined. Assuming \( q_{i} \, \in \,l_{i} \), we can correct the classification of the focus points on the same vessel according to the following rules, and then achieve the classification of blood vessels:

$$ r_{{q_{i} }}^{\left( 2 \right)} = \left\{ {\begin{array}{*{20}l} {a,} & {\sum\nolimits_{{q_{i} \in l_{i} }} {J_{{q_{i} }} } > 0} \\ {v,} & {\sum\nolimits_{{q_{i} \in l_{i} }} {J_{{q_{i} }} } < 0} \\ {u,} & {\sum\nolimits_{{q_{i} \in l_{i} }} {J_{{q_{i} }} } = 0} \\ \end{array} } \right.,\;r_{{l_{i} }} = \left\{ {\begin{array}{*{20}l} {a_{l} ,} & {\sum\nolimits_{{q_{i} \in l_{i} }} {J_{{q_{i} }} } > 0} \\ {v_{l} ,} & {\sum\nolimits_{{q_{i} \in l_{i} }} {J_{{q_{i} }} } < 0} \\ {u_{l} ,} & {\sum\nolimits_{{q_{i} \in l_{i} }} {J_{{q_{i} }} } = 0} \\ \end{array} } \right. $$
(8)

In the above formula, \( J_{{q_{i} }} \in \left\{ { + 1, - 1,0} \right\} \) indicates that the classification result of point \( q_{i} \) before correction is artery, vein or uncertain, respectively, and \( r_{{q_{i} }}^{\left( 2 \right)} \) is the result for point \( q_{i} \) after correction. The labels \( a,v,u \) indicate that the classification result is arterial, venous or uncertain, respectively, and \( a_{l} ,v_{l} ,u_{l} \) are the corresponding labels for vessels. \( r_{{l_{i} }} \) represents the classification result of blood vessel \( l_{i} \). The corrected classification results can be written as follows:

$$ R_{e}^{\left( 2 \right)} = \left[ {r_{{q_{1} }}^{\left( 2 \right)} ,r_{{q_{2} }}^{\left( 2 \right)} , \cdots ,r_{{q_{i} }}^{\left( 2 \right)} , \cdots ,r_{{q_{n} }}^{\left( 2 \right)} } \right]_{1*n} ,\;R_{el} = \left[ {r_{{l_{1} }} ,r_{{l_{2} }} , \cdots ,r_{{l_{i} }} , \cdots ,r_{{l_{k} }} } \right]_{1*k} $$
(9)
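Formula (8) is a sum of the per-point results on each vessel followed by a sign test. A minimal Python sketch is given below, assuming the point labels are encoded as +1 (artery), -1 (vein) and 0 (uncertain), and that the vessel index of each focus point is already known from Sect. 4.1; the function and variable names are hypothetical.

```python
def correct_by_vessel(labels, vessel_of):
    """Apply Eq. (8): every focus point takes the sign of the label sum of its vessel.

    labels[i]    -- J value of focus point i: +1 artery, -1 vein, 0 uncertain
    vessel_of[i] -- identifier of the vessel that focus point i lies on
    """
    totals = {}
    for label, vessel in zip(labels, vessel_of):
        totals[vessel] = totals.get(vessel, 0) + label
    # Sign of the sum: +1 artery, -1 vein, 0 uncertain.
    vessel_label = {v: (t > 0) - (t < 0) for v, t in totals.items()}
    corrected = [vessel_label[v] for v in vessel_of]
    return corrected, vessel_label

# Hypothetical example: three vessels with 3, 2 and 2 focus points.
labels = [1, 1, -1, -1, -1, 1, -1]
vessel_of = [0, 0, 0, 1, 1, 2, 2]
corrected, vessel_label = correct_by_vessel(labels, vessel_of)
```

In this example the single vein vote on vessel 0 is overruled by the two artery votes, while vessel 2 has one vote each way and stays uncertain.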

Although formula (9) gives the classification result of the blood vessels, a large number of simulation results show that if several adjacent blood vessels are all identified as veins, the result is likely to be wrong, because it violates the structural characteristic that arteries and veins usually accompany each other. In implementing the above classification method, we therefore apply the following rules:

  1. If the classification results of two or more adjacent points on the same circle are all arteries or all veins, all the focus points on the corresponding vessels are reclassified according to steps 2–3 in Sect. 3.2.

  2. If the classification result obtained in the above step has changed, the classification result is corrected according to formula (8).
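The first rule requires finding neighbouring focus points on a circle that received the same label. One possible reading, sketched below, sorts the points on a circle by their angle around the disc center and flags every pair of angular neighbours with identical labels. Treating "adjacent" as angular adjacency, the label encoding (+1 artery, -1 vein, 0 uncertain) and the function name are assumptions of this example.

```python
def same_label_neighbours(angles, labels):
    """Indices of focus points on one circle whose angular neighbour has the
    same artery (+1) or vein (-1) label. The circle is treated as closed, so
    the last point is also a neighbour of the first."""
    order = sorted(range(len(angles)), key=lambda i: angles[i])
    flagged = set()
    n = len(order)
    if n < 2:
        return []
    for pos in range(n):
        a, b = order[pos], order[(pos + 1) % n]
        if labels[a] == labels[b] and labels[a] != 0:
            flagged.update((a, b))
    return sorted(flagged)

# Hypothetical circle with four focus points at 0, 90, 180 and 270 degrees.
suspects = same_label_neighbours([0, 90, 180, 270], [1, 1, -1, 1])
```

The flagged points would then be passed back to steps 2–3 of Sect. 3.2 together with the other focus points on their vessels.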

5 Experimental Results Analysis

This method is applied to the public database DRIVE, a fundus image database that is widely used in research on retinal images. The images were taken from 400 different individuals aged 25–90 years; forty images were randomly selected, seven of which show early diabetic retinopathy. Each image contains 565 * 584 pixels, with 8 bits per color channel. To determine the accuracy of the automatic classification results, we invited an ophthalmologist to judge the artery/vein classification results. The simulation results of all the images in DRIVE are used for the statistical calculation. The final results of artery/vein classification are shown in Fig. 5, where “o” and “+” indicate focus points classified as veins and arteries, respectively.

Fig. 5. Examples of artery/vein classification results

The accuracy of this method is calculated as follows:

$$ {\text{Accuracy}} = \frac{{n_{c} }}{{n_{c} + n_{i} + n_{n} }}*100\% $$
(10)

where \( n_{c} \) represents the number of focus points that are correctly classified, \( n_{i} \) represents the number of focus points that are misclassified, and \( n_{n} \) represents the number of focus points that cannot be classified by the computer.
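As a small worked example of formula (10), the function below computes the accuracy from the three counts. The counts used in the call are hypothetical and are not results from this paper.

```python
def accuracy(n_c, n_i, n_n):
    """Eq. (10): percentage of correctly classified focus points.

    n_c -- correctly classified, n_i -- misclassified, n_n -- unclassified
    """
    total = n_c + n_i + n_n
    if total == 0:
        raise ValueError("at least one focus point is required")
    return 100.0 * n_c / total

# Hypothetical counts: 90 correct, 8 wrong, 2 unclassified out of 100 points.
example = accuracy(90, 8, 2)  # 90.0
```

Note that unclassified points count against the accuracy, because they appear in the denominator but not in the numerator.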

Table 1 shows some artery/vein classification methods published in recent years. Compared with other methods, our method can achieve more accurate classification.

Table 1. Comparison of the accuracy of several classification algorithms

6 Conclusion

In this paper, all 40 images in the DRIVE database are used for simulation, and 1531 focus points are extracted. The artery/vein classification results are judged by an ophthalmologist. The accuracy of the proposed method is 93%, which is at a high level compared with the methods published by other scholars in recent years. The main contents and achievements of this paper are summarized as follows:

  1. Based on the color information of blood vessels and the FSFDP algorithm, we propose the CRCFV algorithm to classify focus points as arteries or veins under uneven color and brightness.

  2. Considering the structural features of vascular connectivity and the companionship of arteries and veins, we propose an artery/vein classification method for blood vessels that integrates vascular structure information, which is a progressive algorithm for recognizing fundus arteries and veins.

In further research, we will consider using vascular color and location information simultaneously to improve the classification accuracy, and we will address the problem that the classification result depends on the result of the vascular tree extraction.