1 Introduction

Liver diseases may be roughly divided into two categories: focal diseases, where the abnormality is concentrated in a small area, and diffuse diseases, where the abnormality is distributed throughout the liver volume [1]. Several noninvasive diagnostic imaging techniques (noninvasive in the sense that they do not require surgery), such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT) or Ultrasound Tomography (UT), can be effectively used for preliminary diagnosis and for planning surgical interventions or pharmacological treatments. However, Light Microscopy (LM) is the method by which pathologists study and review histological sections, and LM observation can be considered the gold standard for diagnosis, owing to its diagnostic accuracy and, in particular, to its ability to assess the severity of a given pathology at very high resolution. The classes that can be defined by observing LM images of the liver are: normal, steatosis, fibrosis, cirrhosis and hepatocarcinoma. A pathologist normally has to examine many histological sections by LM to reach a complete and accurate diagnosis; for this reason, an automatic system for the analysis of LM images of the liver would be particularly useful. The aim of this paper is to define a complete procedure for the automatic classification of LM images presenting different pathologies affecting the liver parenchyma. This problem has been addressed by many researchers [2]. Combining methods from traditional image analysis with sophisticated machine learning and pattern recognition techniques has yielded useful texture-based information and effective quantitative characterization for a number of applications of practical interest, including medical image analysis [3,4,5]. Since the textures of interest may vary widely, different methods suit different kinds of medical images.
Basically, we may distinguish between statistical, spectral and structural texture analysis; in texture analysis, one of the most difficult aspects is to define a set of features that adequately describes the characteristics of a texture [6].

Wavelet transform and Fisher Linear Discriminant Analysis are used effectively in [7] on color medical images for liver fibrosis identification. A wavelet multiresolution analysis is applied to the three color components of the image to reduce the background noise of the liver slice, thus increasing the discriminating power of the Fisher algorithm in segmenting fibrotic liver tissue from the other tissues in pathological section images. In [8], focal lesions in ultrasound images of the liver are automatically assigned to four classes (normal, cyst, benign mass and malignant mass). The texture features are extracted by four procedures (grey level co-occurrence and run length matrices for the statistical properties, Gabor wavelets and 2D Laws masks for the local spectral content). The two sets of texture features are reduced either by manual selection or by PCA-based selection; the manually reduced set is classified by neural networks and the PCA-reduced set by k-means. In this experiment, the neural network achieves a higher correct classification rate than k-means. In [9], fractal dimension and the M-band wavelet transform are used to compose the feature vector for the classification of ultrasonic liver images; three conditions, normal, cirrhosis and hepatoma, are recognized with a high classification rate.

In [10], statistical methods of texture analysis are applied to microscopic liver images, in particular of liver fibrosis, and the sensitivity of texture analysis is tested when fibrotic and normal tissues are stained with different fibrosis biomarkers. The texture analysis is performed using the co-occurrence matrix and the run-length matrix; a classification based on agglomerative hierarchical clustering and on linear discriminant analysis with cross validation is applied to the different biomarkers, and the choice of biomarker influenced the results in some cases. In [11], a review of machine learning techniques combined with image processing methods for the automatic segmentation of liver CT and MRI images is presented, with particular attention to SVM-based techniques [12, 13] that take texture descriptors as input. In [14], region-based shape descriptors, grey level features and grey level co-occurrence matrix (GLCM) features are adopted for the automatic SVM classification of CT images of specific liver diseases such as cysts, hepatoma and cavernous hemangioma.

In this paper, LM liver images are analyzed to distinguish different tissue types: normal, steatosis, fibrosis, cirrhosis and hepatocarcinoma [15,16,17,18]. A classification method is designed to assign a sample image to one of the five classes. The color images are first converted to grey level, and a first-level classification is performed to identify steatotic liver tissue; to this aim, a suitable segmentation algorithm is applied together with an object analysis that detects the roundish, smooth-edged fat droplets. The ratio of the fat droplet area to the total image area provides a robust indicator for distinguishing the steatosis class from the others. Images of the remaining four classes are characterized by texture analysis using two groups of features: statistical properties of the grey level values (contrast, uniformity, entropy) and statistical features of the grey level spatial distribution obtained from the co-occurrence matrix (contrast, correlation, homogeneity, energy). Each sample image is partitioned into tiles of suitable size, and the average and standard deviation of the texture descriptors are computed over the set of tiles of each training image. This yields fourteen texture features with good separation between classes (strong correlation within classes and weak correlation between classes). It is worth noting that the tiling procedure strengthens the local character of the texture parameters, so as to better capture the local structure of the parenchyma in the different tissues, which is sometimes very subtle, as between fibrotic and cirrhotic tissue. The set of features is processed by PCA to obtain a more efficient representation of its information content, which is used to train four binary SVM classifiers.
High correct classification rates are obtained for each class, and the ROC curves show satisfactory behavior of the classifiers over repeated random selections of the training set. The result is a flexible, general-purpose approach to the classification of LM images of the human liver. In future work, using a richer image dataset, the proposed approach will also be applied to images in which multiple kinds of tissue are present.

The paper is organized as follows. In Sect. 2, the LM structure of the liver parenchyma for the five considered classes is described, and the feature extraction and classification procedure is proposed. In Sect. 3, the numerical results are presented and discussed. Conclusions and future developments are outlined in Sect. 4.

2 Materials and Methods

In this paper, microscopic images of different kinds of liver tissue observed under a light microscope are considered. Two independent pathologists examined various histological sections of the hepatic parenchyma and, on the basis of specific structures, assigned the samples to different groups, describing the relevant, specific shapes they used to define membership in each group. The considered images may be grouped into five classes: normal (N), steatosis (S), fibrosis (F), cirrhosis (C) and hepatocellular carcinoma, HCC (H) (see Fig. 1), although the transition from one class to another is often gradual and different states may be present simultaneously (steatotic aspects are interleaved with normal tissue; fibrotic structures can also be present in early cirrhosis; focal steatosis is present in alcoholic cirrhosis; etc.). The simultaneous presence of different states raises a number of technical issues for the automatic analysis of this kind of sample. In fact, irregular regions that are easily detected by LM may correspond to completely different pathologies that are not easy to distinguish (for example, fibrotic tissue can easily be present in mainly cirrhotic images).

Fig. 1. Microphotographs of the different classes of liver parenchyma (Haematoxylin & Eosin, original magnification 4X): (a) normal liver; (b) steatosis; (c) fibrosis; (d) cirrhosis; (e) HCC.

For each class, a binary classifier labels the test image as either belonging or not belonging to that class. The overall classification process therefore consists of applying five binary classifiers, one for each class (N, S, F, C and H), according to the block diagram of Fig. 2. This design reflects the fact that images of different classes can look very different, even though, in some cases, different pathological states may be present simultaneously. The result is a binary string containing 1 where the answer for a specific class is positive and 0 where it is negative. Each of the five binary classifiers is tailored to the specific tissue to be recognized.

Fig. 2. Block diagram of the classification process.

Fatty liver tissue is mainly characterized by the presence of roundish fat droplets spread over the tissue; steatosis can therefore be easily classified by segmenting the objects and evaluating the shape and size of the bright items over the background. A fat presence indicator can be defined, and a suitable threshold value determined, to discriminate easily between the steatosis and non-steatosis conditions. The other kinds of tissue present diffuse abnormalities that can be described by texture analysis, namely by grey level texture features and by grey level spatial distribution texture features computed from the co-occurrence matrix.

Therefore, the first assessment is whether or not a tissue is steatotic. If the answer is negative, a set of features, suitably transformed by PCA, is used to train the SVM classifiers, as described in the following.
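The two-stage decision of Fig. 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the function and parameter names are hypothetical, the PoF extractor, feature extractor and the four trained classifiers are passed in as callables (each classifier is assumed to apply its own class-specific PCA projection internally), and the 1% PoF threshold is the value reported in Sect. 3.

```python
from typing import Callable, Dict, Sequence

CLASSES = ("N", "S", "F", "C", "H")  # bit order of the output string


def classify(image,
             pof: Callable[[object], float],
             features: Callable[[object], Sequence[float]],
             classifiers: Dict[str, Callable[[Sequence[float]], int]],
             pof_threshold: float = 1.0) -> str:
    """Hypothetical two-stage decision returning a 5-bit string (N, S, F, C, H)."""
    answers = dict.fromkeys(CLASSES, 0)
    if pof(image) >= pof_threshold:
        # Stage 1: the fat droplet indicator alone flags steatosis.
        answers["S"] = 1
    else:
        # Stage 2: 14 texture features fed to the four binary classifiers;
        # each classifier is assumed to project onto its own principal components.
        f = features(image)
        for name in ("N", "F", "C", "H"):
            answers[name] = int(classifiers[name](f))
    return "".join(str(answers[c]) for c in CLASSES)
```

For example, an image whose PoF is above the threshold yields `"01000"` without evaluating any texture feature, which reflects the design choice of settling the steatosis question first.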

2.1 Steatosis Characterization

Image binarization is sufficient to separate the fat droplets (typical of steatosis), which are the brightest objects over the background; we applied the discrete level set approach proposed in [19]. On the binarized image, the bright objects are isolated and, among them, the fat droplets are identified by filtering on size and shape, keeping only the non-eccentric, non-ragged objects with a significant area. On the selected set of objects, the Percentage of Fatness (PoF) is computed as the total area of the fat droplets (number of white pixels) divided by the image size (total number of pixels).
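The object filtering and the PoF can be sketched in plain Python as below. This is a sketch under stated assumptions: the binary mask is taken as given (the level set binarization of [19] is not reproduced), objects are 4-connected components, the thresholds are those reported in Sect. 3 (area larger than 6 pixels, eccentricity lower than 0.6, area over ellipse area greater than 0.5), and the "circumscribing ellipse" is approximated by the ellipse having the same second central moments as the object, which is our assumption rather than the paper's stated construction. All function names are hypothetical.

```python
import math
from collections import deque


def components(mask):
    """Yield the pixel lists of the 4-connected white regions of a 0/1 mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                yield pixels


def ellipse_shape(pixels):
    """Eccentricity and area of the ellipse with the same second central moments."""
    n = len(pixels)
    my = sum(p[0] for p in pixels) / n
    mx = sum(p[1] for p in pixels) / n
    # The 1/12 term accounts for the unit extent of each pixel.
    uyy = sum((p[0] - my) ** 2 for p in pixels) / n + 1 / 12
    uxx = sum((p[1] - mx) ** 2 for p in pixels) / n + 1 / 12
    uxy = sum((p[0] - my) * (p[1] - mx) for p in pixels) / n
    half_trace = (uxx + uyy) / 2
    root = math.sqrt(((uxx - uyy) / 2) ** 2 + uxy ** 2)
    lam1, lam2 = half_trace + root, half_trace - root  # covariance eigenvalues
    ecc = math.sqrt(1 - lam2 / lam1)
    area = 4 * math.pi * math.sqrt(lam1 * lam2)  # pi * a * b, semi-axes 2*sqrt(lam)
    return ecc, area


def percentage_of_fatness(mask, min_area=6, max_ecc=0.6, min_fill=0.5):
    """PoF (%) = white pixels of the retained droplets / total pixels."""
    fat = 0
    for pixels in components(mask):
        if len(pixels) <= min_area:  # keep only objects larger than 6 pixels
            continue
        ecc, ell_area = ellipse_shape(pixels)
        if ecc < max_ecc and len(pixels) / ell_area > min_fill:
            fat += len(pixels)
    return 100.0 * fat / (len(mask) * len(mask[0]))
```

On a small mask containing a 3x3 blob, a 2x2 blob and a 1x10 line, only the 3x3 blob survives: the 2x2 blob is too small and the line is far too eccentric.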

An image is classified as steatotic if its PoF is above a chosen threshold; analysing the data of all the classes, we noted that the PoF of an S image is generally an order of magnitude larger than that of the other images.

2.2 Feature Extraction for Non-steatosis Aspect

The liver images not belonging to class S have an appearance that is difficult to characterize as objects over a background. Even though some structures are detectable (as described above), the main difference between classes N, C, F and H lies in their texture. The considered texture features belong to two groups: the first is related to the grey level values, the second to the grey level spatial distribution as characterized by the co-occurrence matrix. The features are computed on a training set of \( N_{tr} \) images using standard functions of the Matlab Image Processing Toolbox; each class contributes the same number of sample images.

For the grey level texture features, each image of size \( m \times n \) is partitioned into tiles \( T \) of size \( \ell \times \ell \) (the part of each rectangular image that does not fill a whole tile is left out). The grey level texture features considered are the Contrast, the Uniformity and the Entropy. The contrast \( C_{T} \) measures the variability of the grey level within the tile: the higher the contrast, the better the details stand out from the background. The uniformity \( U_{T} \) and the entropy \( E_{T} \) describe the degree of regularity of the grey level values in a tile: if all the pixels have the same grey level, then \( U_{T} = 1 \) and \( E_{T} = 0 \), meaning that the tile is maximally uniform (it has a constant grey value) and maximally ordered (all the pixels are equal). Conversely, unstructured noise spreads the grey values evenly over the \( L \) grey level bins of the histogram, so the tile is maximally disordered (\( U_{T} = 1/L \) and \( E_{T} = \log_{2} L \)). The image features are then computed as the average and standard deviation over the set of tiles, giving six features.
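The six grey level features can be sketched as follows. The paper computes them with Matlab toolbox functions and does not give the exact contrast formula, so here the tile contrast \( C_{T} \) is assumed to be the standard deviation of the grey values; uniformity and entropy use the \( L \)-bin histogram, \( U_{T} = \sum_k p_k^2 \) and \( E_{T} = -\sum_k p_k \log_2 p_k \), which reproduce the limiting values quoted above. Equal-width bins over 8-bit grey levels are also an assumption, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def tile_features(tile, levels=4, max_grey=256):
    """Contrast (std of grey values, assumed), uniformity and entropy of one tile."""
    values = [v for row in tile for v in row]
    counts = [0] * levels
    for v in values:
        counts[v * levels // max_grey] += 1  # equal-width histogram bins
    p = [c / len(values) for c in counts]
    contrast = pstdev(values)
    uniformity = sum(q * q for q in p)
    entropy = -sum(q * math.log2(q) for q in p if q > 0)
    return contrast, uniformity, entropy


def grey_level_features(image, ell, levels=4):
    """Mean and std of (C_T, U_T, E_T) over non-overlapping ell x ell tiles."""
    m, n = len(image), len(image[0])
    per_tile = []
    for r in range(0, m - ell + 1, ell):  # leftover rows/columns are ignored
        for c in range(0, n - ell + 1, ell):
            tile = [row[c:c + ell] for row in image[r:r + ell]]
            per_tile.append(tile_features(tile, levels))
    feats = []
    for k in range(3):
        column = [t[k] for t in per_tile]
        feats += [mean(column), pstdev(column)]
    return feats  # [C mean, C std, U mean, U std, E mean, E std]
```

A constant tile gives \( U_{T} = 1 \), \( E_{T} = 0 \), while a tile whose four pixels fall in four different bins gives \( U_{T} = 1/4 \), \( E_{T} = 2 \), matching the extremes discussed in the text for \( L = 4 \).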

The grey level spatial distribution texture features may be characterized by defining relations between the grey level values of neighbouring pixels and computing the frequency of occurrence of each such relation over the whole image domain. These frequencies, for all the pairs of grey level values \( \left\{ g_{i},\, g_{j} \right\} \), define the co-occurrence matrix \( GCO \). Quantities such as Contrast, Correlation, Homogeneity and Energy are computed as averages over the whole image weighted by the entries of \( GCO \), so that the local spatial distribution constraint at different scales \( s = 1,2, \ldots ,\bar{s} \) and directions \( d = 1,2, \ldots ,\bar{d} \) is taken into account. These quantities have meanings similar to those defined previously, but their values depend on the given scale and direction. For each image of the training set, eight features are obtained as the average and standard deviation, over all the scale-direction pairs, of the contrast, the correlation, the homogeneity and the energy.
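A plain-Python sketch of the co-occurrence features follows. It assumes a non-symmetric, normalized co-occurrence matrix per scale-direction pair, direction offsets (row, column) of (0, s), (-s, s), (-s, 0), (-s, -s) for 0°, 45°, 90°, 135°, and the usual definitions: contrast \( \sum (i-j)^2 p_{ij} \), correlation \( \sum (i-\mu_i)(j-\mu_j) p_{ij} / (\sigma_i \sigma_j) \), homogeneity \( \sum p_{ij}/(1+|i-j|) \) and energy \( \sum p_{ij}^2 \). These conventions are our assumption of what the toolbox functions compute, not a statement of the authors' settings; names are hypothetical.

```python
import math
from statistics import mean, pstdev


def quantize(image, levels=4, max_grey=256):
    """Map 8-bit grey values to `levels` equal-width grey level values."""
    return [[v * levels // max_grey for v in row] for row in image]


def offsets(s):
    """(d_row, d_col) for 0, 45, 90 and 135 degrees at distance s (assumed convention)."""
    return [(0, s), (-s, s), (-s, 0), (-s, -s)]


def glcm(q, dr, dc, levels):
    """Normalized co-occurrence matrix of a quantized image for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(q), len(q[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[q[r][c]][q[r2][c2]] += 1
    total = sum(map(sum, counts))  # assumes the image is larger than the offset
    return [[x / total for x in row] for row in counts]


def glcm_props(p):
    """Contrast, correlation, homogeneity and energy of a normalized GLCM."""
    idx = range(len(p))
    pairs = [(i, j) for i in idx for j in idx]
    mu_i = sum(i * p[i][j] for i, j in pairs)
    mu_j = sum(j * p[i][j] for i, j in pairs)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p[i][j] for i, j in pairs))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p[i][j] for i, j in pairs))
    contrast = sum((i - j) ** 2 * p[i][j] for i, j in pairs)
    cov = sum((i - mu_i) * (j - mu_j) * p[i][j] for i, j in pairs)
    # Correlation is undefined (NaN) when one marginal is constant.
    correlation = cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else float("nan")
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in pairs)
    energy = sum(p[i][j] ** 2 for i, j in pairs)
    return contrast, correlation, homogeneity, energy


def cooccurrence_features(q, scales=(1, 2, 3, 4), levels=4):
    """Mean and std of the four properties over all scale-direction pairs (8 values)."""
    props = [glcm_props(glcm(q, dr, dc, levels)) for s in scales for dr, dc in offsets(s)]
    feats = []
    for k in range(4):
        column = [pr[k] for pr in props]
        feats += [mean(column), pstdev(column)]
    return feats
```

With the four scales and four directions of Sect. 3, `props` holds 16 entries, one per scale-direction pair.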

2.3 Feature Analysis and Principal Component Analysis

Fourteen features are computed for the \( N_{tr} \) images of the training set and collected in the matrix \( \Phi \) of dimension \( N_{tr} \times 14 \). A correlation analysis of these features among the \( N_{tr} \) images showed that the features are highly correlated within each class but substantially uncorrelated between classes. This suggests that the selected set of features is suitable for classifying liver images into the four chosen classes. Nevertheless, numerical experiments showed that the images are not linearly separable in the space of these features, so a classification system based directly on the correlation analysis would perform poorly. A more efficient representation of the data is therefore advisable; it can be obtained by principal component analysis, choosing, for each class training set, \( p^{*} \) principal components that retain \( P\% \) of the information content.
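The choice of \( p^{*} \) can be illustrated as the smallest \( p \) for which the sorted eigenvalues \( \lambda_1 \ge \lambda_2 \ge \dots \) of the feature covariance matrix satisfy \( \sum_{k \le p} \lambda_k \ge (P/100) \sum_k \lambda_k \), reading "information content" as explained variance (our interpretation). The sketch assumes the eigenvalues are already available from any eigen-decomposition routine; the function name is hypothetical.

```python
def n_components(eigenvalues, p_percent=99.0):
    """Smallest p whose leading eigenvalues retain p_percent of the total variance."""
    lam = sorted(eigenvalues, reverse=True)
    total, cum = sum(lam), 0.0
    for p, value in enumerate(lam, start=1):
        cum += value
        # Small tolerance guards against floating-point round-off at the boundary.
        if cum >= p_percent / 100.0 * total - 1e-12:
            return p
    return len(lam)
```

For eigenvalues 5, 3, 1, 0.5, 0.5 the cumulative fractions are 50%, 80%, 90%, 95% and 100%, so \( P = 90 \) gives \( p^{*} = 3 \) and \( P = 99 \) gives \( p^{*} = 5 \).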

2.4 Binary Classifier Training

At this point, for each of the four classes N, F, C and H, a set of \( p^{*} \) principal components has been selected. For each class, a binary classifier is trained to decide whether a test image belongs to that class or not; the procedure is outlined for a single class, since it is the same for all classes. Consider the class N and let \( \tilde{E}_{N} \) be its \( p^{*} \) selected principal components. Compute the coordinates \( \tilde{L}_{N,\,N} \) of the feature vectors \( M_{FN} \) of the \( N_{tr,\,N} \) training images of class N: for every row vector of \( \tilde{L}_{N,\,N} \), the response variable of the class N classifier is set to 1. Then repeat the process for the training images of the remaining classes, obtaining \( \tilde{L}_{N,\,C} \), \( \tilde{L}_{N,\,F} \) and \( \tilde{L}_{N,\,H} \), and set the classifier response variable to 0 for these sets of coordinates. A perfect classifier would separate the points with response 1 from the points with response 0. This task can be accomplished by training an SVM [12], a well-established method that determines the best hyperplane (in general, a separating surface) dividing a set of response points into two classes. The parameters of the SVM are determined by ten-fold cross validation [20], and the classification is performed with LIBSVM 3.18 [21].
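The construction of the one-versus-rest training set for a single class can be sketched as below. Assumptions: the feature vectors are centred on a precomputed mean before projection (the paper does not say), the selected components \( \tilde{E}_{N} \) are given as a list of vectors, and the result is written in the LIBSVM plain-text format (`label index:value ...`, with 1-based indices) so that it could be passed to LIBSVM's command-line tools. Function names are hypothetical.

```python
def project(features, mean_vec, components):
    """Coordinates of a feature vector on the selected principal components."""
    centred = [f - m for f, m in zip(features, mean_vec)]  # centring is an assumption
    return [sum(c * e for c, e in zip(centred, comp)) for comp in components]


def one_vs_rest_set(samples, target, mean_vec, components):
    """(X, y) for the binary classifier of class `target`.

    samples: (class_label, feature_vector) pairs of the training set.
    y is 1 for the target class and 0 for every other class.
    """
    X = [project(f, mean_vec, components) for _, f in samples]
    y = [1 if label == target else 0 for label, _ in samples]
    return X, y


def to_libsvm_lines(X, y):
    """Serialize (X, y) in the LIBSVM text format with 1-based feature indices."""
    return [" ".join([str(label)] + [f"{k}:{v:g}" for k, v in enumerate(x, start=1)])
            for x, label in zip(X, y)]
```

Because each class uses its own components \( \tilde{E} \), `one_vs_rest_set` would be called four times, once per class, with that class's mean and components.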

Four classifiers \( C_{i} \), \( i = 1, 2, 3, 4 \), are trained to determine whether the image belongs to class N, F, C or H, respectively. For example, the classifier \( C_{1} \) is trained to identify normal images, i.e. to distinguish normal tissue from F, C or H tissue, assigning label 1 if the image is classified as belonging to class N and label 0 otherwise. When a fibrotic image X is tested with the classifier \( C_{1} \), trained to distinguish normal images N from all the others, the classifier should identify X as “not-normal” and assign label “0”, whereas the “right classifier” \( C_{2} \) (trained to identify images of the same type as X) should assign label “1”.

3 Numerical Results and Discussion

The classification procedure proposed in this paper was applied to a set of 120 images of size \( m \times n \) pixels, with \( m = 543 \) and \( n = 780 \), 24 for each of the five classes N, S, F, C and H.

The 24 steatosis images are used to tune and validate the steatosis classifier. The remaining 96 images are divided into two sets, with the images of each class split in the proportion 60% / 40%: the first set, \( N_{tr} \), is used to train the classifiers, whereas the second, \( N_{test} \), is used to test the classifiers on data not used for training.
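A per-class random split of the 96 non-steatotic images can be sketched as follows. Since 60% of 24 is not an integer, the sketch uses the \( N_{test} = 10 \) test images per class reported later in this section, i.e. 14 training images per class; the seeded shuffle and the function name are illustrative assumptions.

```python
import random


def stratified_split(labels, n_test_per_class, seed=None):
    """Random per-class split into sorted training and test index lists."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        idx = idx[:]          # do not shuffle the caller's lists
        rng.shuffle(idx)
        test += idx[:n_test_per_class]
        train += idx[n_test_per_class:]
    return sorted(train), sorted(test)
```

Repeating the call with different seeds gives the repeated random choices of training and test sets used to average the results.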

The described classification procedure starts by deciding whether or not the image could be in the S class. Steatotic images are characterized by the presence of circular white elements which, as already said, can be easily detected by a binarization procedure that isolates the white objects. To identify only the fat droplets, a morphological filtering is performed, keeping only objects with an area larger than 6 pixels and an eccentricity lower than 0.6. Moreover, to discard overly ragged objects, only white objects whose area is more than 0.5 times the area of the ellipse circumscribing them are retained.

The percentage of fat droplets computed on all the images shows a clear difference between class S and the other classes, see Table 1:

Table 1. Mean values and standard deviations of the percentage of fat in liver tissues images

If a sample image has a percentage of fat lower than 1%, the tissue can be assumed not to belong to the S class. The fat droplet identification method provided convincing results, approved by the pathologists, who independently estimated the fat percentage and compared their estimates with those obtained by the automatic method.

To compute the grey level texture features, each image is partitioned into tiles of size \( \ell = n/10 \); this tiling resolution gives good results in the subsequent classifier training phase. To compute the Uniformity and the Entropy, the number of bins must also be fixed; \( L = 4 \) proved a good choice, simplifying the data while preserving the structures of interest. From this analysis, six features are computed as the average and standard deviation over the set of tiles.
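For the \( 543 \times 780 \) images, and assuming non-overlapping tiles anchored at the top-left corner (the anchoring is our assumption), the tile size and the number of tiles per image work out as:

```latex
\ell = \frac{n}{10} = \frac{780}{10} = 78, \qquad
\left\lfloor \frac{m}{\ell} \right\rfloor \times \left\lfloor \frac{n}{\ell} \right\rfloor
  = \left\lfloor \frac{543}{78} \right\rfloor \times \left\lfloor \frac{780}{78} \right\rfloor
  = 6 \times 10 = 60 \ \text{tiles}
```

so the columns are fully covered, while \( 543 - 6 \cdot 78 = 75 \) rows are left out, which is the part of each rectangular image mentioned in Sect. 2.2.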

As far as the grey level spatial distribution texture features are concerned, four scales \( s = 1,\,2,\,3,\,4 \) and four directions \( d = 0^{\circ},\,45^{\circ},\,90^{\circ},\,135^{\circ} \) are considered for \( L = 4 \) grey level values, giving a co-occurrence matrix GCO of size \( 4 \times 4 \times 16 \) (one \( 4 \times 4 \) matrix for each of the \( 4 \times 4 = 16 \) scale-direction pairs). From GCO, the Contrast, Correlation, Homogeneity and Energy are evaluated, yielding eight features as their average and standard deviation.

To decide whether an image belongs to class N, F, C or H, the feature-based analysis is performed and four different classifiers \( C_{i} \), \( i = 1, 2, 3, 4 \), are designed.

Once the fourteen features have been calculated for all the images of the \( N_{tr} \) set, principal component analysis is applied; to preserve \( P = 99\% \) of the information, after evaluating the eigenvalues of the covariance matrix, the first \( p^{*} = 8 \) principal components are retained. The chosen SVM kernel function is the radial basis function. Each classifier \( C_{i} \) is trained to assign the label “1” to the i-th class and the label “0” to all the others; more precisely, \( C_{1} \) assigns 1 to normal tissues, \( C_{2} \) to fibrotic tissues, \( C_{3} \) to cirrhotic tissues and \( C_{4} \) to HCC tissues, each assigning 0 to all the other classes. The accuracy in the training phase is measured as the percentage of correctly assigned labels “1” and “0”.

The obtained classifiers are tested on a test set of \( N_{test} = 10 \) images for each class N, F, C and H. Here the true class of the image X to be classified is assumed unknown, and X is initially given the default label “1”. A classifier should therefore confirm the label “1” if it is the one trained to identify the class of X, and assign the label “0” otherwise (meaning that the image belongs to one of the other classes). The mean results over the 10 test images of each class are reported in Table 2. These results are averaged over repeated random choices of the training and test sets, to avoid lucky choices of the test images.

Table 2. Results of the test: mean value of the percentage of success of the classifiers.

From Table 2 it can be noted that the percentages are high on the diagonal of the table (the default label “1” correctly confirmed as “1”), whereas they are low, as they should be, when a test image is submitted to a classifier trained to give label “1” to a different kind of image (there, the correct answer is to change the default label “1” into “0”).
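The structure of Table 2 can be reproduced as the percentage of label “1” answers for each pair (true class, classifier), averaged over random trials. The sketch below assumes that reading of the table entries (diagonal high, off-diagonal low); the input layout and the function names are hypothetical.

```python
def positive_rate_table(results, classes=("N", "F", "C", "H")):
    """Percentage of label-1 answers for each (true class, classifier) pair.

    results maps a true class to a list of per-image answers, each answer a dict
    {classifier_class: 0 or 1}, e.g. {"N": [{"N": 1, "F": 0, "C": 0, "H": 0}, ...]}.
    """
    table = {}
    for true_cls in classes:
        answers = results[true_cls]
        table[true_cls] = {clf: 100.0 * sum(a[clf] for a in answers) / len(answers)
                           for clf in classes}
    return table


def average_tables(tables):
    """Entry-wise mean of tables from repeated random training/test splits."""
    keys = list(tables[0])
    return {r: {c: sum(t[r][c] for t in tables) / len(tables) for c in keys}
            for r in keys}
```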

The results are encouraging: only 6.25% of the test images of normal parenchyma could be confused with fibrotic tissue, 16.25% could be confused with cirrhotic tissue and only 1.25% could be wrongly classified as HCC.

Test images of class F, when tested against all the classifiers, appear clearly identifiable: they are never confused with normal tissue, 10% could be wrongly classified as HCC and only 6.67% could be confused with cirrhotic parenchyma. A cirrhotic test image could be confused with normal tissue or with HCC (7.5%), and only 1.25% could be wrongly classified as fibrotic tissue.

The least robust results are those related to the HCC images; for example, 25% of them could be identified as belonging to the cirrhotic class C. In analysing the reasons for this result, one has to take into account that cirrhosis can lead to HCC, and that HCC can appear as multiple nodules resembling cirrhotic nodules; for this reason, the automatic system can confuse the two histological aspects.

The classifiers have been further tested by randomly choosing the training and test sets, obtaining 10 trials for each classifier. In each trial, the true positive rate (TPR, the fraction of positive images correctly classified as positive) and the false positive rate (FPR, the fraction of negative images wrongly classified as positive) were computed. All the trials lie above the diagonal of the ROC plane (random classifier), and most of them have a score between 0.8 and 1, confirming the average scores reported in Table 2 and denoting good performance.
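For one classifier and one trial, TPR and FPR follow directly from the counts of true/false positives and negatives; a trial lies above the chance diagonal of the ROC plane when TPR exceeds FPR. A minimal sketch (the function name is hypothetical):

```python
def tpr_fpr(y_true, y_pred):
    """True and false positive rates of one binary classifier on one trial."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)
```

For instance, with the 40 test images of one trial (10 positives for the classifier's own class and 30 negatives), 9 true positives and 3 false positives give TPR = 0.9 and FPR = 0.1, a point well above the diagonal.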

It is worth noting that, owing to the overall procedure adopted, the proposed algorithm can classify unknown images into each of the 5 classes while allowing for the simultaneous presence of more than one aspect. However, in the images used here each tissue can be assigned to a single specific class, which made it possible to tune the parameters of the different classifiers unambiguously. Nevertheless, different classifiers may yield a positive identification on a given sample image, i.e. the method may classify the image as belonging to several classes. At this preliminary stage of our investigation, such an outcome would only suggest the presence of different pathologies in the liver parenchyma. In this case, the image should be assigned to the most severe of the recognized pathologies, while the other positive answers give additional information about the possible simultaneous presence of different hepatopathies in the same sample.
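The rule of assigning the most severe recognized pathology can be sketched from the 5-bit output string. The paper does not give an explicit severity ranking, so the order N < S < F < C < H used here is a hypothetical assumption for illustration only, as is the function name.

```python
# Hypothetical severity ranking (an assumption, not stated in the paper).
SEVERITY = ("N", "S", "F", "C", "H")


def resolve(bits, order=("N", "S", "F", "C", "H"), severity=SEVERITY):
    """Return (most severe positive class, other positive classes) from a bit string."""
    positives = [cls for cls, b in zip(order, bits) if b == "1"]
    if not positives:
        return None, []
    main = max(positives, key=severity.index)
    return main, [c for c in positives if c != main]
```

For example, the string `"10110"` (positive for N, F and C) would be assigned to C, with N and F reported as additional observed aspects.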

4 Conclusions and Future Developments

In this paper, the classification of different hepatopathies is addressed by proposing an automatic multi-stage procedure. We combine a texture-based segmentation method with a supervised support vector machine pattern recognition procedure for the automatic classification of microscopic images of the liver, in order to detect abnormal regions belonging to a given family of pathologies and thus support medical diagnosis. The liver specimen is classified into one of five classes, normal, steatosis, fibrosis, cirrhosis and HCC, using both object analysis and a machine learning approach. More precisely, object analysis is used first to determine whether the tissue is steatotic, with the bright circular fat structures as an indicator; the machine learning approach is then applied to determine whether the tissue belongs to one of the other four classes. Suitable features describing the texture properties of the images are evaluated, and principal component analysis is applied to derive the best representation of the data to be submitted to the support vector machines. Four distinct binary classifiers are trained, providing promising results with a good capability to separate the considered data. In this early investigation, the selected texture features allowed the training of binary classifiers with encouraging performance, which could be further improved by a better description of the spatial distribution of the grey level in the LM liver images; to this aim, a richer set of scales and directions for computing the co-occurrence matrix could be considered, along with some differential characteristics of the image signal. Moreover, the overall classification process will be applied to images containing more than a single pathology, in order to establish the nature of the pathology and/or its severity. Efforts will also be made to estimate the percentage of the image occupied by each class.
This generalization will be investigated by enriching the data set with LM images that also contain mixtures of the five aspects of the human liver parenchyma discussed here.