1 Introduction

Liver cancer is a leading cause of cancer deaths worldwide, accounting for more than 600,000 deaths each year. CT is a common non-invasive screening method for detecting liver lesions. However, the large numbers of images in routine liver CT studies, in addition to their high diversity in appearance, have been hurdles for detecting all lesions by visual inspection. Therefore, computer-aided systems play an important role to assist radiologists in automatically detecting liver lesions.

Many works have been undertaken on automated liver lesion detection. Generally, the approaches can be summarized into the following categories: intensity based methods [6], clustering based methods [4], active contour models [5] and machine learning methods [7, 10]. There are some disadvantages associated with existing methods. The intensity based methods are sensitive to noise while the clustering based methods need a pre-defined number of clusters as an input. Active contour methods usually depend on user interaction to supply an initial contour or set of parameters for weighting the energy terms. Supervised learning methods are the widely used methods in lesion detection. They first define specific features to describe lesions, and then use well-trained classifiers to classify spatial regions into abnormal lesions or normal tissues. However, they require a large, diverse training set to ensure good detection performance, and such data sets are difficult to create in the medical domain.

Clinicians are able to distinguish liver lesions because of different characteristics of liver lesions compared with the surrounding normal liver. To model this visual process, we argue that visual saliency computations can be used. The saliency maps for object detection [1,2,3, 8] have become popular in non-medical imaging work. Given an input image, the salient objects have high saliency values while the background has lower values. The saliency method is advantageous since it eliminates the need for any training step of machine learning methods and the need for pre-defined parameters or user interaction that is usually required for the clustering and active contour models. Thus, our approach provides several improvements over prior methods.

In this paper, we propose a novel saliency model for automated detection of liver lesions. Our proposed two-stage saliency method makes use of the characteristic of liver lesions. First saliency map is based on the global gray level contrast to remove blood vessels within the liver and a second local feature contrast saliency map is obtained based on the LLC feature [9] uniqueness and distribution. In addition, the proposed saliency method is based on multi-scale processing considering the different sizes of lesions. Specially, instead of using traditional hand-craft features to describe patch features, we proposed a modified sparse autoencoder (SAE) with image patch manifold information to learn discriminative high level features from raw features. As far as I know, we are the first to use the LLC feature to calculate the saliency.

2 Method

Saliency is a concept describing the tendency of a region of an image to attract an observer’s attention. Many saliency extraction methods exist for natural RGB images [1,2,3, 8]. However, few works have studied the use of saliency to detect abnormalities in medical images. In medical images, domain knowledge about disease plays an important role in helping the radiologist to determine the importance of a region [12]. Usually, liver lesions show different features compared with the normal adjacent liver, such as showing darker gray values. By exploiting these characteristics in a saliency region detection method, we can identify liver lesions.

Our method includes following four steps. First, the image is divided into patches at multiple scales, from small to large patch sizes. Second, two types of saliency measures are calculated for each patch size: (1) patch saliency based on the global gray level contrast, and (2) patch saliency based on the local differences between LLC features (feature contrast). For the first saliency map, we focus on recognizing the surrounding blood vessels and differentiate them from liver lesions. Since the blood vessels in the liver are usually uniformly bright compared with lesion region, the homogeneity (Haralick features) and the gray values within a chosen patch are evaluated and compared with the equivalent values that have been estimated for the whole liver. High gray value and high homogeneity (both) lead to small saliency value. For the second saliency map, we represent patches by extracting LLC features and derive the second type of saliency map for lesions to evaluate the Euclidean distances between patches. These distances are calculated according to their features and spatial locations. The fourth part of the algorithm fuses these two saliency maps together to obtain the final saliency maps.

2.1 Multi-scale Patch Presentation

To facilitate the subsequent multi-scale saliency extraction, we represent images in patches. Choosing a specific patch size to calculate the saliency map is empirical and case-specific. If the patch is too small, the statistics evaluation may not be reliable. On contrary, if the patch is too large, this may lead to region blurring and low contrast. Therefore, we use multi-scale patches to detect the lesion saliency. We consider patch sizes of \(5\times 5, 7\times 7, 9\times 9\) pixels. Then, all saliency maps in all scales are combined.

2.2 Patch Saliency Based on Gray Level Contrast

Detection of blood vessels is required because the blood vessels are dispersed throughout the liver region, which will affect the detection of lesions. Since the blood vessels in the liver are usually uniformly bright compared with lesion region, we propose to evaluate the homogeneity and the gray values within a chosen patch to calculate the corresponding saliency and remove the vessels in the CT images. We first compute the mean gray values of patch i as G(i). Then we calculate a gray level co-occurrence matrix (GLCM) for a given size patch and obtain the homogeneity H(i).

Thus, we design the first level saliency map by

$$\begin{aligned} S1_l (x,y)=1-\frac{G(i)-min(G)}{max(G)-min(G)}\times \frac{H(i)-min(H)}{max(H)-min(H)} \end{aligned}$$
(1)

where G(i) is the mean gray value of the ith patch and min(G) (or max(G) ) is the minimum (or maximum) value of G(i), respectively. \(S1_l\) is the saliency map for the scale l. According to Eq. (1), if the gray value and the homogeneity of an examined patch are higher, the patch will more likely be a vessel and the corresponding saliency value of this patch should be smaller. After obtaining the patch saliency based on gray level contrast for each scale, we calculate the first stage saliency map by fusing these maps with multiplication.

2.3 Patch Saliency Based on Feature Contrast

Patch Feature Extraction with Modified SAE. The second saliency map is calculated based on the LLC feature contrast. Different from using hand-crafted feature based methods, which may have limited representational power, we propose to a modified SAE to learn discriminative high level patch features from raw features.

During the encoding step, the raw features within a patch \(x_i \in \mathbb {R}^M (i=1,...,N)\) is processed by applying a linear mapping and a nonlinear activation function as \(h=f(W_1x_i+b_1)\), where \(W_1 \in \mathbb {R}^{D\times M}\) is a weight matrix, \(b_1 \in \mathbb {R}^{D}\) denotes the encoding bias. M and N are the dimension and numbers of raw features, respectively. f(z) denotes the logistic sigmoid function \((1+exp(-z))^{-1}\). In the decoding step, we decode a hidden representation h using another linear decoding matrix \(W_2 \in \mathbb {R}^{D\times M}\) as \(\hat{x}_i=f(W_2^T h+b_2)\). \(b_2 \in \mathbb {R}^{M}\) is the decoding bias and \(\hat{x}_i\) represents the reconstructed feature of \(x_i\).

Then the object of the modified SAE [11] is to learn parameters \(W_1, W_2, b_1,b_2\) with a sparsity constraint by minimizing the following cost function:

$$\begin{aligned} J_{SAEIM}&=\frac{1}{2} \sum _{i=1}^{N} \Vert x_i-\hat{x}_i\Vert ^2+\frac{\alpha }{2}(\Vert W_1\Vert ^2+\Vert W_2\Vert ^2) \nonumber \\&\quad +\beta \sum _{j=1}^K KL(\rho \Vert \hat{\rho })+\frac{\gamma }{2} \sum _{i,j=1}^{N}\Vert h_i-h_j\Vert ^2F_{ij} \end{aligned}$$
(2)

where the parameter \(\alpha \) controls the penalty term facilitating weight decay and \(\beta \) represents the sparsity penalty control parameter. The first term in Eq. (2) is an average sum-of-squares error term which describes the differences between input \(x_i\) and its reconstruction \(\hat{x}_i\). The second term is a weight decay term that tends to decrease the magnitude of the weight. The third term represents sparsity constraint term, which provides the sparsity connection constraint between layers in SAE. In this item, \(KL(\rho \Vert \hat{\rho _j})\) represents the Kullback-Leibler divergence, where the parameter \(\hat{\rho _j}\) is the average activation of jth input vector \(h_j\) over the N training data and \(\rho \) denotes the sparsity parameter to impose the sparsity constraint to the SAE model. The fourth item represents the neighbourhood constraint item, which helps to discover more accurate features. The matrix of patch geometric information \(F_{ij}\) is defined as follows: If the patch i is with the 8 neigbourhood of the patch j, then the \(F_{ij}\) is 1. Otherwise the corresponding value in \(F_{ij}\) equals \(-1\). The patch manifold constraint is introduced in the traditional SAE model to enforce that patches within the same neighbourhood to be close in the learned feature space and patches in different neighbourhood to be kept far away. In this way, our proposed modified SAE model is able to obtain more discriminative features compared with traditional hand-crafted features.

In the proposed modified SAE model, there are four free parameters: the weight decay cost parameter (\(\alpha \)), the sparsity penalty control parameter (\(\beta \)), the sparsity parameter (\(\rho \)) and the patch manifold constraint parameter (\(\gamma \)). We empirically set \(\alpha =0.002, \beta =5, \rho =0.5, \gamma =0.001\) based on the practical tricks.

Patch Feature Representation with LLC. After obtaining the patch features with modified SAE, we generate a dictionary of visual words B through K-means method clustering over the all patch features in the dataset. The number of clusters (K) determines the size of the visual vocabulary. Most of existing saliency methods use traditional low-level features to calculate saliency. Considering that these features ignore local information, we propose to utilize LLC to represent patch information and then calculate saliency.

Then we represent each patch in the image as a LLC histogram.

$$\begin{aligned} \min _{{Z}}~\sum _{i=1}^{N} \Vert {x}_i - {B}z_i \Vert _2^2 + \lambda \Vert {d}_i \odot {z}_i\Vert _2^2,~~{d}_i = exp (\frac{dist({x}_i,{B})}{\sigma }) \nonumber \\ \textit{subject to}~~ {1}^{T}{z}_i = 1,~\forall i, \end{aligned}$$
(3)

where \(\lambda \) is a weight parameter for adjusting the weights of locality, \(\odot \) denotes element-wise multiplication, and \({d}_i \) is locality adapter that gives different freedom for each basis vector proportional to its similarity to the input descriptor \(x_i\). \(dist({x}_i,B)\) is the Euclidean distance between \({x}_i\) and each visual word of the codebook B, and \(\sigma \) is used for adjusting the weight decay speed for locality adapter. With the coding coefficients \(z_i\) for a patch i, we applied the max pooling strategy to obtain the final patch features. Based on the above feature representation, each patch will be represented as a LLC histogram. The feature vector of the ith patch at the l scale is given as \(F_l(i)=[f_1,f_2,...,f_k]\).

Patch Saliency Based on Feature Contrast. Since a region is salient if it is distinct from its surrounding regions, the second saliency map is the average weighted dissimilarity between a center patch and its surrounding patches. The saliency at the i th patch in the level l is defined as:

$$\begin{aligned} S2_l(x,y) = \sum _{j=1,j\ne i}^{N} \frac{d_f(i,j)}{1+d_{location} (i,j)} , \end{aligned}$$
(4)

where \(d_f(i,j)\) is the Euclidean distance between the patch feature vectors \(F_l(i)\) and \(F_l(j)\) of the i-th and j-th patches. \(d_{location} (i,j)\) is the spatial Euclidean distance between the center of patch i and the center of patch j. N is the number of patches in the liver. According to Eq. (4), a patch whose feature vector is different from the other patch features is assigned a higher value. Moreover, those patches that are located further away from the evaluated patch i will have smaller effect on the patch saliency value. The saliency value is set to the same value within the same patch. The final saliency map S2 is also calculated by fusing different scales saliency maps together.

2.4 Final Saliency Construction

The final saliency map is defined by fusing two stage saliency maps together as follows: \(S=w\times S1+(1-w)\times S2\), where w is a weighting factor. Since it is not obvious in advance whether saliency based on feature contrast or based on the gray level contrast is best for identifying lesions, and in order to evaluate the effectiveness of these two saliency maps, we experimentally checked several values of w in the range of [0–1].

Fig. 1.
figure 1

(a) Comparison result of saliency performance with the different weights on the saliency maps. (b) Total precision with different parameters.

3 Results

In this paper, we conducted experiments on a dataset that consists of 179 CT images, which are labeled by three experienced radiologists. The liver boundaries were also traced in each CT image so that we could create an image in which the liver is isolated from the rest of the body structures.

Parameters Analysis for Saliency Detection. There are four free parameters in our experiments: the weight for saliency (w), the patch size (P), cluster numbers (K) and PCA dimension (D). Evaluation of the strategy for fusing the two saliency maps is an important step in the assessment of the models accuracy. We tested w for the values 0, 0.2, 0.4, 0.5, 0.6, 0.8, 1 to obtain the final saliency maps. We also chose experimentally the following parameters: \(P=7, K=120, D=14\) to conduct the experiment. This set of parameters supply the most accurate lesion detection. To evaluate the saliency accuracy that was obtained by using different w, precision-recall (PR) curves, which capture the tradeoff between accuracy and noise as the threshold varies, were computed. The corresponding results are shown in Fig. 1(a). When w is set to 0 or 1, the result is lower than compared with other values. This result shows that both the first and the second stage saliency maps include some useful information for lesion detection, and they complement each other to provide the most accurate information regarding the lesion. The best saliency result was obtained by using \(w=0.5\) (red line) with higher precision-recall value. The precision of the presented saliency model by using different sets of these parameters was tested. P values between 5 and 9 with increments of 2 were evaluated. D between 4 and 16 and number of K between 80 and 160 were analyzed, with increments of 2 and 20, respectively. Fig. 1(b) illustrates the total precision obtained for each parameter combination. The precision remained high throughout the changes in parameters (range of [0.9–0.92]). Small differences of the precision have been demonstrated (Fig. 1(b)), thus the proposed saliency model is robust. In addition, patch size \(7\times 7\) gives higher accuracy compared with other patches. The best performance was obtained by using patch size of \(7\times 7\) pixels, dictionary size of 120 words, and 14 feature dimension (Fig. 1(b)) and the corresponding accuracy is 90.81%. We fixed this set of parameters in the following experiments.

Fig. 2.
figure 2

Experimental results for saliency detection for CT images. (a) Visual comparison results. (b) PR curves.

Comparison Results. We compared our saliency method with four state-of-the-art saliency models: ITTI [2], GBVS [1], SLTA [3], and LFP [8]. The visual comparison of different methods is shown in Fig. 2(a). As can be seen, our approach (last column) can deal well to detect liver lesions. To evaluate the saliency detection performance of different saliency models quantitatively, we calculated corresponding PR curves. Figure 2(b) shows PR curves of different saliency models, which effectively demonstrate that our proposed saliency model can achieve better performance compared with the other models. We then applied an adaptive threshold on the calculated saliency maps using the Otsu’s method and obtained the final detection results. We further compared it with the state-of-the-art lesion detection methods: intensity based methods [6], clustering method [4], supervised learning method [7, 10]. Our method archived 90.81% accuracy and 83.41% sensitivity. The proposed method shows superior performance with an improvement of 10.10%, 18.17%, 7.25% and 6.14% in accuracy, 25.45%, 15.80%, 11.32% and 9.11% in sensitivity compared with the state-of-the-art methods [4, 6, 7, 10], respectively. This result validates the proposed method possesses superior ability to detect the lesions.

4 Conclusion

In this study, we propose a novel two-stage saliency model for liver lesion detection. The first stage saliency map removes the influence of the blood vessels while the second stage saliency map highlights the regions that show significant features compared with other patches. Each saliency map is calculated by using multi-scale patch sizes. Finally, through a weighting fusion of the first and second stage multi-scales saliency maps, we could detect the lesions. These two saliency stages detect the saliency from different aspects, thus they could complement each other to detect the lesions. Experiment results show that the overall accuracy of our methods are 90.81%. Our proposed model has negligible sensitivity to parameter tuning, and it is robust, fully automated.