
1 Introduction

Image segmentation consists of recognizing and delineating the edges of a particular object contained in an image. Several approaches accomplish this task using distinct strategies, such as region-growing or clustering algorithms [10]. Human interaction during segmentation or training on a database is often required to achieve better results.

In the medical field, image segmentation is key to the efficient diagnosis and treatment of diseases [14, 15]. It may be used to isolate or highlight objects of interest in an image, such as organs, tissues, and tumors. For that purpose, it is essential to provide a user-friendly interface that enables the specialist to visualize the desired information without having to understand the technical concepts behind the image processing and analysis techniques employed.

This work presents a user-friendly framework whose interface lets a specialist perform segmentations using four different kinds of contour tracking methods: Live-wire, Riverbed, Lazywalk, and straight lines. All implemented methods run simultaneously, allowing the user to select the most suitable one for each segment. A classifier suggests the best method as the default, based on features of the contour segment. The proposed framework is validated quantitatively with a robot user that simulates human interaction.

2 Technical Background

2.1 Image Foresting Transform - IFT

The Image Foresting Transform (IFT) [8] is a methodology extensively used to implement several image processing operators, including image segmentation [4,5,6,7]. In this context, the image is modeled as a weighted graph \(\mathcal{G} = (V, E, w)\), where V is the set of vertices, one per pixel, and E is an edge set defined by a binary adjacency relation A between pairs of pixels. The function \(w: E \rightarrow \mathfrak {R}\) assigns a weight to each edge \(e \in E\). The goal is to compute an optimum-path forest from \(\mathcal G\) by applying a generalized version of Dijkstra's algorithm [3] with multiple sources, given by a set of seeds \(S \subset V\), and a smooth path propagation function \(f(\pi )\) defined for all paths \(\pi = \langle v_0, v_1, ..., v_n\rangle \in \varPi \), with \(v_i \in V\) and \(\langle v_i,v_{i+1} \rangle \in E\). In this way, the seeds compete among themselves, in a greedy fashion, for the most strongly connected pixels in the entire image domain according to the path propagation function.

The most commonly used adjacency relations are symmetric and define edges within small neighborhoods around each vertex of the image domain, so that \(\langle v_i,v_j \rangle, \langle v_j,v_i \rangle \in E\) if and only if the L2 distance \(d(v_i,v_j) \le \alpha \), for a small constant \(\alpha \).
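A minimal Python sketch of such an adjacency relation, assuming pixels are addressed as (row, column) tuples; the helper names `adjacency_offsets` and `neighbors` are illustrative and not part of the framework. With \(\alpha = 1\) it yields the 4-neighborhood and with \(\alpha = \sqrt{2}\) the 8-neighborhood.

```python
import math

def adjacency_offsets(alpha):
    """Displacements (dy, dx) of a symmetric adjacency relation:
    q is adjacent to p when 0 < ||q - p||_2 <= alpha."""
    r = int(math.floor(alpha))
    offsets = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # compare squared distances to avoid rounding issues when d == alpha
            if (dy, dx) != (0, 0) and dy * dy + dx * dx <= alpha * alpha:
                offsets.append((dy, dx))
    return offsets

def neighbors(p, shape, offsets):
    """Yield the pixels adjacent to p = (y, x) that lie inside an image of size shape = (h, w)."""
    y, x = p
    h, w = shape
    for dy, dx in offsets:
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            yield (ny, nx)

print(len(adjacency_offsets(1.0)), len(adjacency_offsets(math.sqrt(2))))  # 4 8
```

Because every offset (dy, dx) comes with its opposite (-dy, -dx), the relation is symmetric, as required above.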

The path propagation function has two components: an initial value for trivial paths consisting of a single vertex, and a rule that gives the value of a path extended during the algorithm computation. Equations 1 and 2 present two functions commonly used in several image processing problems. In these equations, \(\delta \) is the initial value assigned to a trivial path \(\langle v\rangle \), and \(\pi \cdot v_j\) is the concatenation of a vertex \(v_j\) to the end of a path \(\pi \) whose last vertex is \(v_i\).

$$\begin{aligned} f_{sum}(\langle v\rangle ) &= \delta \\ f_{sum}(\pi \cdot v_j) &= f_{sum}(\pi ) + w(\langle v_i,v_j \rangle ) \end{aligned} \tag{1}$$
$$\begin{aligned} f_{max}(\langle v\rangle ) &= \delta \\ f_{max}(\pi \cdot v_j) &= \max\{ f_{max}(\pi ),\ w(\langle v_i,v_j \rangle )\} \end{aligned} \tag{2}$$
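To make the difference between Eqs. 1 and 2 concrete, the sketch below evaluates both functions on a single path, assuming \(\delta = 0\) and hypothetical edge weights. \(f_{sum}\) accumulates every edge along the path, while \(f_{max}\) keeps only the largest weight seen so far.

```python
def f_sum(path, w, delta=0.0):
    """Eq. 1: f_sum(<v>) = delta; f_sum(pi . vj) = f_sum(pi) + w(<vi, vj>)."""
    value = delta                          # value of the trivial path <v0>
    for vi, vj in zip(path, path[1:]):     # extend the path one edge at a time
        value = value + w(vi, vj)
    return value

def f_max(path, w, delta=0.0):
    """Eq. 2: f_max(<v>) = delta; f_max(pi . vj) = max(f_max(pi), w(<vi, vj>))."""
    value = delta
    for vi, vj in zip(path, path[1:]):
        value = max(value, w(vi, vj))
    return value

# Usage with hypothetical edge weights on the path <a, b, c, d>.
weights = {("a", "b"): 3.0, ("b", "c"): 1.0, ("c", "d"): 2.0}
w = lambda vi, vj: weights[(vi, vj)]
path = ["a", "b", "c", "d"]
print(f_sum(path, w), f_max(path, w))  # 6.0 3.0
```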

Algorithm 1 describes the IFT execution. Its inputs are an image I, an adjacency relation A, a path propagation function f, and a seed set S. Its outputs are a predecessor map \(P: V \rightarrow V \cup \{nil\}\) and a value map \(C: V \rightarrow \mathfrak {R}\). The predecessor map stores the predecessor of each vertex in the optimum-path forest: if \(P(v_i)=v_j\), then \(v_j\) is the predecessor of \(v_i\), and if \(P(v_i)=nil\), then \(v_i\) is a root of the forest. The value map contains the optimum value of the path arriving at each vertex. The auxiliary structures are a priority queue Q that sorts path values in non-decreasing order and the temporary variables \(prop\_val\), status, \(v_i\), and \(v_j\).

In line 1, the initial graph is built from the image dimensions and the adjacency relation. Then, the first loop (lines 2–8) distinguishes the seeds from the other vertices: \(\delta = 0\) is assigned to the values of all seed vertices (line 4) and \(\delta = +\infty \) to the others (line 8). In addition, all seeds are inserted into the priority queue (line 5) and their predecessors are set to nil (line 6). Finally, in the second loop (lines 9–17), the paths are propagated and the optimum-path forest is generated. The loop ends when the priority queue is empty, meaning that all vertices have been processed (line 9). A vertex \(v_i\) with minimum value is removed from the queue (line 10) and propagates its path to each adjacent vertex \(v_j\) (line 11). The path is propagated only if the proposed value (computed in line 12) is lower than the current value of \(v_j\) (line 13). In this case, the path value and predecessor of \(v_j\) are updated (lines 16 and 17), and the conquered vertex is inserted into the queue (line 15) if it is not there yet (test in line 14).

Algorithm 1. Image Foresting Transform (IFT).
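The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative implementation under a few assumptions: the graph is given implicitly by an adjacency function, f receives the current path value and the new edge weight, and, instead of testing whether \(v_j\) is already in Q (line 14), it pushes a new entry and discards outdated ones when they are popped (lazy deletion), because Python's `heapq` has no decrease-key operation.

```python
import heapq
import itertools
import math

def ift(vertices, adjacency, weight, seeds, f):
    """Sketch of Algorithm 1 (generic IFT).

    vertices  -- iterable of hashable vertices (pixels)
    adjacency -- function v -> iterable of vertices adjacent to v (relation A)
    weight    -- function (vi, vj) -> w(<vi, vj>)
    seeds     -- set S of seed vertices
    f         -- extension rule: f(value of pi, w(<vi, vj>)) -> value of pi . vj
    Returns (P, C): predecessor map (None plays the role of nil) and value map.
    """
    C, P = {}, {}
    queue, order = [], itertools.count()   # order breaks ties in FIFO fashion
    for v in vertices:
        P[v] = None
        if v in seeds:
            C[v] = 0.0                      # delta = 0 for seeds
            heapq.heappush(queue, (0.0, next(order), v))
        else:
            C[v] = math.inf                 # delta = +inf for the others
    removed = set()
    while queue:
        value, _, vi = heapq.heappop(queue)
        if vi in removed or value > C[vi]:
            continue                        # outdated queue entry (lazy deletion)
        removed.add(vi)
        for vj in adjacency(vi):
            if vj in removed:
                continue
            prop_val = f(C[vi], weight(vi, vj))
            if prop_val < C[vj]:            # better path found: update vj
                C[vj] = prop_val
                P[vj] = vi
                heapq.heappush(queue, (prop_val, next(order), vj))
    return P, C

# Usage: a 1 x 5 image whose edge weights are intensity differences.
I = [0, 1, 3, 6, 10]
adj = lambda v: [u for u in (v - 1, v + 1) if 0 <= u < len(I)]
w = lambda a, b: abs(I[a] - I[b])
P, C = ift(range(len(I)), adj, w, {0, 4}, lambda value, e: value + e)  # f_sum
```

With the two seeds at both ends, pixel 2 is conquered by seed 0 (value 1 + 2 = 3) and pixel 3 by seed 4 (value 4). Passing the built-in `max` as f gives the \(f_{max}\) behavior of Eq. 2.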

2.2 Contour Tracking Algorithms

Live-wire (LW) is a contour tracking technique most commonly used in a semi-automatic fashion [9]. It may be implemented with the IFT algorithm using the path propagation function \(f_{sum}\) in Eq. 1. The edge weight function w is defined as the complement of the image gradient (e.g., Sobel, Canny). The seed set consists of vertices (or pixels) on the contour to be delineated. Finally, the adjacency relation is usually symmetric with \(d(v_i,v_j) \le \sqrt{2}\), that is, an 8-neighborhood. After running the IFT algorithm, the contour is obtained by walking backwards along the path through the predecessor map P. The same strategy is employed by the Riverbed (RB) contour tracking technique [12], which simulates water flowing down a riverbed. The only difference between RB and LW is the use of \(f_{l\_max}\) in Eq. 3 as the path propagation function.
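Two Live-wire ingredients can be sketched independently of the IFT core: the edge weight as the complement of a gradient magnitude, and the backward walk over the predecessor map. This is a simplified sketch with assumptions: central differences replace Sobel or Canny, the weight of \(\langle v_i, v_j\rangle\) depends only on the gradient at \(v_j\), the image is a list of rows, and `None` stands for nil.

```python
import math

def gradient_complement_weights(img):
    """Return w(p, q) = 1 - |grad I(q)| / max|grad I| for pixels p, q = (y, x).
    Central differences are used here as a stand-in for Sobel/Canny."""
    h, wd = len(img), len(img[0])
    def px(y, x):  # clamp coordinates at the image border
        return img[min(max(y, 0), h - 1)][min(max(x, 0), wd - 1)]
    grad = [[math.hypot(px(y, x + 1) - px(y, x - 1), px(y + 1, x) - px(y - 1, x)) / 2.0
             for x in range(wd)] for y in range(h)]
    gmax = max(max(row) for row in grad) or 1.0   # avoid division by zero on flat images
    def weight(p, q):
        # strong gradient at the destination pixel -> cheap edge
        return 1.0 - grad[q[0]][q[1]] / gmax
    return weight

def backtrack(P, target):
    """Recover the contour by walking from target back to a root of the forest."""
    path = [target]
    while P[path[-1]] is not None:
        path.append(P[path[-1]])
    return path[::-1]   # anchor (root) first, target last

# Usage: a vertical step edge between columns 1 and 2 makes those pixels cheap to reach.
weight = gradient_complement_weights([[0, 0, 10, 10]] * 3)
print(weight((0, 0), (0, 1)), weight((0, 1), (0, 0)))  # 0.0 1.0
```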

Because of the summation in \(f_{sum}\), LW is robust to weak contours with small discontinuities, but it tends to favor shorter tracks. RB, on the other hand, follows paths of locally maximal value regardless of their origin. As a result, RB can follow strong contours of unlimited length, but it performs poorly in the presence of small gaps or high-frequency noise.

$$\begin{aligned} f_{l\_max}(\langle v\rangle ) &= \delta \\ f_{l\_max}(\pi \cdot v_j) &= f_{max}(\pi ) \end{aligned} \tag{3}$$

The Lazywalk (LZ) algorithm was proposed to estimate the level of water bodies in remote sensing images [1]. It employs the path propagation function \(f_{max}\) in Eq. 2 and aims to overcome the weaknesses of both LW and RB. Nevertheless, LZ fails to follow paths with several discontinuities.

In most cases, the contour detected by a single run of these algorithms is not acceptable for many applications. The solution adopted since the first implementation of LW is to track the contour in sections, running the algorithm more than once: the final pixel of one execution is the seed of the next. In this context, seeds are called anchors. Figure 1 shows an example of LW, RB, and LZ tracking the external contour of the brain in a magnetic resonance image.
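The anchor mechanism can be sketched as a loop that calls a single-segment tracker between consecutive anchors and joins the results, dropping the repeated junction pixel. Here the tracker is the straight-line method, rasterized with Bresenham's algorithm as an assumption (the paper does not say how its straight lines are drawn); any function returning a pixel path from one anchor to the next could be passed instead. Closing the contour back to the first anchor is also an assumption of this sketch.

```python
def straight_line(p, q):
    """Bresenham's line from pixel p to pixel q (both ends included)."""
    (y0, x0), (y1, x1) = p, q
    dy, dx = abs(y1 - y0), abs(x1 - x0)
    sy = 1 if y1 > y0 else -1
    sx = 1 if x1 > x0 else -1
    err = dx - dy
    y, x = y0, x0
    pts = []
    while True:
        pts.append((y, x))
        if (y, x) == (y1, x1):
            return pts
        e2 = 2 * err
        if e2 > -dy:   # step along x
            err -= dy
            x += sx
        if e2 < dx:    # step along y
            err += dx
            y += sy

def chain_segments(anchors, track, closed=True):
    """Run track(a, b) between consecutive anchors; each anchor seeds the next segment."""
    pairs = list(zip(anchors, anchors[1:]))
    if closed:
        pairs.append((anchors[-1], anchors[0]))
    contour = []
    for a, b in pairs:
        contour.extend(track(a, b)[:-1])   # drop b: it starts the next segment
    if not closed:
        contour.append(anchors[-1])
    return contour

square = chain_segments([(0, 0), (0, 2), (2, 2), (2, 0)], straight_line)
print(square)  # [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
```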

2.3 Supervised Classifiers

Descriptors. Image descriptors are used by machine learning algorithms for a range of tasks, such as image classification and content-based image retrieval [11]. Descriptors summarize important information about the color, texture, intensity, and shape of images, allowing a faster and more comprehensive evaluation of their content. Since medical images such as computed tomography, ultrasound, and magnetic resonance images have only one channel, this paper focuses on intensity, texture, and shape descriptors.

A quantized image histogram is an intensity descriptor that discards all spatial information and summarizes how frequently intensities occur in the image. Each bin of the histogram vector stores the number of pixels whose intensity falls within the corresponding range. If the histogram is normalized, each bin contains the probability that a randomly chosen pixel has an intensity within that range.
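A minimal sketch of a quantized, normalized histogram, assuming 8-bit intensities in [0, 255] divided into equal-width bins; the four-bin default matches the quantization the framework uses for its histogram descriptors, while the function name is illustrative.

```python
def quantized_histogram(values, n_bins=4, max_value=255, normalize=True):
    """Histogram of intensities in [0, max_value] quantized into n_bins equal ranges."""
    hist = [0] * n_bins
    for v in values:
        b = min(int(v * n_bins / (max_value + 1)), n_bins - 1)  # bin index of v
        hist[b] += 1
    if normalize and values:
        total = float(len(values))
        hist = [c / total for c in hist]   # each bin becomes a probability
    return hist

pixels = [0, 10, 64, 128, 200, 255]
print(quantized_histogram(pixels, normalize=False))  # [2, 1, 1, 2]
```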

Texture descriptors may be derived from statistics by computing the moments of the histogram given by Eq. 4, where L is the number of histogram bins, \(z_{i}\) is the intensity associated with bin i, \(p(z_{i})\) is the probability of intensity \(z_{i}\), m is the mean intensity, and n is the order of the moment. Since Eq. 4 defines central moments, \(\mu _1 = 0\) and the average intensity is given by m; for \(n=2\), \(\mu _2\) is the variance of the intensities.

$$\begin{aligned} \mu _{n} &= \sum _{i=0}^{L-1} (z_{i} -m)^{n} p(z_{i}) \\ m &= \sum _{i=0}^{L-1} z_{i} p(z_{i}) \end{aligned} \tag{4}$$
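A direct transcription of Eq. 4, assuming the histogram is already normalized and each bin is represented by a single intensity \(z_i\). Because Eq. 4 defines central moments, \(\mu_1\) is always zero and the mean is m; the usage example shows this on a uniform four-bin histogram.

```python
def histogram_moments(p, z, max_order=4):
    """Mean m and central moments mu_n of Eq. 4.
    p[i]: probability of bin i (normalized histogram); z[i]: intensity of bin i."""
    m = sum(zi * pi for zi, pi in zip(z, p))
    mu = {n: sum((zi - m) ** n * pi for zi, pi in zip(z, p))
          for n in range(1, max_order + 1)}
    return m, mu

m, mu = histogram_moments([0.25, 0.25, 0.25, 0.25], [0, 1, 2, 3])
print(m, mu[1], mu[2])  # 1.5 0.0 1.25
```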

The Local Binary Pattern (LBP) is another texture descriptor [13]. A sliding window is moved over the image and, at each position, the pixels with intensity greater than or equal to the central pixel are set to 1 and the others to 0. Each resulting bit is then multiplied by a power of two given by its position inside the window, and the sum of these products is the LBP descriptor of the central pixel. Histograms, statistical moments, and LBP may be used as global descriptors, extracted from the entire image, or as local descriptors, extracted from a limited area.
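A sketch of the basic 3 × 3 LBP operator, assuming an 8-neighbor window visited clockwise from the top-left corner, with bit k weighted by \(2^k\). The neighbor ordering is a convention chosen here; other orderings give different but equivalent codes.

```python
# Clockwise neighbor order starting at the top-left pixel; bit k has weight 2**k.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, y, x):
    """LBP code of the interior pixel (y, x) of an image given as a list of rows."""
    center = img[y][x]
    code = 0
    for k, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= center:   # neighbor >= center -> bit set to 1
            code |= 1 << k
    return code

def lbp_image(img):
    """LBP codes of all interior pixels (border pixels are skipped)."""
    h, w = len(img), len(img[0])
    return [[lbp_code(img, y, x) for x in range(1, w - 1)] for y in range(1, h - 1)]

img = [[5, 9, 1],
       [4, 6, 7],
       [2, 6, 3]]
print(lbp_code(img, 1, 1))  # 42 = 2 + 8 + 32 (neighbors 9, 7, and 6 are >= 6)
```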

An example of a local shape descriptor is the eccentricity of a region, defined as the ratio between the longest and the shortest axes of an object. The longest axis is the largest distance between any two points of the object, and the shortest axis is the smallest distance between any two points on its boundary, measured perpendicular to the longest axis.

A border segment of an object may also be described by its curvature: the ratio between the diameter and the distance between its initial and final points.

Support Vector Machine. Support Vector Machine (SVM) [2] is a methodology applied to data classification, regression, and outlier detection. Given data belonging to two distinct classes, a linear SVM classifier seeks the hyperplane that separates the samples of the two classes with the largest possible margin between them. The algorithm may also be modified (soft margin) to allow a few outliers to lie inside the margin or on the wrong side of the hyperplane. There are also non-linear SVM variants that separate the classes using more complex decision boundaries than hyperplanes [16].
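The paper does not specify the SVM solver, so the following is only a stand-in: a linear soft-margin SVM trained with the Pegasos stochastic sub-gradient method, in plain Python. The bias is folded into the weight vector by appending a constant feature, a common simplification that also regularizes it; `lam` is the regularization weight and the function names are illustrative.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Soft-margin linear SVM via Pegasos (hinge loss + L2 regularization).
    X: list of feature vectors; y: labels in {-1, +1}.
    Returns augmented weights [w_1, ..., w_d, b]."""
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]          # constant feature carries the bias
    w = [0.0] * len(Xa[0])
    idx = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)              # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]          # regularization shrink
            if margin < 1.0:                                   # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w

def decision_function(w, x):
    """Signed score w . [x, 1]; its sign gives the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))

X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```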

Several strategies extend binary classifiers to multi-class problems, including one-vs-one, which requires \(N(N-1)/2\) classifiers for N distinct classes, and one-vs-all, which requires only N classifiers. In the latter case, the class whose classifier outputs the highest confidence is selected.
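For the four tracking methods of this framework (N = 4), one-vs-one needs six binary classifiers and one-vs-all needs four. The sketch below counts classifiers and applies the one-vs-all decision rule to linear scorers; the weights in the usage example are hypothetical, not trained values.

```python
def n_classifiers(n_classes, strategy):
    """Number of binary classifiers required for N classes."""
    if strategy == "one-vs-one":
        return n_classes * (n_classes - 1) // 2
    if strategy == "one-vs-all":
        return n_classes
    raise ValueError("unknown strategy: " + strategy)

def one_vs_all_predict(models, x):
    """models: {label: (w, b)}, each a linear scorer for 'this class vs the rest'.
    Returns the label whose scorer gives the highest confidence w . x + b."""
    def score(label):
        w, b = models[label]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=score)

# Hypothetical scorers for three of the methods.
models = {"LW": ([1.0, 0.0], 0.0), "RB": ([0.0, 1.0], 0.0), "LZ": ([-1.0, -1.0], 0.0)}
print(n_classifiers(4, "one-vs-one"), n_classifiers(4, "one-vs-all"))  # 6 4
print(one_vs_all_predict(models, (0.2, 0.9)))  # RB
```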

3 Proposed Framework

We first implemented an environment for interactive contour tracking in C++ using the Qt graphical toolkit. 2D and 3D images may be loaded and displayed in canvases. The user then clicks on the desired contour, inserting anchors along the track. Figure 1 shows an example of the interface on a sagittal slice of a human brain, in which green circles represent anchors and the contours in different colors represent the paths of the distinct methods: LW in green, RB in red, LZ in cyan, and a straight line in yellow. The purple contours are consolidated tracks from previous iterations.

At each iteration, the user moves the mouse over the contour to find the longest segment correctly tracked by at least one of the methods. When such a segment is found, the right mouse button switches among the methods and the left mouse button sets the anchor. The straight line is useful for segments with low contrast and high noise, where all other methods behave poorly.

Fig. 1. Contour tracking interface: green circles: anchor points; and segments: pink: previous iterations; yellow: straight line; red: RB; green: LW; and cyan: LZ. (Color figure online)

The next step was to automatically suggest the best contour tracking method as the default option, reducing the number of clicks and, consequently, the execution time. An SVM classifier is applied to descriptors extracted from the segments computed by each method and from the minimum rectangular region enclosing them. Because the software suggests the best method most of the time, this procedure reduces the number of user interactions. The following descriptors are extracted from the path given by each method and then concatenated into a single descriptor:

(1) Perimeter; (2) Euclidean distance between anchors; (3) Curvature; (4) \(1^{st}\) moment of the contour segment; (5) \(2^{nd}\) moment of the contour segment; (6) \(3^{rd}\) moment of the contour segment; (7) \(4^{th}\) moment of the contour segment; (8) LBP descriptor of the contour segment; (9) Global histogram of the minimum rectangular region, quantized in four bins; (10) Histogram of the intensities of the contour segment pixels, quantized in four bins; and (11) Histogram of the path values of the contour segment pixels, quantized in four bins.
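A partial sketch of how the per-method descriptors might be assembled, covering descriptors (1) to (3) and (10) only. Assumptions: a path is a list of (row, column) pixels between two anchors; the perimeter is the arc length of the 8-connected path; curvature uses that arc length as the numerator in place of the "diameter" mentioned in Sect. 2.3; intensities are 8-bit; and the methods are concatenated in a fixed order. All function names are hypothetical.

```python
import math

def segment_shape_descriptors(path):
    """Descriptors (1)-(3) for a pixel path [(y, x), ...] between two anchors."""
    perimeter = sum(math.dist(p, q) for p, q in zip(path, path[1:]))  # (1) arc length
    chord = math.dist(path[0], path[-1])                              # (2) anchor distance
    curvature = perimeter / chord if chord > 0 else float("inf")      # (3) assumed ratio
    return [perimeter, chord, curvature]

def intensity_histogram4(img, path, max_value=255):
    """Descriptor (10): 4-bin normalized histogram of the intensities on the path."""
    hist = [0.0] * 4
    for y, x in path:
        hist[min(img[y][x] * 4 // (max_value + 1), 3)] += 1.0 / len(path)
    return hist

def concatenated_descriptor(img, paths):
    """Concatenate per-method descriptors, in a fixed method order (e.g. LW, RB, LZ, line)."""
    feat = []
    for path in paths:
        feat += segment_shape_descriptors(path) + intensity_histogram4(img, path)
    return feat

img = [[0, 100, 255]]
straight = [(0, 0), (0, 1), (0, 2)]
print(segment_shape_descriptors(straight))  # [2.0, 2.0, 1.0]
```

A straight segment has curvature 1 under this assumption, and any detour between the anchors increases it.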

Table 1. Best results among feature combinations

The SVM was trained and validated on two separate sets of brain image slices with ground-truth segmentations of the human brain. Starting from a random initial contour pixel, the best method is defined as the one that outputs the longest correct contour. The concatenated descriptor of all methods is extracted and used for training and validation. To investigate the relevance of each descriptor, subsets ranging from a single descriptor to all eleven were evaluated. The mean accuracy and the correlation coefficient \(R^2\) were computed, and the best results are shown in Table 1, which indicates that only descriptor 2 (Euclidean distance between anchors) is irrelevant for classifying the best method for tracking a given contour.
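The labeling rule described above can be sketched as follows, assuming that a path pixel is "correct" when it belongs to the set of ground-truth contour pixels (the paper does not state any tolerance) and that all methods start from the same initial pixel. The function names are illustrative.

```python
def correct_prefix_length(path, ground_truth):
    """Number of leading pixels of path that lie on the ground-truth contour."""
    n = 0
    for p in path:
        if p not in ground_truth:
            break
        n += 1
    return n

def best_method_label(paths, ground_truth):
    """paths: {method: pixel path from the common initial pixel}.
    The label is the method with the longest correct prefix (ties: first in dict order)."""
    return max(paths, key=lambda m: correct_prefix_length(paths[m], ground_truth))

gt = {(0, 0), (0, 1), (0, 2), (0, 3)}
paths = {"LW": [(0, 0), (0, 1), (1, 2)],
         "RB": [(0, 0), (0, 1), (0, 2), (0, 3), (1, 4)],
         "LZ": [(0, 0), (1, 1)]}
print(best_method_label(paths, gt))  # RB
```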

4 Experiments

The dataset used in our experiments consists of 360 2D slices extracted from 18 3D magnetic resonance images of human brains (6 sagittal, 6 axial, and 8 coronal slices from each 3D image) from the Internet Brain Segmentation Repository (IBSR). The slices used in this experiment differ from those used for the descriptor evaluation. Figure 2 shows a sample axial slice and its corresponding segmented brain mask.

Fig. 2. (Left) Original MRI T1 axial slice. (Right) Binary brain segmentation mask.

To evaluate the framework, we implemented a robot user that simulates human behavior for the task of segmenting the human brain with the contour tracking tool. The robot selects a random initial pixel and then extends the contour as long as at least one of the methods follows the correct contour according to the ground truth. Note that the ground truth is used only to select the segment length. The descriptors of the segments computed by all methods are then extracted, concatenated, and classified by the SVM using leave-one-out cross-validation over images: in each round, the segments of 17 images are used for training and the segments of the remaining image for testing. If the SVM outputs the correct method, this counts as a hit. Table 2 shows the number of hits over the total number of segments. In total, 7048 test segments were used.
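The leave-one-image-out protocol can be sketched generically: segments are grouped by their source image, and each image in turn is held out for testing. To keep the example self-contained, a trivial majority-class predictor stands in for the SVM; `train` and `predict` are hypothetical hooks where a real classifier would go.

```python
from collections import Counter, defaultdict

def leave_one_image_out(samples, train, predict):
    """samples: list of (image_id, features, label).
    Trains on all other images, tests on the held-out one; returns (hits, total)."""
    by_image = defaultdict(list)
    for img_id, x, label in samples:
        by_image[img_id].append((x, label))
    hits = total = 0
    for held_out in by_image:
        train_set = [(x, lbl) for i, segs in by_image.items() if i != held_out
                     for x, lbl in segs]
        model = train(train_set)
        for x, lbl in by_image[held_out]:
            hits += int(predict(model, x) == lbl)
            total += 1
    return hits, total

# Trivial stand-in classifier: always predicts the most frequent training label.
def train_majority(train_set):
    return Counter(lbl for _, lbl in train_set).most_common(1)[0][0]

def predict_majority(model, x):
    return model

samples = [(0, [0.0], "LW"), (0, [1.0], "LW"), (1, [0.0], "LW"), (1, [1.0], "RB"), (2, [0.0], "LW")]
print(leave_one_image_out(samples, train_majority, predict_majority))  # (4, 5)
```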

Table 2. Accuracy results of experiments with the robot user. \(\overline{X}\) is the average value.

5 Conclusion

In this paper, we propose a novel interactive framework for contour tracking that enables more precise segmentation by combining three techniques that complement each other. The user may choose the most accurate method for each segment of the contour by pressing a mouse button. We also developed an automatic classifier that suggests the best technique with more than 67% accuracy. Future work includes testing other classifiers and descriptors.