1 Introduction

Image segmentation is a challenging task that often requires user assistance for corrections [11]. Deep neural networks can provide impressive object approximations [10], but their object delineation is still not accurate, even when the user provides careful object localization [12] (Fig. 1). On the other hand, the combination of interactive object localization and graph-based object delineation may solve the problem in a few correction iterations with simple user effort (Fig. 1c).

Among many interesting approaches, graph-based object delineation has become quite popular with methods based on Random Walks [8], Graph Cuts [2], Watershed Cuts [6], and Image Foresting Transform (IFT) [7]. These frameworks interpret an image as a graph and, often from some hard constraints (e.g., seed nodes that were chosen by the user to locate the objects), the methods delineate the objects by optimizing some energy function [3, 13].

In this work, we explore the optimum-path trees that dynamically evolve from seed nodes during the IFT algorithm for more effective object delineation. This defines a new framework, named Dynamic IFT (DynIFT), with methods that can estimate the arc weights in the graph during object delineation by exploiting the object knowledge that increases at each moment. The methods are compared with state-of-the-art graph-based delineation approaches [2, 3, 6] using a natural image dataset with two types of seed sets provided by users. The experimental results using color information only already show considerable effectiveness gains in object delineation using the DynIFT algorithm.

The next sections present the proposed framework, with algorithm and examples of dynamic arc-weight estimation methods, the experimental results, discussion, and conclusion.

Fig. 1. (a) Original image with four extreme points (magenta) for the method in [12] and orange and green markers for the proposed algorithm. (b) Result of the method in [12] with errors and (c) the desired segmentation using the proposed algorithm. (Color figure online)

2 Dynamic Image Foresting Transform

A 2D image is a pair \((D_I,\mathbf{I})\) in which \(\mathbf{I}(p)\) assigns a set of local image features (e.g., color components) to each pixel \(p\in D_I \subset \mathcal {Z}^2\). An image may be interpreted as a graph \((\mathcal {N},\mathcal {A})\) in various distinct ways by defining the nodes in \(\mathcal {N}\subseteq D_I\), for example, as pixels, superpixels, or pixel vertices, and using some adjacency relation \(\mathcal {A} \subset \mathcal {N}\times \mathcal {N}\) in the image domain and/or feature space to define the arcs. For the sake of simplicity, we focus on pixels as nodes (\(\mathcal {N}=D_I\)), with \(\mathbf {I(p)}\) being the CIELab color components of pixel p, and the 4-neighborhood relation to define the arcs.
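As a minimal sketch of the graph interpretation above, assuming pixels are addressed as (row, column) tuples, the 4-neighborhood adjacency can be enumerated as follows. The helper name `neighbors4` is hypothetical and used only for illustration.

```python
def neighbors4(p, height, width):
    """Yield the 4-neighbors of pixel p = (row, col) that lie inside the image domain."""
    y, x = p
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        qy, qx = y + dy, x + dx
        if 0 <= qy < height and 0 <= qx < width:
            yield (qy, qx)


# Nodes and arcs of a small 2x3 image: one arc (p, q) for every adjacent ordered pair.
pixels = [(y, x) for y in range(2) for x in range(3)]
arcs = [(p, q) for p in pixels for q in neighbors4(p, 2, 3)]
```

Since the relation is symmetric, each pair of adjacent pixels contributes two arcs, one in each direction.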

Fig. 2. (a) Original image with markers (yellow and blue circles) drawn by the user. (b–e) Dynamic tree evolution in some iterations (each color being one different tree). Notice that each marker has multiple trees (one for each root pixel), but some roots may conquer most pixels in the region dominated by the marker. (Color figure online)

For a given seed set \(\mathcal {S}\) (e.g., labeled scribbles, or markers, drawn by the user in each object, including the background, for object localization and/or segmentation correction), we wish to partition the image into objects such that each object consists of the pixels more closely connected to its internal seeds than to any other seed. Each seed \(p\in \mathcal {S}\) is uniquely identified as belonging to one among c objects by a labeling function \(\lambda _O(p)\in \{0,1,2,\ldots ,c\}\), with 0 denoting the background. Similarly, one can identify the marker \(\lambda _M(p)\in \{1,2,\ldots ,m\}\), among m markers, that contains the seed p. This can be used, for instance, to control marker deletion and addition during segmentation correction. To this end, a connectivity function f measures how closely connected a seed and a node are through any given path in the image graph from the former to the latter. A path \(\pi _q = \langle p_1, p_2, \ldots , p_n=q\rangle \) with terminus q is a sequence of nodes such that \((p_i,p_{i+1})\in \mathcal {A}\), \(i=1,2,\ldots ,n-1\), and it is trivial when \(\pi _q=\langle q\rangle \). A path \(\pi _q\) is optimum when \(f(\pi _q) \le f(\tau _q)\) for any other path \(\tau _q\), irrespective of its starting node. Defining \(\varPi \) as the set of all possible paths in the graph, the Image Foresting Transform (IFT) algorithm [7] minimizes a path cost map C,

$$\begin{aligned} C(q)= & {} \min _{\forall \pi _q\in \varPi } \{f(\pi _q)\}, \end{aligned}$$
(1)

by computing an optimum-path forest P—i.e., an acyclic predecessor map that assigns to every node q its predecessor \(P(q)\in D_I\) in the optimum path \(\pi ^{P}_q\) with terminus q or a marker \(P(q)=nil\not \in D_I\), when q is a root (starting node) of the map (i.e., \(\pi ^{P}_q=\langle q\rangle \) is optimum). The algorithm can also propagate to every node \(p\in D_I\), the root \(R(p)\in \mathcal {S}\) in the optimum-path forest, the object label map \(L(p)=\lambda _O(R(p))\in \{0,1,2,\ldots ,c\}\) (resulting segmentation), and the marker label map \(M(p)=\lambda _M(R(p))\). The roots of the map are drawn from \(\mathcal {S}\), such that each object is defined by the optimum-path forest rooted in its internal seeds. Connectivity functions may be defined in different ways, which do not always guarantee the optimum cost mapping conditions [4], but can produce effective object delineation [14]. In this work, we explore the connectivity function

$$\begin{aligned} f_{\max }(\langle q \rangle )&= \left\{ \begin{array}{ll} 0 &{} \text{ if } q\in \mathcal {S}, \\ +\infty &{} \text{ otherwise, } \end{array}\right. \\ f_{\max }(\pi _p \cdot \langle p, q\rangle )&= \max \{f_{\max }(\pi _p),w(p,q)\}, \end{aligned}$$
(2)

where \(\pi _p\cdot \langle p, q\rangle \) indicates the extension of path \(\pi _p\) by an arc \(\langle p,q \rangle \) and \(w(p,q)\) is an arc weight usually estimated from \(\mathbf{I}\) (e.g., \(w(p,q)=\Vert \mathbf{I}(q)-\mathbf{I}(p)\Vert \)). The IFT algorithm with \(f_{\max }\) computes optimum paths from \(\mathcal {S}\) to the remaining nodes by growing one optimum-path tree \(\mathcal {T}_r\) for each seed \(r\in \mathcal{S}\).
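To make Eq. (2) concrete, the following sketch evaluates \(f_{\max }\) along an explicit path. It assumes a toy one-dimensional "image" of scalar intensities and the local arc weight \(w(p,q)=|\mathbf{I}(q)-\mathbf{I}(p)|\); the function name `f_max` and the sample values are illustrative only.

```python
import math


def f_max(path, seeds, w):
    """Cost of a path under Eq. (2): +inf unless it starts at a seed,
    otherwise the maximum arc weight along the path (0 for a trivial path)."""
    if path[0] not in seeds:
        return math.inf
    cost = 0.0
    for p, q in zip(path, path[1:]):
        cost = max(cost, w(p, q))
    return cost


# Toy example: nodes are indices into a list of intensities.
I = [10, 12, 11, 40, 42]
w = lambda p, q: abs(I[q] - I[p])
seeds = {0, 4}

cost_from_left = f_max([0, 1, 2, 3], seeds, w)   # crosses the 11 -> 40 edge
cost_from_right = f_max([4, 3], seeds, w)        # stays on the bright side
```

Node 3 is reached at a much lower cost from seed 4 than from seed 0, so the optimum path to it comes from seed 4.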

The dynamic IFT essentially exploits the sets \(\mathcal {T}_r\) to estimate the arc weights \(w(p,q)\) as the costs of including q, through \(\pi _p\cdot \langle p,q\rangle \), as part of the same object that contains p at the time the optimum path \(\pi _p\) is found (Fig. 2).

2.1 DynIFT Algorithm for \(f_{\max }\)

The dynamic IFT algorithm for \(f_{\max }\) is a variant of the IFT algorithm, which maintains the dynamic trees \(\mathcal {T}_r\), for all \(r\in \mathcal {S}\), object label map L, path cost map C, marker label map M, predecessor map P, and root map R for possible use during the segmentation process, especially for arc weight estimation.

Algorithm 1. DynIFT algorithm for \(f_{\max }\).

Lines 1–5 of the DynIFT algorithm initialize the maps, with all pixels \(p\in D_I\) defined as trivial paths \(\langle p \rangle \) in P and inserted in the priority queue Q. The main loop (Lines 6–13) computes in P optimum paths from the minima of the cost map (i.e., the seeds in \(\mathcal {S}\)) to the remaining pixels in \(D_I\setminus \mathcal {S}\). When a pixel p is removed from Q in Line 7, the current path \(\pi ^{P}_p\), which can be obtained backward in P, is optimum, and p is inserted in the dynamic tree \(\mathcal {T}_{R(p)}\) of the root of p. The internal loop (Lines 8–13) considers only the adjacent pixels \(q\in Q\), which do not belong to any dynamic set yet. It can estimate the arc weight \(w(p,q)\) by extracting object information from the maps and dynamic sets (Sect. 2.2). The remaining lines compute the cost of the extended path \(\pi ^{P}_p \cdot \langle p,q\rangle \) and, if this cost is lower than the current path cost C(q), update the maps so that the path \(\pi ^{P}_q\) becomes \(\pi ^{P}_p \cdot \langle p,q\rangle \) in P.
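The steps described above can be sketched in Python as follows. This is not the authors' pseudocode but an illustrative reconstruction under several assumptions: the queue Q is a binary heap with lazy deletion in which only seeds are queued initially (instead of all pixels), ties are broken by pixel coordinate, the marker label map M is omitted, and the arc weight is the tree-mean distance \(\Vert \mu _{R(p)}-\mathbf{I}(q)\Vert \). The name `dyn_ift` and the data layout are hypothetical.

```python
import heapq
import math


def dyn_ift(image, seeds):
    """image: 2D list of feature tuples; seeds: {(row, col): object_label}.
    Returns the label, cost, predecessor, and root maps as dicts keyed by pixel."""
    h, wd = len(image), len(image[0])
    dim = len(image[0][0])
    pixels = [(y, x) for y in range(h) for x in range(wd)]
    C = {p: (0.0 if p in seeds else math.inf) for p in pixels}  # path cost map
    P = {p: None for p in pixels}                               # predecessor map
    R = {p: p for p in pixels}                                  # root map
    L = {p: seeds.get(p) for p in pixels}                       # object label map
    tree_sum, tree_size = {}, {}   # running feature sum and size of each dynamic tree
    done = set()                   # pixels already removed from the queue
    heap = [(0.0, p) for p in seeds]
    heapq.heapify(heap)
    while heap:
        _, p = heapq.heappop(heap)
        if p in done:              # stale heap entry left by a later cost update
            continue
        done.add(p)
        r = R[p]
        # Insert p into the dynamic tree of its root and refresh the tree mean.
        s = tree_sum.get(r, (0.0,) * dim)
        tree_sum[r] = tuple(a + b for a, b in zip(s, image[p[0]][p[1]]))
        tree_size[r] = tree_size.get(r, 0) + 1
        mean = tuple(a / tree_size[r] for a in tree_sum[r])
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            q = (p[0] + dy, p[1] + dx)
            if not (0 <= q[0] < h and 0 <= q[1] < wd) or q in done:
                continue
            w_pq = math.dist(mean, image[q[0]][q[1]])  # dynamic arc weight
            tmp = max(C[p], w_pq)                      # f_max of the extended path
            if tmp < C[q]:
                C[q], P[q], R[q], L[q] = tmp, p, r, L[p]
                heapq.heappush(heap, (tmp, q))
    return L, C, P, R


# Usage on a 1x6 single-channel image with one object seed and one background seed.
image = [[(0,), (1,), (0,), (10,), (11,), (10,)]]
seeds = {(0, 0): 1, (0, 5): 0}
L, C, P, R = dyn_ift(image, seeds)
labels = [L[(0, x)] for x in range(6)]
```

Storing a running sum and size per root keeps the tree-mean update constant-time per removed pixel, so the dynamic weight estimation adds little overhead to the underlying IFT loop in this sketch.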

Next we explore simple and yet effective ways to estimate the arc weights.

2.2 Dynamic Arc Weight Estimation

Algorithm 1 executes in \(|D_I|\) iterations of the main loop. By the time a pixel p is removed from Q in Line 7, the tree \(\mathcal {T}_{R(p)}\) contains information about the region conquered by the root \(R(p)\in \mathcal {S}\) (which includes p), the map P contains the optimum path \(\pi ^{P}_p\), the map M contains the union \(\bigcup _{\forall r\in \mathcal {S}| M(r)=M(p)} \mathcal {T}_r\) of trees rooted in each marker \(M(p)\in \{1,2,\ldots ,m\}\), and the map L contains the union \(\bigcup _{\forall r\in \mathcal {S}| L(r)=L(p)} \mathcal {T}_r\) of trees rooted in each object \(L(p)\in \{1,2,\ldots ,c\}\). Therefore, color, texture, and shape information about the object or its regions can be explored for dynamic arc weight estimation—i.e., to estimate the cost of including q as part of the object that contains p. We then evaluate the following dynamic arc weight functions based on the tree mean color \({\mu }_{\mathbf {R(p)}}\) of the pixels \(p\in \mathcal {T}_{R(p)}\) and the object mean color \({\mu }_{\mathbf {L(p)}}\) of the pixels \(p\in \bigcup _{\forall r\in \mathcal {S}| L(r)=L(p)} \mathcal {T}_r\).

$$\begin{aligned} w_1(p,q)&= \Vert {\mu }_{\mathbf {R(p)}}-{\mathbf {I(q)}}\Vert , \end{aligned}$$
(3)
$$\begin{aligned} w_2(p,q)&= \min _{\forall r\in \mathcal {S}|L(r)=L(p)} \{\Vert {\mu }_{\mathbf {r}}-{\mathbf {I(q)}}\Vert \}, \end{aligned}$$
(4)
$$\begin{aligned} w_3(p,q)&= \Vert {\mu }_{\mathbf {L(p)}}-{\mathbf {I(q)}}\Vert , \end{aligned}$$
(5)
$$\begin{aligned} w_4(p,q)&= w_1(p,q) + \Vert \mathbf {I(q)}-\mathbf {I(p)}\Vert , \end{aligned}$$
(6)
$$\begin{aligned} w_5(p,q)&= w_2(p,q) + \Vert \mathbf {I(q)}-\mathbf {I(p)}\Vert , \end{aligned}$$
(7)
$$\begin{aligned} w_6(p,q)&= w_3(p,q) + \Vert \mathbf {I(q)}-\mathbf {I(p)}\Vert . \end{aligned}$$
(8)

DynIFT with \(w_1\) assumes that the mean color of the region of the object that contains p (i.e., the dynamic tree \(\mathcal {T}_{R(p)}\)) is more representative than \(\mathbf {I(p)}\). However, it also assumes that each seed can only conquer pixels q whose color is similar to the mean color of that region. The purpose of \(w_2\) is to relax this criterion by considering the closest mean color among all dynamic trees rooted in the same object. This allows, for instance, the delineation of object regions that are not necessarily connected to their most similar seeds (Fig. 3). Function \(w_3\) extends the concept of \(w_1\) to the entire object, which is not expected to work well, since an object may be composed of parts with different colors. The remaining functions add the local arc weight \(\Vert \mathbf {I(q)}-\mathbf {I(p)}\Vert \) to the previous ones in order to evaluate the importance of the local contrast between regions.
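The six functions in Eqs. (3)–(8) can be illustrated with a small sketch. It assumes that the tree means and the object mean have already been computed from the dynamic sets and are passed in as plain tuples; the function name `dynamic_weights`, its parameters, and the sample colors are hypothetical.

```python
import math


def dynamic_weights(I_p, I_q, mean_tree, means_same_object, mean_object):
    """Return (w1, ..., w6) for the arc (p, q).

    mean_tree:          mean color of the tree rooted at R(p)
    means_same_object:  mean colors of all trees whose root has the label L(p)
    mean_object:        mean color of all pixels already labeled L(p)
    """
    local = math.dist(I_q, I_p)                                 # local contrast term
    w1 = math.dist(mean_tree, I_q)                              # Eq. (3)
    w2 = min(math.dist(mu, I_q) for mu in means_same_object)    # Eq. (4)
    w3 = math.dist(mean_object, I_q)                            # Eq. (5)
    return w1, w2, w3, w1 + local, w2 + local, w3 + local       # Eqs. (6)-(8)


# Illustrative values: a second tree of the same object is closer in color to q.
weights = dynamic_weights(
    I_p=(50.0, 0.0, 0.0),
    I_q=(53.0, 4.0, 0.0),
    mean_tree=(50.0, 0.0, 0.0),
    means_same_object=[(50.0, 0.0, 0.0), (53.0, 0.0, 0.0)],
    mean_object=(51.0, 0.0, 0.0),
)
```

In this example \(w_2\) is smaller than \(w_1\) because another tree of the same object has a mean color closer to \(\mathbf{I}(q)\), which is the relaxation that \(w_2\) is meant to provide.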

Fig. 3. (a) Original image with markers (red and blue) and ground-truth delineation (magenta). Segmentation results using arc weight functions (b) \(w_4\) and (c) \(w_5\). Even without object markers on the swimmer’s legs, \(w_5\) is still able to delineate them, because it searches for the most similar tree among all trees of the object. (Color figure online)

3 Experimental Results

For comparison, we use the power watershed (\(PW_{q=2}\)) algorithm [5] (i.e., image segmentation based on minimum spanning forest and random walk), the IFT algorithm for \(f_{\max }\) with arc weight function \(w(p,q) = \Vert \mathbf {I(q)}-\mathbf {I(p)}\Vert \) (i.e., a watershed transform [7]), and the min-cut/max-flow algorithm [2, 15]. The IFT-based watershed transform provides a cut in the graph given hard constraints (seed set) such that the lowest arc weight in the cut is maximum (i.e., it is an energy maximizer, \(GC_{\max }\), as defined in [3] or a watershed cut as defined in [6]). Its counterpart is the energy minimizer, \(GC_{sum}\), as defined in [3], which uses the min-cut/max-flow algorithm [2, 15] and obtains a cut in the graph given the seed set such that the sum of the arc weights in the cut is minimum. For that, one can simply use the normalized complementary arc weight function \(\bar{w}(p,q)=|\frac{w_{\max }-w(p,q)}{w_{\max }}|^\alpha \), where \(\alpha \ge 1\) and \(w_{\max }\) is the maximum value of \(w(p,q)\) in the graph, in the min-cut/max-flow algorithm with source and sink connected to the seed set.
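The normalized complementary arc weight \(\bar{w}\) can be sketched as below. The function name `complementary_weights` is hypothetical, and returning 1 for every arc when \(w_{\max }=0\) is an assumption made here to avoid a division by zero; the text does not specify this case.

```python
def complementary_weights(weights, alpha=1.0):
    """Map arc weights w to |(w_max - w) / w_max| ** alpha, so that low-contrast
    arcs become expensive to cut and high-contrast arcs become cheap."""
    w_max = max(weights)
    if w_max == 0:
        # Assumption: a completely flat image has maximally similar arcs everywhere.
        return [1.0 for _ in weights]
    return [abs((w_max - w) / w_max) ** alpha for w in weights]


w_bar = complementary_weights([0.0, 5.0, 10.0], alpha=2.0)
```

Larger values of \(\alpha \) shrink every intermediate weight toward zero, so the minimum cut increasingly favors the arcs with the highest original contrast.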

Table 1. Experimental results.
Fig. 4. Examples of segmentation results using DynIFT and the baselines. First row shows the images with the chosen markers for the objects (blue and yellow) and background (red), and the borders of the ground-truth segmentations (magenta and green). The remaining rows show the segmentations from the considered methods for the chosen markers. Note that \(GC_{sum}\) is only able to produce binary segmentations. (Color figure online)

As proposed by Andrade and Carrera [1], our experiments run on two predefined sets of markers to avoid bias of prior knowledge of the process of segmenting with each algorithm. The first is the dataset from [9], in which about four markers cover a small area of both the background and foreground on each image. The second dataset contains a more carefully selected set of scribbles [1].

Table 1 shows the mean and standard deviation of the results over the 50 images of the GrabCut dataset and their respective markers, two baselines (with \(\alpha = 100\) for the \(GC_{sum}\) algorithm), and the six proposed arc weight functions. Computations were performed on an Intel Core i7-7700 CPU 3.60 GHz.

In all cases, except for \(w_3\) and \(w_6\), as predicted, the DynIFT-based methods considerably improve the accuracy of object delineation in comparison with the baselines. Note that the relaxed versions, represented by \(w_2\) and \(w_5\), obtain better results than the versions that use only the mean color of the tree \(\mathcal {T}_{R(p)}\), represented by \(w_1\) and \(w_4\). Figure 4 illustrates these results on a few examples.

4 Conclusion

In this paper, we present a new framework, named Dynamic IFT (DynIFT), that exploits the object knowledge from the evolution of optimum-path trees during the IFT algorithm for more effective object delineation. We evaluated DynIFT with some arc-weight estimation methods using color information. Experimental results show that DynIFT attains considerable accuracy gains in object delineation when compared to three well-established baselines. As future work, we intend to explore the dynamic properties of the optimum-path forest and its combination with pattern recognition algorithms to better understand how image delineation can be improved.