1 Introduction

In Mathematical Morphology several image operators are defined by the difference between two operators, such as morphological gradient which is defined as the difference between dilation and erosion. Such operators are called residual operators. In more general approach, some residual operators are defined by extraction of residual values from of an indexed family of operators, for example, the skeleton by maximal discs and the ultimate attribute opening. It is worth noting that ultimate attribute openings belong to a wider class of residual operators called ultimate levelings [1]. This class of residual operators analyzes the evolution of the residual values between two consecutive operators on a scale-space of levelings. Residual values of these operators can reveal important contrast information in images. In a addition to residues, other information associated with them can be obtained at the moment of the residual extraction, such as operator properties that produced the maximum residual value.

Examples of ultimate levelings include the maximum difference of openings (resp., closings) by reconstruction [2], differential morphological profiles [3], ultimate attribute openings (resp., closings) [4], differential attribute profiles [5], shape ultimate attribute openings (resp., closings) [6] and ultimate grain filters [7]. They have successfully been used as a preprocessing step in various applications such as texture features extraction, segmentation of high-resolution satellite imagery, text location, segmentation of building façades and restoration of historical documents.

Given the above considerations, the main contribution of this paper is the proposal of an energy filter to be a criterion to select tree nodes in order to perform the ultimate levelings [1]. This filter is based on the seminal work of Mumford and Shah [8] and the energy attribute introduced by Xu et al. [9]. In addition, plant image detection and segmentation tasks were selected to test the proposed energy filter.

2 Theoretical Background

Image representations through trees have been proposed in recent years to carry out tasks of image processing and analysis such as filtering, segmentation, pattern recognition, contrast extraction, registration, compression and others. In this scenario, the first step consists of constructing a representation of the input image by means of a tree; then the task of image processing or analysis is performed through information extraction from or modifications in the tree itself, and finally an image is reconstructed from the modified tree.

In order to build the trees considered in this paper, we need the following definitions. First, we consider images as mappings from a Cartesian grid \(\mathcal {D} \subset \mathbb {Z}^2\) to a discrete set of \(k\ge 1\) integers \(\mathbb {K} = \{0,1, \ldots , k-1\}\). (More precisely, \(f: \mathcal {D}\rightarrow \mathbb {K}\)). These mappings can be decomposed into lower (strict) and upper (large) level sets, i.e., for any \(\lambda \in \mathbb {K}\), \(\mathcal {X}_{\downarrow }^{\lambda }(f) = \{p \in \mathcal {D} : f(p) < \lambda \}\) and \(\mathcal {X}^{\uparrow }_{\lambda }(f) = \{p \in \mathcal {D}: f(p) \ge \lambda \}\). One important property of these level sets is that they are nested, i.e., \(\mathcal {X}_{\downarrow }^{0}(f) \subseteq \mathcal {X}_{\downarrow }^{1}(f) \subseteq ... \subseteq \mathcal {X}_{\downarrow }^{k}(f)\) and \(\mathcal {X}^{\uparrow }_{k}(f) \subseteq \mathcal {X}^{\uparrow }_{k-1}(f) \subseteq ... \subseteq \mathcal {X}^{\uparrow }_{0}(f)\). It is possible to show that any image f can be reconstructed using either the family of lower or upper sets, i.e.,

$$\forall p \in \mathcal {D}, f(p) = \sup \{\lambda : p\in \mathcal {X}^{\uparrow }_{\lambda }(f) \} = \inf \{\lambda -1: p \in \mathcal {X}_{\downarrow }^{\lambda }(f) \}.$$

From these sets, we define two other sets \(\mathcal {L}(f)\) and \(\mathcal {U}(f)\) composed by the connected components (CCs) of the lower and upper level sets of f, i.e., \(\mathcal {L}(f)= \{\tau \in \mathcal {CC}(\mathcal {X}_{\downarrow }^\lambda (f)): \lambda \in \mathbb {K}\}\) and \(\mathcal {U}(f)= \{\tau \in \mathcal {CC}(\mathcal {X}^{\uparrow }_\lambda (f)): \lambda \in \mathbb {K}\}\), where \(\mathcal {CC}(\mathcal {X})\) denotes the sets of either 4 or 8-CCs of \(\mathcal {X}\), respectively. The ordered pairs consisting of the CCs of the lower and upper level sets and the usual inclusion set relation, i.e., \((\mathcal {L}(f), \subseteq )\) and \((\mathcal {U}(f),\subseteq )\), induce two dual trees [10, 11] called component trees. It is possible to combine them into a single tree in order to obtain the so-called tree of shapes. In fact, note that, if \(\tau \in \mathcal {L}(f)\) then the CCs of \(\mathcal {D} \backslash \tau \) are in \(\mathcal {U}(f)\) (or vice versa, \(\tau \in \mathcal {U}(f) \Rightarrow \mathcal {CC}(\mathcal {D} \backslash \tau ) \subseteq \mathcal {L}(f)\)). Then, let \(\mathcal {P}(\mathcal {D})\) denote the powerset of \(\mathcal {D}\) and let \(sat: \mathcal {P}(\mathcal {D}) \rightarrow \mathcal {P}(\mathcal {D})\) be the operator of saturation [11] (or filling holes). Then, \(\mathcal {SAT}(f) = \{sat(\tau ): \tau \in \mathcal {L}(f) \cup \mathcal {U}(f)\}\) be the family of CCs of the upper and lower level sets with holes filled. The elements of \(\mathcal {SAT}(f)\), called shapes, are nested by the inclusion relation and thus the pair \((\mathcal {SAT}(f), \subseteq )\) induces a tree which is called tree of shapes [11].

Tree of shapes, and also component trees, are complete representations of images and they can be stored using compact and non-redundant data structures. In the case of component trees \((\mathcal {L}(f), \subseteq )\) and \((\mathcal {U}(f),\subseteq )\), these structures are known, respectively, as min-trees and max-trees. In these structures, each pixel \(p \in \mathcal {D}\) is associated only to the smallest CC of the tree containing it; and through parenthood relationship, it is also associated to all its ancestor nodes. Then, we denote by \(\mathcal {SC}(\mathcal {T}, p)\) the smallest CC containing p in tree \(\mathcal {T}\). Similarly, we say \(p \in \mathcal {D}\) is a compact node pixel (CNP) of a given CC \(\tau \in \mathcal {T}\) if and only if \(\tau \) is the smallest CC containing p, i.e., \(\tau = \mathcal {SC}(\mathcal {T}, p)\). Moreover, we denote by \(\hat{\tau }\) the set composed by only CNPs of \(\tau \), i.e., \(\hat{\tau } = \{p\in \mathcal {D}: \tau = \mathcal {SC}(\mathcal {T}, p)\}\). Figure 1 shows examples of min-tree, max-tree, and tree of shapes, where CNPs are highlighted in red.

Fig. 1.
figure 1

Min-tree, max-tree and tree of shapes as compact representations of \((\mathcal {L}(f), \subseteq )\) and \((\mathcal {U}(f), \subseteq )\), and \((\mathcal {SAT}(f), \subseteq )\), respectively, of the input image f. Only CNPs are stored and they are highlighted in red. (Color figure online)

3 Ultimate Levelings Based on Energy Functions

Ultimate levelings constitute a wider class of residual operators defined from a scale-space of levelings \(\{\psi _{i}: i \in \mathcal {I}\}\) [1, 7, 12]. An ultimate leveling analyzes the evolution of residual values from a family of consecutive primitives, i.e. \(r_i^{+}(f) = [\psi _{i}(f) - \psi _{i+1}(f) \vee 0]\) and \(r_i^{-}(f) = [\psi _{i}(f) - \psi _{i+1}(f) \vee 0]\), keeping the maximum positive and negative residues for each pixel. Thus, contrasted objects can be detected if a relevant residue is generated when they are filtered out by one of these levelings. More precisely, the ultimate leveling \(\mathcal {R}\) is defined for any image f as follows:

$$\begin{aligned} \mathcal {R}(f) = \mathcal {R}^{+}(f) \vee \mathcal {R}^{-}(f), \end{aligned}$$
(1)

where \(\mathcal {R}^{+}(f) = \sup _{i \in \mathcal {I}}\{r^+_{i}(f)\}\) and \(\mathcal {R}^{-}(f) = \sup _{i \in \mathcal {I}}\{r^-_{i}(f)\}\).

Residual values of these operators can reveal important contrasted structures in the image. In addition to these residues, other associated information can be obtained such as properties of the operators that produced the maximum residual value. For example, Beucher [13] introduced a function \(q_{\mathcal {I}_{\max }}: \mathcal {D} \rightarrow \mathcal {I}\) that associates to each pixel the major index that produces the maximum non-null residue, i.e.,

$$\begin{aligned} p \in \mathcal {D}, q_{\mathcal {I}_{\max }}(p) = \left\{ \begin{array}{ll} q_{\mathcal {I}_{\max }}^{+}(p), &{} \text {if}\ [\mathcal {R}_{\theta }(f)](p) > [\mathcal {R}^{+}_{\theta }(f)](p), \\ q_{\mathcal {I}_{\max }}^{-}(p), &{} \text {otherwise}. \end{array}\right. \end{aligned}$$
(2)

where \(q_{\mathcal {I}_{\max }}^+(p) = \max \{i + 1: [r^+_i(f)](p) = [\mathcal {R}^{+}_{\theta }(f)](p) > 0 \}\) and \(q_{\mathcal {I}_{\max }}^-(p) = \max \{i + 1: [r^-_i(f)](p) = [\mathcal {R}^{-}_{\theta }(f)](p) > 0 \}\).

The ultimate levelings can be efficiently implemented thanks to the theorem proposed in Alves et. al [10], which shows that an increasing family of levelings \(\{\psi _{i}: i \in \mathcal {I}\}\) can be obtained through a sequence of pruned trees \((\mathcal {T}_f^0, \mathcal {T}_f^1, \ldots , \mathcal {T}_f^{\mathcal {I}_{MAX}})\) from the structure of the max-tree, min-tree or tree of shapes \(\mathcal {T}_f\) constructed from the image f. Then, the i-th positive (resp. negative) residue \(r^+_i(f)\) can be obtained from the nodes \(\mathcal {N}r(i)= \mathcal {T}_f^i \diagdown \mathcal {T}_f^{i+1}\), i.e., \(\forall \tau \in \mathcal {N}r(i)\),

(3)

where level and Parent are functions that represent the gray level and the parent node of \(\tau \) in \(\mathcal {T}_f\), respectively. Thus, the i-th positive (resp. negative) residue \(r^+_i(f)\) is given as follows:

$$\begin{aligned} \forall p \in \mathcal {D}, [r^+_i(f)](p) = \left\{ \begin{array}{ll} r^+_{\mathcal {T}^i_f}(\mathcal {SC}(\mathcal {T}^i_f, p)), &{} \text {if}\ \mathcal {SC}(\mathcal {T}^i_f, p) \in \mathcal {N}r(i),\\ 0,&{} \text {otherwise}.\\ \end{array}\right. \end{aligned}$$
(4)

Those facts lead to efficient algorithms for computing ultimate levelings and its variations [7, 14].

Ultimate levelings are operators that extract residual information from primitive families. During the residual extraction process, it is very common that undesirable regions of the input image contain residual information that should be filtered out. These undesirable residual regions often include desirable residual regions due to the design of the ultimate levelings which consider maximum residues. Thus, we improve the residual information by filtering out residues extracted from undesirable regions. To decide whether a residue \(r^+_i(f)\) (resp., \(r^-_i(f)\)) is filtered out or not, just checking nodes \(\tau \in \mathcal {N}r(i)\) that satisfy a given filtering criterion \(\varOmega : \mathcal {P}(\mathcal {D}) \rightarrow \{\text {desirable}, \text {undesirable}\}\). Thus, we only calculate the ultimate leveling \(\mathcal {R}\) for residues \(r^+_i\) (resp., \(r^-_i\)) such that satisfy the criterion \(\varOmega \). So, positive (resp. negative) residues are redefined as follows:

$$\begin{aligned} \forall \tau \in \mathcal {N}r(i), r^{\varOmega +}_{\mathcal {T}^i_f}(\tau ) = \left\{ \begin{array}{ll} r^+_{\mathcal {T}^i_f}(\tau ), &{} \text {if}\ \exists C \in \mathcal {N}r(i)\\ &{}\text {such that}\ \varOmega (C)\ \text {is desirable}; \\ 0,&{} \text {otherwise},\\ \end{array}\right. \end{aligned}$$
(5)

and thus redefined the ultimate levelings with strategy for filtering from undesirable residues as follows: \(\mathcal {R}_{\varOmega }(f) = \mathcal {R}_{\varOmega }^{+}(f) \vee \mathcal {R}_{\varOmega }^{-}(f)\), where,\(\mathcal {R}_{\varOmega }^{+}(f) = \sup \{r^{\varOmega +}_i(f): i\in \mathcal {I} \}\) and \(\mathcal {R}_{\varOmega }^{-}(f) = \sup \{r^{\varOmega -}_i(f): i\in \mathcal {I} \}\).

3.1 Filtering Based on Mumford-Shah Energy

In this section, we present the main contribution of this paper, that is, the construction of a criterion \(\varOmega \) to filter undesirable residues from the ultimate levelings based on energy functions. This means that some desirable residual regions are selected according to the energy functional attribute \(\kappa : \mathcal {P}(\mathcal {D}) \rightarrow \mathbb {R}^+\) and a parameter threshold \(\varepsilon \in \mathbb {R}^+\). Thus, to decide whether a residue \(r^+_i(f)\) (resp., \(r^-_i(f)\)) is desirable, simply check if there exists at least one node \(\tau \in \mathcal {N}r(i)\) that satisfies the filtering criterion:

$$\begin{aligned} \varOmega (\tau ) = \left\{ \begin{array}{ll} \text {desirable,} &{} \text {if}\ \kappa (\tau ) > \varepsilon ,\\ \text {undesirable,} &{} \text {otherwise}. \end{array}\right. \end{aligned}$$
(6)

Thus, we present a brief review of how we obtained the attribute \(\kappa \) from a morphological tree \(\mathcal {T}\). First, we remember that a large and growing body of literature has investigated the use of energy functions to segment regions of images [8, 9, 15]. One class of interested energy-based segmentation problem is the one that involves the minimization of a two-term energy based function \(E_{\nu } = D + \nu C\), where D is called goodness-of-fit term, C denotes the regularization term and \(\nu \) is a parameter. For example, one seminal instance of two-term energy based function is the piecewise-constant Mumford-Shah functional [8] that is given for any image f as follows:

$$\begin{aligned} E_{\nu }(f, P) = \sum _{R \in P}(D(R) + \nu \cdot C(R)), \end{aligned}$$
(7)

where \(P=\{R_1, R_2, \ldots , R_n\} \) is the set of regions of a given partition on \(\mathcal {D}\), \(C(R) = |\partial R|\) is the cardinality of the set of contour pixels of \(R \in P\) and \(D(R) = \sum _{p \in R}( f(p) - \mu _R)^2\) is the sum of squared errors and \(\mu _R\) is the mean of f within the region R.

In fact, a tree \(\mathcal {T}\) provides an associated partition of the image f through the CNPs, i.e., \(P_{\mathcal {T}} = \{\hat{\tau }: \tau \in \mathcal {T} \}\). Thus, considering Eq. 7 as a minimization problem, we are interested in finding the simplified version \(\mathcal {T}^{*}\) of \(\mathcal {T}\) such that has minimum energy, i.e., \(\mathcal {T}^{*}=\arg \min \{ E_{\nu }(f, P_{\mathcal {T}'}):\mathcal {T}' \text { is a simplified version of } \mathcal {T}\}\).

On one hand, previous studies have primarily concentrated on minimizing the Mumford-Shah energy function using curve evolution methods. On the other hand, based on Ballester et al. [15], Xu et al. [9] proposed an efficient greedy algorithm to minimize energy \(E_\nu \) subordinated to the tree of shapes and it is based on a measure called energy variation. This measure is defined as the difference between the energy of the tree \(\mathcal {T}\) with and without a node \(\tau \in \mathcal {T}\), and it is given by:

$$\begin{aligned} \varDelta E_\nu (\mathcal {T}, \tau ) = E_\nu (f, P_{\mathcal {T}}) - E_\nu (f, P_{\mathcal {T} \backslash \{\tau \}}). \end{aligned}$$
(8)

Thus, since only the regions (denoted by \(\hat{\tau }\), \(\hat{\tau }'\) and \(\hat{\tau }''\)) obtained by the CNPs of \(\tau \in \mathcal {T}\), and are, respectively, in the symmetric difference between \(P_{\mathcal {T}}\) and \(P_{\mathcal {T} \backslash \{\tau \}}\), we can rewrite Eq. 8 by:

$$\begin{aligned} \begin{array}{lll} \varDelta E_{\nu }(\mathcal {T}, \tau ) &{} = &{} E_\nu (f, P_{\mathcal {T}}) - E_\nu (f, P_{\mathcal {T} \backslash \{\tau \}}) \\ &{} = &{} [D(\hat{\tau }) + \nu \cdot C(\hat{\tau })] + [D(\hat{\tau }') + \nu \cdot C(\hat{\tau }')] - [D(\hat{\tau }'') + \nu \cdot C(\hat{\tau }'')] \\ &{} = &{} D(\hat{\tau }) + D(\hat{\tau }') -D(\hat{\tau }'') + \nu \cdot (C(\hat{\tau }) + C(\hat{\tau }') - C(\hat{\tau }'')). \end{array} \end{aligned}$$
(9)

Recently, Xu et al. [9] provided a new approach to minimize the piecewise-constant Mumford-Shah energy functional. Thus, based on the energy variation \(\varDelta E_{\nu }\), the energy functional attribute \(\kappa \) is defined:

$$\begin{aligned} \begin{array}{lll} \kappa (\tau ) &{} = &{} \dfrac{-\varDelta E_{\nu }(\mathcal {T}, \tau )}{C(\hat{\tau }) + C(\hat{\tau }') - C(\hat{\tau }'')} + \nu \\ &{} = &{} \dfrac{-[D(\hat{\tau }) + D(\hat{\tau }') - D(\hat{\tau }'')]}{C(\hat{\tau }) + C(\hat{\tau }') - C(\hat{\tau }'')} - \dfrac{\nu \cdot (C(\hat{\tau }) + C(\hat{\tau }') - C(\hat{\tau }''))}{C(\hat{\tau }) + C(\hat{\tau }') - C(\hat{\tau }'')} + \nu \\ &{} = &{} -[D(\hat{\tau }) + D(\hat{\tau }') - D(\hat{\tau }'')] \cdot \dfrac{1}{C(\hat{\tau }) + C(\hat{\tau }') - C(\hat{\tau }'')} \\ &{} = &{} \dfrac{[S(f, \hat{\tau })]^2}{|\hat{\tau }|} + \dfrac{[S(f, \hat{\tau }')]^2}{|\hat{\tau }'|} - \dfrac{[S(f, \hat{\tau }'')]^2}{|\hat{\tau }''|} \cdot \dfrac{1}{|\partial \hat{\tau }| + |\partial \hat{\tau }'| - |\partial \hat{\tau }''|}, \end{array} \end{aligned}$$
(10)

where S(fR) is the sum of graylevels of all pixels of R and \(|\cdot |\) denotes the cardinality. The energy functional attribute \( \kappa \) is obtained by the greedy algorithm proposed by Xu et al. [9]. As already mentioned before, in this paper, this measure is used in Eq. 6 just for the filtering criterion.

4 Results and Experiments

One interesting vision task problem is the plant bounding box detection. Actually an important image plant dataset is provided by Minervine et al. [16]. In respect of plant image bounding box detection task, the dataset is composed by 70 images divided in three subsets: Ara2012; Ara2013-Canon and Ara2013-RPi. In addition, all images have size of \(3108 \times 2324\) and \(2592 \times 1944\) pixels.

The representation of our approach is showed in Fig. 2. First, the max-tree of the green channel of the input image is constructed. After, the ultimate leveling based on Mumford-Shah energy functional is applied. As some images present incomplete regions, then a post-processing to merge those regions is computed, and, finally, we can obtain the bounding box coordinates.

Fig. 2.
figure 2

Representation of our approach. The image \(f_g\) is the green channel of the input image \(f_{rgb}\) and \( f_{ule} \) is the output image generated by the energy-based ultimate leveling. (Color figure online)

In order to evaluate our approach, we chose the measurements presented by Minervine et al. [16]. Those measurements are: SBD that provides a measure about accuracy; DIC that is the difference between the plant detection using the bounding box annotated and the bounding box predicted and |DIC| is the absolute value of DIC. The results obtained using our approach are shown in Table 1 and they reveal its robustness. In addition, to our knowledge, these results are the unique ones in bounding box plant detection task obtained from Minervine et al. [16] plant dataset presented in the literature. The value choice for the parameter \( \varepsilon \) for the filtering attribute was determined by cross-validation tuning parameter. The best result showed in Table 1 was obtained with \( \varepsilon =190\). In addition, Fig. 3 shows the result of the cross-validation tuning parameter for different values of \( \varepsilon \). It reveals the importance of the energy filter to increase the accuracy in our approach.

Fig. 3.
figure 3

Accuracy values obtained with the cross-validation tuning. It shows the importance of the filter to increase the accuracy.

Table 1. Results obtained with \( \varepsilon =190 \). Reported values are mean and standard deviation (in parenthesis).

5 Conclusion

This paper presented a filter based on the Mumford-Shah energy functional applied to ultimate levelings operators. As it is known, during the residual extraction process, it is very common that undesirable regions of the input image contain residual information that should be filtered out. In this sense, we design a filter based on the Mumford-Shah energy functional to filter out residues extracted from undesirable regions. The results obtained applying the filter in plant detection reveal the robustness of our approach. In future works, we intend to explore other energy functions as Snakes or active contour.