Minimizing Energies with Hierarchical Costs

Abstract

Computer vision is full of problems elegantly expressed in terms of energy minimization. We characterize a class of energies with hierarchical costs and propose a novel hierarchical fusion algorithm. Hierarchical costs are natural for modeling an array of difficult problems. For example, in semantic segmentation one could rule out unlikely object combinations via hierarchical context. In geometric model estimation, one could penalize the number of unique model families in a solution, not just the number of models—a kind of hierarchical MDL criterion. Hierarchical fusion uses the well-known α-expansion algorithm as a subroutine, and offers a much better approximation bound in important cases.

Notes

  1. Note that α-expansion itself does not require D_p(⋅) ≥ 0; this assumption is only needed for the analysis of worst-case bounds.

  2. A tree is irreducible if all its internal nodes have at least two children, i.e. there are no ‘redundant’ parent nodes, and so for each internal node i there exist labels γ, ζ such that lca(γ,ζ)=i.

  3. Due to our assumption that V is a semi-metric, and so V(ℓ,ℓ)=0 for every label ℓ, we can simply sum over all \(pq \in \mathcal{A}_{j}\) instead of only those edges where \(f^{*}_{p} \neq f^{*}_{q}\).

References

  • Aggarwal, C. C., Orlin, J. B., & Tai, R. P. (1997). Optimized crossover for the independent set problem. Operations Research, 45(2), 226–234.

  • Ahuja, R. K., Ergun, Ö., Orlin, J. B., & Punnen, A. P. (2002). A survey of very large-scale neighborhood search techniques. Discrete Applied Mathematics, 123(1–3), 75–202.

  • Barinova, O., Lempitsky, V., & Kohli, P. (2010). On the detection of multiple object instances using Hough transforms. In IEEE conference on computer vision and pattern recognition (CVPR), June 2010.

  • Bartal, Y. (1998). On approximating arbitrary metrics by tree metrics. In ACM symposium on theory of computing (STOC).

  • Birchfield, S., & Tomasi, C. (1999). Multiway cut for stereo and motion with slanted surfaces. In International conference on computer vision (ICCV).

  • Boros, E., & Hammer, P. L. (2002). Pseudo-boolean optimization. Discrete Applied Mathematics, 123(1–3), 155–225.

  • Boykov, Y., & Jolly, M.-P. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In International conference on computer vision (ICCV).

  • Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9), 1124–1137.

  • Boykov, Y., & Veksler, O. (2006). Graph cuts in vision and graphics: theories and applications. In N. Paragios, Y. Chen, & O. Faugeras (Eds.), Handbook of mathematical models in computer vision (pp. 79–96). New York: Springer.

  • Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.

  • Choi, M. J., Lim, J. J., Torralba, A., & Willsky, A. S. (2010). Exploiting hierarchical context on a large database of object categories. In IEEE conference on computer vision and pattern recognition (CVPR), June 2010.

  • Cunningham, W., & Tang, L. (1999). Optimal 3-terminal cuts and linear programming. In LNCS, Vol. 1610: Integer programming and combinatorial optimization (pp. 114–125).

  • Delong, A. (2011). Advances in graph-cut optimization. PhD thesis, University of Western Ontario.

  • Delong, A., Gorelick, L., Schmidt, F. R., Veksler, O., & Boykov, Y. (2011). Interactive segmentation with super-labels. In Energy minimization methods in computer vision and pattern recognition (EMMCVPR), July 2011.

  • Delong, A., Osokin, A., Isack, H. N., & Boykov, Y. (2012). Fast approximate energy minimization with label costs. International Journal of Computer Vision, 96(1), 1–27 (Earlier version in CVPR 2010).

  • Feige, U. (1998). A threshold of ln n for approximating set cover. Journal of the ACM, 45(4), 634–652.

  • Felzenszwalb, P. F., Pap, G., Tardos, É., & Zabih, R. (2010). Globally optimal pixel labeling algorithms for tree metrics. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.

  • Givoni, I. E., Chung, C., & Frey, B. J. (2011). Hierarchical affinity propagation. In Uncertainty in artificial intelligence (UAI), July 2011.

  • Goldberg, A. V., & Tarjan, R. E. (1988). A new approach to the maximum-flow problem. Journal of the Association for Computing Machinery, 35(4), 921–940.

  • Gorelick, L., Delong, A., Veksler, O., & Boykov, Y. (2011). Recursive MDL via graph cuts: application to segmentation. In International conference on computer vision (ICCV), November 2011.

  • Greig, D., Porteous, B., & Seheult, A. (1989). Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society B, 51(2), 271–279.

  • Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision. Cambridge: Cambridge University Press.

  • Hochbaum, D. S. (1982). Heuristics for the fixed cost median problem. Mathematical Programming, 22(1), 148–162.

  • Isack, H. N., & Boykov, Y. (2012). Energy-based geometric multi-model fitting. International Journal of Computer Vision, 97(2), 123–147.

  • Kalogerakis, E., Hertzmann, A., & Singh, K. (2010). Learning 3D mesh segmentation and labeling. In ACM SIGGRAPH.

  • Kantor, E., & Peleg, D. (2009). Approximate hierarchical facility location and applications to the bounded depth Steiner tree and range assignment problems. Journal of Discrete Algorithms, 7(3), 341–362.

  • Kleinberg, J., & Tardos, É. (2002). Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields. Journal of the ACM, 49(5).

  • Kolmogorov, V. (2006). Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1568–1583.

  • Kolmogorov, V., & Rother, C. (2007). Minimizing non-submodular functions with graph cuts—a review. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 29(7), 1274–1279.

  • Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2), 147–159.

  • Kumar, M. P., & Koller, D. (2009). MAP estimation of semi-metric MRFs via hierarchical graph cuts. In Conference on uncertainty in artificial intelligence (pp. 313–320), June 2009.

  • Ladický, L., Russell, C., Kohli, P., & Torr, P. H. S. (2010a). Graph cut based inference with co-occurrence statistics. In European conference on computer vision (ECCV), September 2010.

  • Ladický, L., Sturgess, P., Russell, C., Sengupta, S., Bastanlar, Y., Clocksin, W., & Torr, P. H. S. (2010b) Joint optimisation for object class segmentation and dense stereo reconstruction. In British machine vision conference (BMVC).

  • Lazic, N., Givoni, I., Frey, B. J., & Aarabi, P. (2009). FLoSS: facility location for subspace segmentation. In International conference on computer vision (ICCV).

  • Lempitsky, V., Rother, C., Roth, S., & Blake, A. (2010). Fusion moves for Markov random field optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1392–1405.

  • Li, S. Z. (1994). Markov random field modeling in image analysis. Berlin: Springer.

  • Li, H. (2007). Two-view motion segmentation from linear programming relaxation. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Meyers, C., & Orlin, J. B. (2007). Very large-scale neighborhood search techniques in timetabling problems. In Practice and theory of automated timetabling (Vol. VI, p. 24).

  • Olsson, C., Byröd, M., Overgaard, N. C., & Kahl, F. (2009). Extending continuous cuts: anisotropic metrics and expansion moves. In International conference on computer vision, October 2009.

  • Pock, T., Schoenemann, T., Graber, G., Bischof, H., & Cremers, D. (2008). A convex formulation of continuous multi-label problems. In European conference on computer vision (ECCV), October 2008.

  • Pock, T., Chambolle, A., Bischof, H., & Cremers, D. (2009). A convex relaxation approach for computing minimal partitions. In IEEE conference on computer vision and pattern recognition (CVPR), June 2009.

  • Potts, R. B. (1952). Some generalized order-disorder transformations. Mathematical Proceedings of the Cambridge Philosophical Society, 48, 106–109.

  • Rother, C., Kumar, S., Kolmogorov, V., & Blake, A. (2005). Digital tapestry. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Rother, C., Kolmogorov, V., Lempitsky, V., & Szummer, M. (2007). Optimizing binary MRFs via extended roof duality. In IEEE conference on computer vision and pattern recognition (CVPR), June 2007.

  • Sahin, G., & Süral, H. (2007). A review of hierarchical facility location models. Computers and Operations Research, 34(8), 2310–2331.

  • Sefer, E., & Kingsford, C. (2011). Metric labeling and semi-metric embedding for protein annotation prediction. In Research in computational molecular biology.

  • Shmoys, D. B., Tardos, É., & Aardal, K. (1998). Approximation algorithms for facility location problems. In ACM symposium on theory of computing (STOC) (pp. 265–274).

  • Strandmark, P., & Kahl, F. (2010). Parallel and distributed graph cuts by dual decomposition. In IEEE conference on computer vision and pattern recognition (CVPR), June 2010.

  • Svitkina, Z., & Tardos, É. (2006). Facility location with hierarchical facility costs. In ACM-SIAM symposium on discrete algorithms (SODA).

  • Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., & Rother, C. (2006). A comparative study of energy minimization methods for Markov random fields. In European conference on computer vision (ECCV) (pp. 16–29).

  • Torr, P. H. S. (1998). Geometric motion segmentation and model selection. In Philosophical transactions of the royal society A (pp. 1321–1340).

  • Torr, P. H. S., & Murray, D. (1994). Stochastic motion clustering. In European conference on computer vision (ECCV).

  • Veksler, O. (1999). Efficient graph-based energy minimization methods in computer vision. PhD thesis, Cornell University.

  • Werner, T. (2008). High-arity interactions, polyhedral relaxations, and cutting plane algorithm for soft constraint optimisation (MAP-MRF). In IEEE conference on computer vision and pattern recognition (CVPR), June 2008.

  • Woodford, O. J., Rother, C., & Kolmogorov, V. (2009). A global perspective on MAP inference for low-level vision. In International conference on computer vision (ICCV), October 2009.

  • Yuan, J., & Boykov, Y. (2010). TV-based multi-label image segmentation with label cost prior. In British machine vision conference (BMVC), September 2010.

  • Zhou, Q., Wu, T., Liu, W., & Zhu, S.-C. (2011). Scene parsing by data-driven cluster sampling. International Journal of Computer Vision.

  • Zhu, S.-C., & Yuille, A. L. (1996). Region competition: unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(9), 884–900.

Acknowledgements

We wish to thank the anonymous reviewers for careful reading and helpful comments. This work was supported by NSERC Discovery Grant R3584A02, the Canadian Foundation for Innovation (CFI), and the Early Researcher Award (ERA) program.

Author information

Corresponding author

Correspondence to Andrew Delong.

Appendices

Appendix A: Proof of Metric Relationships

Pair (V,π) forms a tree metric if V represents an edge-weighted distance in tree π. This means that V(α,β)=d(α,β), where d(α,β) is the sum of edge weights d_{ij} ≥ 0 along the path from leaf α to leaf β. A tree metric (V,π) is therefore entirely parameterized by its edge weights d_{ij} where j=π(i). An r-hst metric is just a tree metric where edge costs get cheaper by a factor of \(\frac{1}{r}<1\) as we descend the tree, i.e. \(d_{ij} \leq\frac{1}{r} d_{jk}\) for j=π(i), k=π(j). So, r-hst metrics are a subclass of tree metrics by definition.
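To make the tree-metric and r-hst conditions concrete, here is a minimal Python sketch; the parent-map encoding and the example weights are illustrative assumptions, not taken from the paper. It evaluates d(α,β) as the sum of edge weights along the leaf-to-leaf path and checks the decay condition d_{ij} ≤ (1/r)·d_{jk}.

```python
# Illustrative sketch: a rooted tree as a parent map, with d[i] = weight of the
# edge between node i and its parent pi(i). Leaves are 'a', 'b', 'c'; root is 'r'.
parent = {'a': 'u', 'b': 'u', 'c': 'v', 'u': 'r', 'v': 'r'}
d      = {'a': 1.0, 'b': 1.0, 'c': 1.5, 'u': 3.0, 'v': 3.0}   # edge weights >= 0

def path_to_root(x):
    path = [x]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def tree_distance(alpha, beta):
    """d(alpha, beta): sum of edge weights along the unique leaf-to-leaf path."""
    pa, pb = path_to_root(alpha), path_to_root(beta)
    common = set(pa) & set(pb)
    lca = next(x for x in pa if x in common)    # lowest common ancestor
    up_a = pa[:pa.index(lca)]                   # nodes whose parent edge the path crosses
    up_b = pb[:pb.index(lca)]
    return sum(d[x] for x in up_a) + sum(d[x] for x in up_b)

def is_r_hst(r):
    """Check the decay condition d_ij <= (1/r) * d_jk for every edge with a parent edge."""
    return all(d[i] <= d[parent[i]] / r for i in parent if parent[i] in d)

print(tree_distance('a', 'c'))   # V(a, c) under this tree metric
print(is_r_hst(2.0))             # True here: each child edge is at most half its parent edge
```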

[Tree metrics ⊂ h-metrics]: For a tree metric to be an h-metric, d must satisfy (according to Definition 3, p. 7)

(19)

For each \(i \in\mathcal{L}\cup\mathcal{S}\) use shorthand j=π(i) and consider that

(20)
(21)
(22)
(23)

Use inequalities (20) and (21) to replace the left-hand side of (19) and cancel terms with (22) and (23) to get d(α_1,i)+d(i,α_2) ≤ d(α_1,j)+d(j,α_2), which is clearly satisfied since d_{ij} ≥ 0. To see that some (non-h-Potts) h-metrics are not tree metrics, consider the tree and symmetric smoothness cost V(⋅,⋅) below.

[Figure omitted: a tree and a symmetric smoothness cost V(⋅,⋅) defining an h-metric that is not a tree metric.]

[(h-Potts ∩ h-metrics) ⊈ tree metrics]: The example below is a simple h-Potts potential which is also an h-metric but is not a tree metric.

[Figure omitted: an h-Potts potential that is an h-metric but not a tree metric.]

The fact that it is not a tree metric can be verified by setting up a linear program relating edge costs d_{ij} to node costs w_i, and noting that the system is infeasible if d_{ij} ≥ 0.

[(h-Potts with w_i ≤ w_{π(i)}) ⊂ (h-Potts ∩ tree metrics)]: If node costs \(\{w_{i}\}_{i \in\mathcal{S}\cup\{r\}}\) are non-negative and do not increase as we descend the tree (i.e. w_i ≤ w_{π(i)}), then we can construct a tree metric by induction. Given some node \(j \in\mathcal{S}\cup\{r\}\), assume we have non-negative edge costs so that, for each child \(i \in\mathcal{I}(j)\), \(d(\alpha,i) = \frac{1}{2}w_{i}\) for all \(\alpha\in\mathcal{L}_{i}\). Then we can assign cost \(d_{ij} = \frac{1}{2}(w_{j} - w_{i})\) to each child edge of j to get \(d(\alpha,j) = \frac{1}{2}w_{j}\) for all \(\alpha\in\mathcal{L}_{j}\). Since w_i ≤ w_j, we also have a tree metric for subtree j. It is not necessary to assume w_i ≤ w_{π(i)} for an h-Potts potential to be a tree metric, as the example below demonstrates (edge costs are shown on the tree).

[Figure omitted: an h-Potts potential that is a tree metric even though w_i ≤ w_{π(i)} fails; edge costs are shown on the tree.]

[(h-Potts ∩ r-hst metrics) ⊂ (h-Potts with w_i ≤ w_{π(i)})]: As described by Kumar and Koller (2009), an r-hst metric has a constant edge cost d_{ij} between node j and all of its children \(i \in\mathcal{I}(j)\). In other words, an r-hst metric is actually parameterized by one common ‘edge’ cost per parent node \(\{d_{j}\}_{j \in\mathcal{S}\cup\{r\}}\), where \(0 \leq d_{i} \leq\frac{1}{r} d_{\pi(i)}\) for all \(i \in \mathcal{S}\). It is easy to see that, for an h-Potts potential to be an r-hst metric, it must have w_i = w_j − 2d_j where j=π(i). Thus d_j ≥ 0 implies w_i ≤ w_j. Also note that r>1 means the quantity w_j − w_i must decrease at a rate of \(\frac{1}{r}\) as we descend the tree.  □
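The inductive construction above, which assigns \(d_{ij} = \frac{1}{2}(w_{j} - w_{i})\) to each child edge, can be checked numerically. The sketch below uses an assumed parent-map encoding and example node costs (leaves are treated as having cost 0, so a leaf's edge cost is half its parent's node cost); it verifies that the induced path distance reproduces the h-Potts potential V(α,β) = w_{lca(α,β)}.

```python
# Illustrative sketch (encoding assumed, not from the paper): turn h-Potts node
# costs w that do not increase as we descend the tree into tree-metric edge
# weights d_ij = (w_j - w_i)/2, then check V(alpha, beta) = w_lca(alpha, beta).
parent = {'a': 'u', 'b': 'u', 'c': 'v', 'e': 'v', 'u': 'r', 'v': 'r'}
w = {'r': 10.0, 'u': 6.0, 'v': 4.0}           # internal node costs, w_i <= w_pi(i)
w.update({leaf: 0.0 for leaf in 'abce'})      # treat leaves as cost 0

d = {i: 0.5 * (w[parent[i]] - w[i]) for i in parent}   # all >= 0 since w_i <= w_j

def lca(x, y):
    ancestors = []
    while x in parent:
        x = parent[x]
        ancestors.append(x)
    while y not in ancestors:
        y = parent[y]
    return y

def dist(x, y):
    j, total = lca(x, y), 0.0
    for z in (x, y):
        while z != j:
            total += d[z]
            z = parent[z]
    return total

# The induced distance reproduces the h-Potts potential on every pair of leaves tested:
for x, y in [('a', 'b'), ('a', 'c'), ('c', 'e')]:
    assert abs(dist(x, y) - w[lca(x, y)]) < 1e-9
```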

Appendix B: Proof of Theorem 5

Proof

Without loss of generality we assume that all weights w_{pq}=1. Consider any local minimum \(\hat{f}^{j}\) computed by h-fusion at internal node j, and let us choose some child node \(i \in\mathcal{I}(j)\). We first define a useful set of pixels for i with respect to a global optimum \(f^{*}\)

$$ \mathcal{P}_i = \bigl\{ p : f_p^{*} \in\mathcal{L}_i \bigr\}. $$

This set contains all pixels assigned a label within subtree i, and so for any other child i′≠i we know that \(\mathcal{P}_{i} \cap \mathcal{P}_{i'} = \emptyset\).

We can produce a labeling \(\hat{f}^{j \otimes i}\) within one h-fusion move from local minimum \(\hat{f}^{j}\) as follows:

$$ \hat{f}^{j \otimes i}_p = \left\{ \begin{array}{l@{\quad}l} \hat{f}^i_p & \text{if $p \in\mathcal{P}_{i}$}\\ [3pt] \hat{f}^j_p & \text{otherwise}. \end{array} \right. $$

Since each \(\hat{f}^{j}\) is known to be a local optimum w.r.t. expansion moves for each \(i \in\mathcal{I}(j)\) we know that

$$ E\bigl(\hat{f}^j\bigr) \leq E\bigl( \hat{f}^{j \otimes i}\bigr). $$
(24)

The general strategy is to use (24) for different i to build an inequality that is ultimately of the form \(E(\hat{f}^{j}) \leq E(f^{*}) + \mathrm{error}\). This will be achieved by breaking the energy terms in E into parts in such a way that a recursive inequality can be established. The recursive inequality will then be expanded until all terms can be bounded relative to E(f^{*}).

Let \(E(\cdot)|_{\mathcal{A}}\) denote a restriction of the summands of energy (1) to only the following terms:

$$ E(f)|_\mathcal{A} = \sum_{p \in\mathcal{A}} D_p( f_p ) + \sum _{pq \in\mathcal{A}} V( f_p, f_q). $$
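For readers who prefer code, the two ingredients used so far can be written down directly. This is a sketch only; the dictionary-based layout (per-pixel data costs D[p][label], a pairwise table V[a][b], unit edge weights w_pq = 1) is an assumption for illustration, not the paper's implementation.

```python
def restricted_energy(f, pixels, edges, D, V):
    """E(f)|_A: unary terms over the pixels of A plus pairwise terms over the
    edges of A, with unit weights w_pq = 1 as assumed in the proof."""
    unary = sum(D[p][f[p]] for p in pixels)
    pairwise = sum(V[f[p]][f[q]] for (p, q) in edges)
    return unary + pairwise

def fuse(f_i, f_j, P_i):
    """Fused labeling f^{j (x) i}: copy f^i on the pixels P_i assigned by f* to
    subtree i, and keep f^j everywhere else."""
    return {p: (f_i[p] if p in P_i else f_j[p]) for p in f_j}
```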

We separate the unary and pairwise terms of E(f) via interior, exterior, and boundary sets with respect to pixels \(\mathcal{P}_{i}\):

Let E_H(f) denote the total label cost incurred by a labeling f, i.e. the sum of label cost terms. The following facts now hold:

(25)
(26)

We have not accounted for the label costs yet, but for simplicity we break this proof into two parts: Part 1 derives the coefficient c related to smoothness costs V, and Part 2 derives the coefficient c_2 related to label costs H. For Part 1 we can assume there are no label costs at all.

Part 1. Derive Coefficient c for Smoothness Cost Bound

Using (25) and (26) we can cancel out all the \(\overline{\mathcal{A}}_{i}\) terms and rewrite (24) as

$$ E\bigl(\hat{f}^j\bigr)|_{\mathcal{A}_i \cup \partial\mathcal{A}_i} \leq E \bigl(\hat{f}^i\bigr)|_{\mathcal{A}_i} + E\bigl(\hat{f}^{j \otimes i}\bigr)|_{\partial\mathcal{A}_i}. $$
(27)

For each \(i \in\mathcal{I}(j)\) inequality (27) contains a subset of all the energy terms in \(E(\hat{f}^{j})|_{\mathcal{A}_{j}}\) pertaining to pixels \(\mathcal{P}_{i}\). Let \(\mathcal{I}^{*} = \{ i \in\mathcal{I}(j) : \mathcal{P}_{i} \neq \emptyset\}\) be the set of children whose sub-trees contain a label used by \(f^{*}\). If we sum inequality (27) over all \(i \in\mathcal{I}^{*}\), the left-hand side will contain all the terms in \(E(\hat{f}^{j})|_{\mathcal{A}_{j}}\) (and more). Adding up all the left-hand sides we have

(28)

Using (28) and likewise adding up the right-hand sides of (27) we have

(29)

The first important observation about (29) is that each \(E(\hat{f}^{i})|_{\mathcal{A}_{i}}\) term on the right-hand side can be substituted by recursively applying the inequality itself. We can recursively substitute, branching further and further down the tree, until the path finally stops at a leaf \(\ell\in\mathcal{L}\) giving us base case \(E(\hat{f}^{\ell})|_{\mathcal{A}_{\ell}} = \sum_{p \in \mathcal{P}_{\ell}} D_{p}(f_{p}^{*})\). The sets \(\{\mathcal{P}_{\ell}\}_{\ell\in\mathcal{L}}\) must be disjoint and their union is \(\mathcal{P}_{j}\), so expression (29), when fully expanded, becomes roughly

(30)

The second observation about (29) is that each edge pq on an outer boundary \(\partial\mathcal{A}_{i} \cap\partial\mathcal{A}_{j}\) appears once in the sum over \(\mathcal{I}^{*}\) whereas each edge on an interior boundary \(\partial\mathcal{A}_{i} \setminus\partial\mathcal{A}_{j}\) appears twice: once for \(p \in\mathcal{A}_{i}\) and once for some \(q \in\mathcal{A}_{i'}\). By careful accounting we collect all the \(V(\hat{f}^{i}_{p},\hat{f}^{\pi(i)}_{q})\) terms generated by the recursive substitution and express (29) as (see Note 3)

(31)

where we define \(\mathcal{J}(\ell;\ell')\) to be the set of nodes along the path from a label \(\ell\in\mathcal{L}\) up to, but not including, the lowest common ancestor of ℓ and ℓ′, namely

All that remains is to bound each \(V(\hat{f}^{i}_{p},\hat{f}^{\pi(i)}_{q})\) in terms of \(V(f^{*}_{p},f^{*}_{q})\) using b_i described in Definition 7. From now on we use \(a_{i} = V^{\max}_{i}\) and \(d_{i} = V^{\min}_{i}\) as shorthand. For a particular edge pq shown in (31) we must have each \(V(\hat{f}^{i}_{p},\hat{f}^{\pi(i)}_{q}) \leq a_{\pi(i)}\) and so their sum is

(32)

We also know that \(V(f^{*}_{p},f^{*}_{q}) \geq d_{\mathrm{lca}(f^{*}_{p},f^{*}_{q})}\) so we can use the ratio \(\frac{b_{\mathrm{lca}(f^{*}_{p},f^{*}_{q})}}{d_{\mathrm{lca}(f^{*}_{p},f^{*}_{q})}}\) to bound the approximation error at each edge pq appearing in (31), giving the upper bound

(33)

If j is the root of the tree, then \(\{p \in\mathcal{A}_{j} \} = \mathcal{P}\) and \(\{ pq \in\mathcal{A}_{j} \} = \mathcal{N}\). Using the fact that any ratio \(\frac{b_{i}}{d_{i}}\) is bounded from above by quantity c (Definition 7) we arrive at

(34)
(35)
(36)

This completes the proof of Part 1. When there are only smoothness costs, \(E(\hat{f}) \leq2c E(f^{*})\) where \(\hat{f}\) is the labeling generated at the root of the tree.

Part 2. Derive Coefficient c_2 for Label Cost Bound

We now revisit the argument from (27) onward, but with the assumption that there are hierarchical label costs.

Recall that E_H(f) denotes the total label cost incurred by a labeling f, i.e. the sum of label cost terms. We can bound the label cost \(E_{H}(\hat{f}^{j \otimes i})\) of our fused labeling by

$$ E_{H}\bigl(\hat{f}^{j \otimes i}\bigr) \leq E_{H}\bigl(\hat{f}^j\bigr) + \sum_{\substack{L \subseteq \mathcal{L}\setminus\hat{\mathcal{L}}_j\\{L}\cap\hat{\mathcal {L}}_i \neq \emptyset}} H(L) $$
(37)

where \(\hat{\mathcal{L}}_{j}\) and \(\hat{\mathcal{L}}_{i}\) are the sets of unique labels appearing in \(\hat{f}^{j}\) and \(\hat{f}^{i}\) respectively.
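Bound (37) relies only on the fact that every label used by the fused labeling comes from either \(\hat{f}^{j}\) or \(\hat{f}^{i}\), so any newly activated subset L must miss \(\hat{\mathcal{L}}_{j}\) and meet \(\hat{\mathcal{L}}_{i}\). The sketch below makes that accounting explicit; encoding label-cost subsets as frozensets mapped to non-negative costs is an illustrative assumption.

```python
def E_H(used_labels, H):
    """Total label cost of a labeling using the given label set: a subset L is
    paid whenever at least one of its labels appears."""
    return sum(cost for L, cost in H.items() if L & used_labels)

def extra_terms(Lhat_j, Lhat_i, H):
    """Right-hand sum in (37): subsets untouched by f^j but activated by f^i."""
    return sum(cost for L, cost in H.items() if not (L & Lhat_j) and (L & Lhat_i))

# Small example with hypothetical costs: the fused labeling can only use labels
# drawn from Lhat_j or Lhat_i, so the bound in (37) holds.
H = {frozenset({'l1'}): 2.0, frozenset({'l2', 'l3'}): 5.0, frozenset({'l4'}): 1.0}
Lhat_j, Lhat_i = {'l1'}, {'l3'}
Lhat_fused = Lhat_j | Lhat_i
assert E_H(Lhat_fused, H) <= E_H(Lhat_j, H) + extra_terms(Lhat_j, Lhat_i, H)
```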

Recall from Part 1 that, looking at the key inequality (24), we can break the energy terms on each side into parts based on sets \(\mathcal{A}_{i}, \overline{\mathcal{A}}_{i}\), and \(\partial\mathcal{A}_{i}\). Because \(E(\hat{f}^{j \otimes i})|_{\overline{\mathcal{A}}_{i}} = E(\hat{f}^{j})|_{\overline{\mathcal{A}}_{i}}\) these terms cancel out, and we can substitute \(E(\hat{f}^{j \otimes i})|_{\mathcal{A}_{i}} = E(\hat{f}^{i})|_{\mathcal{A}_{i}}\). Along with bound (37) and canceling the \(E_{H}(\hat{f}^{j})\) terms we can now rewrite (24) as

$$ E\bigl(\hat{f}^j\bigr)|_{\mathcal{A}_i \cup \partial\mathcal{A}_i} \leq E\bigl(\hat{f}^i\bigr)|_{\mathcal{A}_i} + E\bigl( \hat{f}^{j \otimes i}\bigr)|_{\partial\mathcal{A}_i} + \sum _{\substack{L \subseteq\mathcal{L}\setminus\hat{\mathcal{L}}_j\\ {L}\cap \hat{\mathcal{L}}_i \neq\emptyset}} H(L). $$
(38)

Again, let \(\mathcal{I}^{*} = \{ i \in\mathcal{I}(j) : \mathcal{P}_{i} \neq\emptyset\}\) be the set of child nodes that contain a label used by \(f^{*}\) in their subtree. We sum inequality (38) over all \(i \in\mathcal{I}^{*}\) to arrive at a recursive expression, this time incorporating errors incurred by label costs. The key observation is that a particular label cost H(L) appears once on the right-hand side for each element in the set \(\mathcal{I}^{*}_{L} = \{ i \in\mathcal{I}^{*} : L \cap\hat{\mathcal{L}}_{i} \neq\emptyset\}\). The sum of inequalities (38) thus implies

(39)

where the quantity in parentheses is identical to that of Part 1.

The above inequality can be recursively expanded for each \(E(\hat{f}^{i})|_{\mathcal{A}_{i}}\) until the recursion stops at a label used by \(f^{*}\). We already know that, after recursive substitution, the quantity in parentheses is bounded by (33). We now must bound the total label cost accumulated by recursive application of (39). The central question is whether a particular subset L that appears in (39) with \(|\mathcal{I}^{*}_{L}|>0\) for node j can appear again when we recursively substitute the children \(i \in\mathcal{I}^{*}\). If the answer were ‘yes’ then each label cost H(L) could appear more than \(|\mathcal{I}^{*}_{L}|\) total times by the end of recursive expansion, leading to a worse bound. Fortunately, Lemma 1 (after this proof) says that this is not the case; each L appearing in the sum (38) for j and child i can never reappear in the sums for i or its children.

From now on we assume j is the root of the tree structure, and so \(\hat{f}^{j} = \hat{f}\), i.e. the final labeling output by h-fusion. If we let \(\mathcal{H}^{*}\) denote the set of all subsets L generated by recursive substitution of (39), we can thereby write

(40)

Note that the left-hand side of (40) is still \(E(\hat{f}^{j})|_{\mathcal{A}_{j}}\) which does not include the label costs incurred by \(\hat{f}^{j}\). By adding \(E_{H}(\hat{f}^{j})\) to both sides we have \(E(\hat{f} ^{j})|_{\mathcal{A}_{j}} + E_{H}(\hat{f}^{j}) = E(\hat{f})\) on the left side, giving a new inequality below.

(41)

All that is left is to re-group the summands in the last three terms (the label cost terms) in a way that proves our theorem. First we rewrite the three sums more explicitly, using \(\hat{\mathcal{L}}\) and \(L^{*}\) to denote the unique labels used by \(\hat{f}= \hat{f}^{j}\) and \(f^{*}\) respectively.

(42)

First note that if \(|\mathcal{I}^{*}_{L}|>1\) then this means \(L \supset \mathcal{L}_{i}\) for some \(\mathcal{L}_{i} \cap L^{*} \neq\emptyset\) and so \(L \cap L^{*} \neq\emptyset\) also. We can break the last sum in (42) into two parts based on whether \(L \cap L^{*} \neq\emptyset\).

(43)

We can also show that \(L \in\mathcal{H}^{*} \Rightarrow L \cap \hat{\mathcal{L}}= \emptyset\) as follows. If \(L \in\mathcal{H}^{*}\) then there must be some node i such that \(L \cap\hat{\mathcal{L}}_{i} = \emptyset\) and \(L \subset\mathcal{L}_{i}\). We know from (52) in Lemma 1 that \(\hat{\mathcal{L}}\cap\mathcal{L}_{i} \subseteq\hat{\mathcal{L}}_{i}\), so this implies \(\emptyset= L \cap\hat{\mathcal{L}}_{i} \supseteq L \cap(\hat{\mathcal{L}}\cap \mathcal{L}_{i}) = L \cap\hat{\mathcal{L}}\). This means the two leftmost sums of (43) have disjoint L and can be bounded by simply \(\sum_{L \in\mathcal{H}} H(L)\). It furthermore implies that, for every L appearing in the rightmost sum of (43), the same L must appear in the negative sum. Putting these together we have the following upper bound on the label costs

(44)

We can therefore revise bound (41) to

(45)

Inequality (45) is the main result of our theorem.  □

Lemma 1

If label subset L appears in the summand of (38) for node j and child i, then L does not appear in the summands of (38) for any k∈subtree(i).

Proof

To be clear, let us restate the claim more formally. Let \(\mathcal{H}^{j \otimes i}\) denote all subsets L appearing in the label cost summands of (38) when applied to node j and child i, i.e.

$$ \mathcal{H}^{j \otimes i} \;\stackrel{\mathrm{def}}{=}\; \bigl\{ L : L \cap \hat{\mathcal{L}}_j = \emptyset,\ L \cap\hat{\mathcal{L}}_i \neq \emptyset \bigr\}. $$
(46)
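Definition (46) translates directly into a set comprehension. The sketch below simply restates it, with label subsets encoded as Python sets (an assumption for illustration).

```python
def H_j_fuse_i(cost_subsets, Lhat_j, Lhat_i):
    """Subsets L carrying a label cost that are untouched by f^j (L ∩ Lhat_j = ∅)
    but activated by a label appearing in f^i (L ∩ Lhat_i ≠ ∅)."""
    return [L for L in cost_subsets if not (L & Lhat_j) and (L & Lhat_i)]
```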

We must prove that \(L \in\mathcal{H}^{j \otimes i} \Rightarrow L \notin \mathcal{H}^{k \otimes l}\) for any k∈subtree(i) and \(l \in \mathcal{I}(k)\).

First note that for each \(L \in\mathcal{H}^{j \otimes i}\) we have

(47)
(48)

By the hierarchical label cost assumption (Definition 4) we can use (47) and (48) to conclude that \(L \in \mathcal{H}^{j \otimes i} \Rightarrow L \subset\mathcal{L}_{j}\).

Now consider the set \(\mathcal{H}^{j \otimes i} \cap\mathcal{H}^{k \otimes l}\). By the definition (46) an element L of this joint set must satisfy at least the following conditions:

(49)
(50)
(51)

However, no subset L can satisfy all three conditions, as we now show. In the h-fusion algorithm, if \(\hat{f}^{i}\) contains a label \(\ell\in\mathcal{L}_{k}\), then \(\hat{f}^{k}\) must contain ℓ as well; after all, there is no other way that a label in \(\mathcal{L}_{k}\) could have propagated up to \(\hat{f}^{i}\). This relation can be restated as

$$ \hat{\mathcal{L}}_i \cap\mathcal{L}_k \subseteq\hat{\mathcal{L}}_k \quad\forall k \in\mathrm{subtree}(i). $$
(52)

Starting from (49) we can say

which contradicts requirement (50). Thus \(\mathcal{H}^{j \otimes i} \cap\mathcal{H}^{k \otimes l} = \emptyset\) for all k∈subtree(i) and so L cannot reappear. □

Cite this article

Delong, A., Gorelick, L., Veksler, O. et al. Minimizing Energies with Hierarchical Costs. Int J Comput Vis 100, 38–58 (2012). https://doi.org/10.1007/s11263-012-0531-x
