
Inference Methods for CRFs with Co-occurrence Statistics

Published in the International Journal of Computer Vision.

Abstract

The Markov and conditional random fields (CRFs) used in computer vision typically model only local interactions between variables, as this is generally thought to be the only case that is computationally tractable. In this paper we consider a class of global potentials defined over all variables in the CRF, and show how they can be readily optimised using standard graph-cut algorithms at little extra expense compared to a standard pairwise field. This result applies directly to class-based image segmentation, a problem of increasing interest within computer vision, where the aim is to assign to each pixel of a given image a label from a set of possible object classes. Typically these methods use random fields to model local interactions between pixels or superpixels. One cue that helps recognition is global object co-occurrence statistics: a measure of which classes (such as chair or motorbike) are likely to occur together in the same image. Several approaches have been proposed to exploit this property, but all suffer from different limitations and typically carry a high computational cost, preventing their application to large images. We find that the model we propose produces a significant improvement in the labelling compared to a pairwise model alone, and that this improvement grows as the number of labels increases.
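The global potentials in question depend only on which labels appear in the image, not on how many pixels take each label. The following sketch illustrates such an energy on a toy one-dimensional "image" (the cost function, weights, and variable names are illustrative assumptions, not the paper's actual model, and the exhaustive search stands in for the graph-cut inference the paper develops):

```python
from itertools import product

# Toy co-occurrence potential: a global cost that depends only on the SET of
# labels present in the labelling, not on which pixel takes which label.
def cooccurrence_cost(labels_used, pair_penalty):
    # pair_penalty[(a, b)] is the cost of classes a and b appearing together.
    used = sorted(labels_used)
    return sum(pair_penalty.get((a, b), 0.0)
               for i, a in enumerate(used) for b in used[i + 1:])

def energy(x, unary, pairwise_weight, pair_penalty):
    # x: a labelling of a tiny 1-D "image", one label per pixel.
    e = sum(unary[i][x[i]] for i in range(len(x)))             # unary terms
    e += sum(pairwise_weight                                   # Potts pairwise
             for i in range(len(x) - 1) if x[i] != x[i + 1])
    e += cooccurrence_cost(set(x), pair_penalty)               # global term
    return e

# Exhaustive minimisation, for illustration only; the paper's point is that
# this global term can be handled by standard graph-cut moves instead.
unary = [{0: 0.0, 1: 1.0}, {0: 1.0, 1: 0.0}, {0: 1.0, 1: 0.2}]
penalty = {(0, 1): 2.0}  # discourage classes 0 and 1 co-occurring
best = min(product([0, 1], repeat=3),
           key=lambda x: energy(list(x), unary, 0.5, penalty))
```

Without the co-occurrence term the minimiser would mix both classes; with it, the globally cheapest labelling keeps a single class present, which is the effect the statistics are meant to capture.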



References

  • Benson, H. Y., & Shanno, D. F. (2007). An exact primal–dual penalty method approach to warmstarting interior-point methods for linear programming. Computational Optimization and Applications, 38(3), 371–399.

  • Borenstein, E.,& Malik, J. (2006). Shape guided object segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, (pp. 969–976) New York.

  • Boykov, Y., Veksler, O.,& Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.

  • Choi, M. J., Lim, J. J., Torralba, A.,& Willsky, A. S. (2010). Exploiting hierarchical context on a large database of object categories. In IEEE Conference on Computer Vision and Pattern Recognition, San Francisco.

  • Comaniciu, D.,& Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 603–619.

  • Csurka, G., & Perronnin, F. (2008). A simple high performance approach to semantic segmentation. In British Machine Vision Conference, Leeds.

  • Delong, A., Osokin, A., Isack, H.,& Boykov, Y. (2010). Fast approximate energy minimization with label costs. In IEEE Conference on Computer Vision and Pattern Recognition, San Francisco.

  • Felzenszwalb, P. F.,& Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.

  • Galleguillos, C., Rabinovich, A.,& Belongie, S. (2008). Object categorization using co-occurrence, location and appearance. In IEEE Conference on Computer Vision and Pattern Recognition, Anchorage.

  • Gould, S., Fulton, R.,& Koller, D. (2009). Decomposing a scene into geometric and semantically consistent regions. In International Conference on Computer Vision, Kyoto.

  • Heitz, G.,& Koller, D. (2008). Learning spatial context: Using stuff to find things. In European Conference on Computer Vision, Marseille.

  • Hoiem, D., Rother, C., & Winn, J. M. (2007). 3D LayoutCRF for multi-view object class recognition and segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, San Diego.

  • Kleinberg, J., & Tardos, E. (2002). Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. Journal of the ACM, 49(5), 616–639.

  • Kohli, P., Ladicky, L.,& Torr, P. H. S. (2008). Robust higher order potentials for enforcing label consistency. In IEEE Conference on Computer Vision and Pattern Recognition, Anchorage.

  • Kolmogorov, V. (2006). Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1568–1583.

  • Kolmogorov, V.,& Rother, C. (2006). Comparison of energy minimization algorithms for highly connected graphs. In Proceedings of European Conference on Computer Vision (pp. 1–15). Heidelberg: Springer.

  • Komodakis, N., Tziritas, G., & Paragios, N. (2007). Fast, approximately optimal solutions for single and dynamic MRFs. In IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN.

  • Kumar, M., & Torr, P. H. S. (2008). Efficiently solving convex relaxations for MAP estimation. In International Conference on Machine Learning. New York: ACM.

  • Kumar, M. P., Veksler, O.,& Torr, P. H. S. (2011). Improved moves for truncated convex models. Journal of Machine Learning Research, 12, 31–67.

  • Ladicky, L., Russell, C., Kohli, P.,& Torr, P. H. S. (2009). Associative hierarchical crfs for object class image segmentation. In International Conference on Computer Vision.

  • Ladicky, L., Russell, C., Sturgess, P., Alahari, K., & Torr, P. H. S. (2010). What, where and how many? Combining object detectors and CRFs. In European Conference on Computer Vision.

  • Lafferty, J., McCallum, A.,& Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labelling sequence data. In International Conference on Machine Learning.

  • Larlus, D.,& Jurie, F. (2008). Combining appearance models and markov random fields for category level object segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.

  • Narasimhan, M.,& Bilmes, J. A. (2005). A submodular-supermodular procedure with applications to discriminative structure learning. In Uncertainty in Artificial Intelligence (pp. 404–412).

  • Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E.,& Belongie, S. (2007). Objects in context. In International Conference on Computer Vision, Rio de Janeiro.

  • Ren, X., Fowlkes, C.,& Malik, J. (2005). Mid-level cues improve boundary detection. Tech. Rep. UCB/CSD-05-1382, EECS Department, University of California, Berkeley.

  • Rother, C., Kumar, S., Kolmogorov, V.,& Blake, A. (2005). Digital tapestry. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 589–596).

  • Russell, B., Freeman, W., Efros, A., Sivic, J.,& Zisserman, A. (2006). Using multiple segmentations to discover objects and their extent in image collections. In IEEE Conference on Computer Vision and Pattern Recognition.

  • Russell, C., Ladicky, L., Kohli, P.,& Torr, P. H. S. (2010). Exact and approximate inference in associative hierarchical networks using graph cuts. Uncertainty in Artificial Intelligence, Catalina Island, CA.

  • Schlesinger, M. (1976). Syntactic analysis of two-dimensional visual signals in noisy conditions. Kibernetika, 4, 113–130. (in Russian).

  • Schölkopf, B., & Smola, A. J. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Adaptive Computation and Machine Learning. Cambridge, MA: MIT Press.

  • Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.

  • Shotton, J., Winn, J., Rother, C.,& Criminisi, A. (2006). TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In European Conference on Computer Vision (Vol. 1, pp 1–15).

  • Sturgess, P., Ladicky, L., Crook, N.,& Torr, P. H. S. (2012). Scalable cascade inference for semantic image segmentation. In British Machine Vision Conference.

  • Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., et al. (2006). A comparative study of energy minimization methods for markov random fields. In European Conference on Computer Vision.

  • Torr, P. H. S. (1998). Geometric motion segmentation and model selection [and discussion]. Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 356(1740), 1321–1340.

  • Torralba, A., Murphy, K. P., Freeman, W. T., & Rubin, M. A. (2003). Context-based vision system for place and object recognition. In Proceedings of the Ninth IEEE International Conference on Computer Vision.

  • Toyoda, T.,& Hasegawa, O. (2008). Random field model for integration of local information and global information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(8), 1483–1489.

  • Wainwright, M., Jaakkola, T., & Willsky, A. (2002). MAP estimation via agreement on (hyper)trees: Message-passing and linear programming approaches. Cambridge, MA: MIT Press.

  • Wainwright, M., Jaakkola, T., & Willsky, A. (2005). MAP estimation via agreement on trees: Message-passing and linear programming. IEEE Transactions on Information Theory, 3697–3717.

  • Weiss, Y.,& Freeman, W. (2001). On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs. IEEE Transactions on Information Theory, 47(2), 723–735.

  • Werner, T. (2007). A linear programming approach to max-sum problem: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(7), 1165–1179.

  • Yang, L., Meer, P.,& Foran, D. J. (2007). Multiple class segmentation using a unified framework over mean-shift patches. In IEEE Conference on Computer Vision and Pattern Recognition.

Acknowledgments

This study was supported by EPSRC research grants, HMGCC, and the IST Programme of the European Community under the PASCAL2 Network of Excellence (IST-2007-216886). P. H. S. Torr is in receipt of a Royal Society Wolfson Research Merit Award.

Corresponding author

Correspondence to Ľubor Ladický.

Additional information

Ľubor Ladický and Chris Russell contributed equally and share joint first authorship.

Appendix

Proof of Lemma 1

First we show that:

$$E_\alpha (\mathbf{t}) = \min _{z_\alpha } \Big [ (C_{\alpha \beta } - C_{\beta }) (1-z_\alpha ) + \sum _{i \in \mathcal{V}_{\alpha \beta }} (C_{\alpha \beta } - C_{\beta }) (1-t_i)\, z_\alpha \Big ] = \begin{cases} 0 & \text{if } \forall i \in \mathcal{V}_{\alpha \beta }: t_i = 1,\\ C_{\alpha \beta } - C_\beta & \text{otherwise}. \end{cases}$$
(61)

If \(\forall i \in \mathcal{V}_{\alpha \beta } : t_i = 1\) then \(\sum _{i \in \mathcal{V}_{\alpha \beta }} (C_{\alpha \beta } - C_{\beta }) (1-t_i) z_\alpha = 0\) and the minimum cost \(0\) occurs when \(z_\alpha = 1\). If \(\exists i \in \mathcal{V}_{\alpha \beta }\) such that \(t_i = 0\), the minimum cost labelling occurs when \(z_\alpha = 0\) and the minimum cost is \(C_{\alpha \beta } - C_\beta \). Similarly:

$$E_\beta (\mathbf{t}) = \min _{z_\beta } \Big [ (C_{\alpha \beta } - C_{\alpha })\, z_\beta + \sum _{i \in \mathcal{V}_{\alpha \beta }} (C_{\alpha \beta } - C_{\alpha })\, t_i (1 - z_\beta ) \Big ] = \begin{cases} 0 & \text{if } \forall i \in \mathcal{V}_{\alpha \beta }: t_i = 0,\\ C_{\alpha \beta } - C_\alpha & \text{otherwise}. \end{cases}$$
(62)

By inspection, if \(\forall i \in \mathcal{V}_{\alpha \beta } : t_i = 0\) then \(\sum _{i \in \mathcal{V}_{\alpha \beta }} (C_{\alpha \beta } - C_{\alpha })\, t_i (1 - z_\beta ) = 0\) and the minimum cost \(0\) occurs when \(z_\beta = 0\). If \(\exists i \in \mathcal{V}_{\alpha \beta }\) such that \(t_i = 1\), the minimum cost labelling occurs when \(z_\beta = 1\) and the minimum cost is \(C_{\alpha \beta } - C_\alpha \).

For all three cases (all pixels take label \(\alpha \), all pixels take label \(\beta \), and a mixed labelling) \(E(\mathbf{t}) = E_\alpha (\mathbf{t}) + E_\beta (\mathbf{t}) + C_{\alpha } + C_{\beta } - C_{\alpha \beta }\). The construction of the \(\alpha \beta \)-swap move is similar to the Robust \(P^N\) model (Kohli et al. 2008). \(\square \)

See Figs. 2 and 3 for the graph construction.
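The closed forms (61) and (62) can be checked numerically by brute force. This is a sketch with arbitrary constants standing in for \(C_\alpha\), \(C_\beta\), \(C_{\alpha\beta}\) and a toy pixel set; the function names mirror the lemma's notation:

```python
from itertools import product

# Arbitrary constants with C_ab >= max(C_a, C_b), and |V_ab| = n toy pixels.
C_a, C_b, C_ab = 1.0, 2.0, 4.0
n = 3

def E_alpha(t):
    # Eq. (61): minimise over the auxiliary binary variable z_alpha.
    return min((C_ab - C_b) * (1 - z)
               + sum((C_ab - C_b) * (1 - ti) * z for ti in t)
               for z in (0, 1))

def E_beta(t):
    # Eq. (62): minimise over z_beta.
    return min((C_ab - C_a) * z
               + sum((C_ab - C_a) * ti * (1 - z) for ti in t)
               for z in (0, 1))

# Exhaustively confirm the closed forms stated in the lemma.
for t in product((0, 1), repeat=n):
    assert E_alpha(t) == (0.0 if all(t) else C_ab - C_b)
    assert E_beta(t) == (0.0 if not any(t) else C_ab - C_a)
```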

Proof of Lemma 2

As in the \(\alpha \beta \)-swap proof, we can show:

$$E_\alpha (\mathbf{t}) = \min _{z_\alpha } \Big [ k^{\prime }_\alpha (1-z_\alpha ) + \sum _{i \in \mathcal{V}} k^{\prime }_\alpha (1-t_i)\, z_\alpha \Big ] = \begin{cases} k^{\prime }_\alpha & \text{if } \exists i \in \mathcal{V} \text{ s.t. } t_i = 0,\\ 0 & \text{otherwise}. \end{cases}$$
(63)

If \(\exists i \in \mathcal{V}\) s.t. \(t_i = 0\), then \(\sum _{i \in \mathcal{V}} k^{\prime }_\alpha (1-t_i) \ge k^{\prime }_\alpha \); the minimum is reached when \(z_\alpha = 0\) and the cost is \(k^{\prime }_\alpha \).

If \(\forall i \in \mathcal{V} : t_i = 1\) then \(\sum _{i \in \mathcal{V}} k^{\prime }_\alpha (1-t_i)\, z_\alpha = 0\); the minimum is reached when \(z_\alpha = 1\) and the cost becomes \(0\).

For all other \(l \in A\):

$$E_l(\mathbf{t}) = \min _{z_l} \Big [ k^{\prime \prime }_l\, z_l + \sum _{i \in \mathcal{V}_l} k^{\prime \prime }_l\, t_i (1 - z_l) \Big ] = \begin{cases} k^{\prime \prime }_l & \text{if } \exists i \in \mathcal{V}_l \text{ s.t. } t_i = 1,\\ 0 & \text{otherwise}. \end{cases}$$
(64)

If \(\exists i \in \mathcal{V}_l\) s.t. \(t_i = 1\), then \(\sum _{i \in \mathcal{V}_l} k^{\prime \prime }_l t_i \ge k^{\prime \prime }_l\); the minimum is reached when \(z_l = 1\) and the cost is \(k^{\prime \prime }_l\).

If \(\forall i \in \mathcal{V}_l : t_i = 0\) then \(\sum _{i \in \mathcal{V}_l} k^{\prime \prime }_l t_i (1 - z_l) = 0\); the minimum is reached when \(z_l = 0\) and the cost becomes \(0\).

By summing the cost \(E_\alpha (\mathbf{t})\) and the \(|A|\) costs \(E_l(\mathbf{t})\) we get \(E^{\prime }(\mathbf{t}) = E_\alpha (\mathbf{t}) + \sum _{l \in A} E_l(\mathbf{t})\). If \(\alpha \) is already present in the image, \(k^{\prime }_\alpha = 0\), and the edges with this weight and the variable \(z_\alpha \) can be ignored. \(\square \)

See Figs. 2 and 3 for the graph construction.
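Equations (63) and (64) admit the same kind of brute-force check. Again, the constants are arbitrary stand-ins for \(k^{\prime}_\alpha\) and \(k^{\prime\prime}_l\), and the toy pixel set is illustrative:

```python
from itertools import product

k1, k2 = 3.0, 5.0  # stand-ins for k'_alpha and k''_l; any positive values work
n = 3

def E_alpha(t):
    # Eq. (63): cost is zero only when every t_i = 1, else k1.
    return min(k1 * (1 - z) + sum(k1 * (1 - ti) * z for ti in t)
               for z in (0, 1))

def E_l(t):
    # Eq. (64): cost is zero only when every t_i = 0, else k2.
    return min(k2 * z + sum(k2 * ti * (1 - z) for ti in t)
               for z in (0, 1))

# Exhaustively confirm the closed forms stated in the lemma.
for t in product((0, 1), repeat=n):
    assert E_alpha(t) == (k1 if not all(t) else 0.0)
    assert E_l(t) == (k2 if any(t) else 0.0)
```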

Cite this article

Ladický, Ľ., Russell, C., Kohli, P. et al. Inference Methods for CRFs with Co-occurrence Statistics. Int J Comput Vis 103, 213–225 (2013). https://doi.org/10.1007/s11263-012-0583-y
