Probabilistic Joint Image Segmentation and Labeling by Figure-Ground Composition

Ion, Adrian; Carreira, João; Sminchisescu, Cristian

doi:10.1007/s11263-013-0663-7

Published: 17 November 2013

Volume 107, pages 40–57, (2014)

International Journal of Computer Vision

Adrian Ion¹,
João Carreira² &
Cristian Sminchisescu³,⁴


Abstract

We propose a layered statistical model for image segmentation and labeling obtained by combining independently extracted, possibly overlapping sets of figure-ground (FG) segmentations. The process of constructing consistent image segmentations, called tilings, is cast as optimization over sets of maximal cliques sampled from a graph connecting all non-overlapping figure-ground segment hypotheses. Potential functions over cliques combine unary, Gestalt-based figure qualities, and pairwise compatibilities among spatially neighboring segments, constrained by T-junctions and the boundary interface statistics of real scenes. Building on the segmentation layer, we further derive a joint image segmentation and labeling model (JSL) which, given a bag of FGs, constructs a joint probability distribution over both the compatible image interpretations (tilings) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, followed by sampling labelings conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on maximum likelihood with a novel estimation procedure we refer to as incremental saddle-point approximation. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that are rated as probable by candidate models during learning. State of the art results are reported on the Berkeley, Stanford and Pascal VOC datasets, where an improvement of 28 % was achieved for the segmentation task only (tiling), and an accuracy of 47.8 % was obtained on the test set of VOC12 for semantic labeling (JSL).
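The correspondence between tilings and maximal cliques can be illustrated on a toy example. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: segments are represented as sets of pixel indices, two segments are connected when they share no pixel, and maximal cliques are enumerated exhaustively with a basic Bron-Kerbosch recursion. The abstract describes sampling maximal cliques rather than enumerating them, so the exhaustive search here is only practical for very small pools. The names `compatibility_graph` and `maximal_cliques` and the five toy segments are hypothetical.

```python
from itertools import combinations


def compatibility_graph(segments):
    """Connect every pair of segments that share no pixel."""
    adj = {i: set() for i in range(len(segments))}
    for i, j in combinations(range(len(segments)), 2):
        if segments[i].isdisjoint(segments[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        # No vertex left that could extend the clique: it is maximal.
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Toy 1x4 "image" with pixels 0..3 and five overlapping segment hypotheses.
segments = [
    frozenset({0, 1}),  # segment 0
    frozenset({2, 3}),  # segment 1
    frozenset({1, 2}),  # segment 2
    frozenset({0}),     # segment 3
    frozenset({3}),     # segment 4
]

# Each maximal clique is a set of mutually non-overlapping segments.
tilings = maximal_cliques(compatibility_graph(segments))
for tiling in sorted(tilings, key=sorted):
    print(sorted(tiling))
```

In this toy pool the segments {3, 2, 4} form one tiling because no two of them share a pixel and no further segment can be added without overlap. A scoring step over such cliques, which the abstract describes in terms of unary and pairwise potentials, is not modeled here.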

Notes

The segments are defined by the pixels they contain and have fixed positions in the image; they cannot be moved like puzzle pieces. Moreover, while disallowing overlap increases the exposure to imperfect boundary alignments between segments selected in any single tiling, it leads to a dramatic reduction in the solution space and does not raise additional issues with assigning pixels lying on segment intersections.
We call a segmentation assembled from non-overlapping figure-ground segments a tiling, and the tiling together with the set of corresponding labels for its segments a labeling (rather than a labeled tiling). Assigning a label to a segment also assigns this label to all the pixels of the segment.
This approximation is similar in spirit to the one in Sect. 2.2. By enforcing at least one tiling to be retained for each segment, we aim for a uniform spread of the sampled tilings, which at the same time correspond to modes of the probability distribution.
For our implementations, the actual running times were 180.3 h for the PMA and 1.3 h for the incremental saddle-point. The computer used was an Intel Xeon workstation.
An alternative strategy to approximate the partition function by using samples from the target distribution is contrastive divergence (Hinton 2002). Samples are obtained by running an MCMC chain for a limited number of steps. The obtained estimate is biased, but has been observed to perform well in practice.
As previously done by the method we compare to, when evaluating FG-Tiling, only the annotated regions are considered.
The tiling parameters have been learned for BSDS on the training set, for Stanford over \(5\) folds, and for VOC2009 on the training set, respectively, using the methodology described in Sect. 2.2.
We also selected the scale parameter that optimized the First score on each dataset.
The 1 min slot given to Enum (1min) is about \(7.5\times\) the average run-time of FG-Tiling on the BSDS test set. Without the time constraint, Enum did not finish enumerating cliques after 48 h on an image where a pool of \(|\mathcal{S}|=120\) figure-ground segmentations was used.
Recall that the VOC score is defined as the average per-class overlap between pixels labeled in each class and the respective ground truth annotation.
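The VOC score described in the last note can be sketched in a few lines. This is a minimal illustration, assuming that "overlap" means intersection over union between predicted and ground-truth pixels of a class, as in the PASCAL VOC segmentation benchmark. The function name `voc_score`, the flat label lists, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions made for the example; the official evaluation also handles unlabeled "void" pixels, which is omitted here.

```python
def voc_score(pred, gt, num_classes):
    """Mean per-class intersection over union for flat lists of pixel labels."""
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same length")
    overlaps = []
    for c in range(num_classes):
        # Pixels labeled c in both prediction and ground truth.
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        # Pixels labeled c in either prediction or ground truth.
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # assumption: classes absent from both are skipped
            overlaps.append(inter / union)
    return sum(overlaps) / len(overlaps) if overlaps else 0.0


# Four pixels, two classes: class 0 overlap is 1/2, class 1 overlap is 2/3.
pred = [0, 0, 1, 1]
gt = [0, 1, 1, 1]
score = voc_score(pred, gt, num_classes=2)
print(round(score, 4))
```

Because a labeling assigns one label to every pixel of a segment, the predicted label list for a tiling can be obtained by writing each segment's label onto its pixels before calling such a function.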

References

Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2009). From contours to regions: An empirical evaluation. In: IEEE International Conference on Computer Vision and Pattern Recognition.
Arbelaez, P., Hariharan, B., Gu, C., Gupta, S., Bourdev, L. D., & Malik, J. (2012). Semantic segmentation using regions and parts. In: IEEE International Conference on Computer Vision and Pattern Recognition.
Bagon, S., Boiman, O., & Irani, M. (2008). What is a good image segment? a unified approach to segment extraction. In: European Conference on Computer Vision.
Barbu, A., & Zhu, S. C. (2005). Generalizing swendsen-wang to sampling arbitrary posterior probabilities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1239–1253.
Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D. M., & Jordan, M. (2003). Matching words and pictures. Journal of Machine Learning Research, 3, 1107–1135.
Bomze, I., Budinich, M., Pardalos, P., & Pelillo, M. (1999). Handbook of combinatorial optimization (pp. 1–74). Dordrecht: Kluwer Academic.
Bomze, I., Pelillo, M., & Stix, V. (2000). Approximating the maximum weight clique using replicator dynamics. IEEE Transactions on Neural Networks, 11(6), 1228–1241.
Brendel, W., & Todorovic, S. (2010). Segmentation as maximum-weight independent set. In: Advances in Neural Information Processing Systems.
Carreira, J., & Sminchisescu, C. (2012). Cpmc: Automatic object segmentation using constrained parametric min-cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7), 1312–1328.
Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012a). Semantic segmentation with second-order pooling. In: European Conference on Computer Vision.
Carreira, J., Li, F., & Sminchisescu, C. (2012b). Object recognition by sequential figure-ground ranking. International Journal of Computer Vision, 98(3), 243–262.
Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 603–619.
Cour, T., Gogin, N., & Shi, J. (2005). Learning spectral graph segmentation. In: Artificial Intelligence and Statistics.
Csurka, G., & Perronnin, F. (2010). An efficient approach to semantic segmentation. International Journal of Computer Vision, 88, 1–15.
Dann, C., Gehler, P. V., Roth, S., & Nowozin, S. (2012). Pottics—the potts topic model for semantic image segmentation. In: Proceedings of DAGM/OAGM Symposium.
Endres, I., & Hoiem, D. (2010). Category independent object proposals. In: European Conference on Computer Vision.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2), 303–338.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2012). The PASCAL visual object classes challenge (VOC) results. http://www.pascal-network.org/challenges/VOC/.
Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 1915–1929.
Felzenszwalb, P., & Huttenlocher, D. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Fulkerson, B., Vedaldi, A., & Soatto, S. (2009). Class segmentation and object localization with superpixel neighborhoods. In: IEEE International Conference on Computer Vision.
Ghose, T., & Palmer, S. (2005). Surface convexity and extremal edges in depth and figure-ground perception. Journal of Vision, 5(8), 970–970.
Gonfaus, J. M., Boix, X., van de Weijer, J., Bagdanov, A. D., Serrat, J., & Gonzalez, J. (2010). Harmony potentials for joint classification and segmentation. In: IEEE International Conference on Computer Vision and Pattern Recognition.
Gould, S., Rodgers, J., Cohen, D., Elidan, G., & Koller, D. (2008). Multi-class segmentation with relative location prior. International Journal of Computer Vision, 80(3), 300–316.
Gould, S., Fulton, R., & Koller, D. (2009a). Decomposing a scene into geometric and semantically consistent regions. In: IEEE International Conference on Computer Vision.
Gould, S., Gao, T., & Koller, D. (2009b). Region-based segmentation and object detection. In: Advances in Neural Information Processing Systems.
He, X., Zemel, R. S., & Carreira-Perpinan, M. (2004). Multiscale conditional random fields for image labeling. In: IEEE International Conference on Computer Vision and Pattern Recognition.
Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771–1800.
Hoiem, D., Efros, A., & Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, 75(1), 151–172.
Huggins, P., Chen, H., Belhumeur, P., & Zucker, S. (2001). Finding folds: On the appearance and identification of occlusion. In: IEEE International Conference on Computer Vision and Pattern Recognition.
Ion, A., Carreira, J., & Sminchisescu, C. (2011a). Image segmentation by figure-ground composition into maximal cliques. In: IEEE International Conference on Computer Vision.
Ion, A., Carreira, J., & Sminchisescu, C. (2011b). Probabilistic joint image segmentation and labeling. In: Advances in Neural Information Processing Systems.
Kohli, P., Ladicky, L., & Torr, P. (2008). Robust higher order potentials for enforcing label consistency. In: IEEE International Conference on Computer Vision and Pattern Recognition.
Kolmogorov, V. (2006). Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1568–1583.
Kumar, M. P., & Koller, D. (2010). Efficiently selecting regions for scene understanding. In: IEEE International Conference on Computer Vision and Pattern Recognition.
Kumar, S., August, J., & Hebert, M. (2005). Exploiting inference for approximate parameter learning in discriminative fields: An empirical study. In: Energy Minimization Methods in Computer Vision and Pattern Recognition.
Ladicky, L., Russell, C., Kohli, P., & Torr, P. H. S. (2009). Associative hierarchical crfs for object class image segmentation. In: IEEE International Conference on Computer Vision.
Ladicky, L., Sturgess, P., Alahari, K., Russell, C., & Torr, P. (2010). What, where & how many? combining object detectors and crfs. In: European Conference on Computer Vision.
Leichter, I., & Lindenbaum, M. (2009). Boundary ownership by lifting to 2.1d. In: IEEE International Conference on Computer Vision.
Li, F., Ionescu, C., & Sminchisescu, C. (2010). Random Fourier approximations for skewed multiplicative histogram kernel. In: Proceedings of DAGM Symposium.
Lim, J., Arbelaez, P., Gu, C., & Malik, J. (2009). Context by region ancestry. In: IEEE International Conference on Computer Vision.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Malisiewicz, T., & Efros, A. (2007). Improving spatial support for objects via multiple segmentations. In: British Machine Vision Conference.
Martin, D., Fowlkes, C., Tal, D., & Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: IEEE International Conference on Computer Vision.
Nowozin, S., Gehler, P., & Lampert, C. (2010). On parameter learning in crf-based approaches to object class image segmentation. In: European Conference on Computer Vision.
Pantofaru, C., Schmid, C., & Hebert, M. (2008). Object recognition by integrating multiple image segmentations. In: European Conference on Computer Vision.
Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. In: Advances in Neural Information Processing Systems.
Ren, X., & Malik, J. (2003). Learning a classification model for segmentation. In: IEEE International Conference on Computer Vision.
Ren, X., Fowlkes, C., & Malik, J. (2006). Figure/ground assignment in natural images. In: European Conference on Computer Vision.
van de Sande, K. E. A., Gevers, T., & Snoek, C. G. M. (2010). Evaluating color descriptors for object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1582–1596.
Sarawagi, S., & Cohen, W. W. (2004). Semi-markov conditional random fields for information extraction. In: Advances in Neural Information Processing Systems.
Sharon, E., Galun, M., Sharon, D., Basri, R., & Brandt, A. (2006). Hierarchy and adaptivity in segmenting visual scenes. Nature, 442(7104), 719–846.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2009). Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. International Journal of Computer Vision, 81, 2–23.
Tu, Z., Chen, X., Yuille, A., & Zhu, S. C. (2003). Image parsing: unifying segmentation, detection, and recognition. In: IEEE International Conference on Computer Vision.
Xia, W., Song, Z., Feng, J., Cheong, L. F., & Yan, S. (2012). Segmentation over detection by coupled global and local sparse representations. In: European Conference on Computer Vision.
Yang, Y., Hallman, S., Ramanan, D., & Fowlkes, C. C. (2012). Layered object models for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1731–1743.

Acknowledgments

This work was supported, in part, by CNCS-UEFICSDI, under PCE-2011-3-0438, and CT-ERC-2012-1, and by FCT under PTDC/EEA-CRO/122812/2010. The authors thank the anonymous reviewers for their useful comments and suggestions.

Author information

Authors and Affiliations

Faculty of Informatics, Vienna University of Technology, Vienna, Austria
Adrian Ion
Institute of Systems and Robotics, University of Coimbra, Coimbra, Portugal
João Carreira
Department of Mathematics, Faculty of Engineering, Lund University, Lund, Sweden
Cristian Sminchisescu
Institute of Mathematics of the Romanian Academy, Bucharest, Romania
Cristian Sminchisescu

Corresponding author

Correspondence to Cristian Sminchisescu.

About this article

Cite this article

Ion, A., Carreira, J. & Sminchisescu, C. Probabilistic Joint Image Segmentation and Labeling by Figure-Ground Composition. Int J Comput Vis 107, 40–57 (2014). https://doi.org/10.1007/s11263-013-0663-7

Received: 14 March 2013
Accepted: 30 September 2013
Published: 17 November 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s11263-013-0663-7
