Elsevier

Pattern Recognition Letters

Volume 33, Issue 16, 1 December 2012, Pages 2245-2253

Combining boundary and region features inside the combinatorial pyramid for topology-preserving perceptual image segmentation

https://doi.org/10.1016/j.patrec.2012.07.009

Abstract

Combinatorial pyramids represent the image as a stack of successively reduced combinatorial maps, which encode the whole image at different levels of abstraction. Within this framework, this paper proposes to conduct the perceptual organization of the image content in two consecutive stages. The first stage builds the lower levels of the hierarchy according to simple face (region) features: colour and size. On top of this hierarchy, the second stage mainly employs boundary features, encoded in the darts of the combinatorial maps, to obtain a second set of levels of abstraction. The Berkeley data set BSDS300 is used to quantitatively compare the performance of the proposal with a number of perceptual grouping approaches, showing that it yields results better than or similar to those of most of these algorithms while offering two interesting features: computation at multiple image resolutions and preservation of the image topology.

Highlights

  • Image content is perceptually organized in a bottom-up fashion.

  • Image topology is preserved at all levels of the hierarchy.

  • Edge and region descriptors are efficiently combined.

  • The approach has been successfully evaluated using the challenging BSDS300.

Introduction

When the goal of an image processing algorithm is to divide the input image in a manner similar to human beings, the adopted strategy cannot simply be the grouping of image pixels into clusters (regions or boundaries) according to low-level photometric properties (Martin et al., 2004, Arbeláez et al., 2011). Natural images are generally composed of physically disjoint objects whose associated groups of image pixels may not be visually uniform. Hence, it is very difficult to formulate what should be recovered as a region or boundary from an image, or to separate complex objects from a natural scene (Lau and Levine, 2002). With the aim of organizing low-level image features into higher-level relational structures, the perceptual organization of the image content is usually thought of as a process of grouping visual information into a hierarchy of levels of abstraction. Starting from the lowest level of the hierarchy (i.e. the input image or an initial partition), each new layer groups the regions of the level below into a reduced set of regions. This grouping requires the definition of a region model (the features that describe each image region) and a dissimilarity measure (the metric on those features) (Brox and Farin, 2001). Moreover, for efficiency reasons, it is desirable that the grouping can merge more than two regions simultaneously.

According to the aforementioned properties, many heuristics have been proposed. The simplest model describes regions by luminance and size (Beaulieu, 1989). With this model, the dissimilarity measure is usually the squared difference or the Ward criterion. Regions can also be described by information about their boundaries. Thus, the gPb-owt-ucm approach (Arbeláez et al., 2011) transforms the output of the gPb contour detector into a hierarchical region tree. The approach employs the Oriented Watershed Transform (OWT) to obtain a set of initial regions from the output of the contour detector, and builds an Ultrametric Contour Map (UCM) from the boundaries of these initial regions. The dissimilarity measure between two regions is defined by the average strength of their common boundaries. The initial segmentation can also be obtained through a watershed (Meyer, 2005). Watershed algorithms have the advantage of providing closed contours, which lead to a proper definition of regions (Brun et al., 2005). Hierarchical watershed approaches assume that the over-segmentations usually produced by watershed algorithms include the correct boundaries of the image. Then, if these boundaries are properly valued, the initial partition provided by the over-segmentation of the input image can be decimated to build the hierarchy of levels (Najman and Schmitt, 1996, Brun et al., 2005). Information about the basins (regions) is typically used jointly with the contour attributes to perform this decimation. Once the region model and dissimilarity measure have been defined, the algorithm can proceed by repeatedly searching for the lowest dissimilarity value and merging the two corresponding regions until a stopping criterion is satisfied or only one region remains (Arbeláez et al., 2011). If the hierarchy of partitions is encoded using irregular pyramids, several regions can be merged simultaneously between two consecutive layers (Brun et al., 2005).
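The greedy merging loop described above can be illustrated with a minimal sketch. It assumes the simplest region model (size and mean luminance) and the Ward criterion as dissimilarity; the function names, the dictionary-based region store and the fixed threshold used as stopping criterion are hypothetical choices made for illustration, not the implementation of any of the cited methods.

```python
def ward(a, b):
    """Ward criterion: increase in squared error if regions a and b are merged.

    Each region is a (size, mean_luminance) pair.
    """
    (na, ma), (nb, mb) = a, b
    return na * nb / (na + nb) * (ma - mb) ** 2


def greedy_merge(regions, adjacency, threshold):
    """Merge the most similar adjacent pair until the cost exceeds threshold.

    regions:   {region_id: (size, mean_luminance)}
    adjacency: set of frozenset({i, j}) for adjacent regions i and j
    threshold: hypothetical stopping criterion on the Ward cost
    """
    regions = dict(regions)
    adjacency = set(adjacency)
    while adjacency:
        # Search for the adjacent pair with the lowest dissimilarity.
        pair = min(adjacency, key=lambda p: ward(*(regions[r] for r in p)))
        i, j = sorted(pair)
        if ward(regions[i], regions[j]) > threshold:
            break
        # Merge j into i: sizes add up, the mean is the size-weighted mean.
        (ni, mi), (nj, mj) = regions[i], regions[j]
        regions[i] = (ni + nj, (ni * mi + nj * mj) / (ni + nj))
        del regions[j]
        # Redirect the adjacencies of j to i and drop the collapsed pair.
        updated = set()
        for p in adjacency:
            q = frozenset(i if r == j else r for r in p)
            if len(q) == 2:
                updated.add(q)
        adjacency = updated
    return regions


example_regions = {0: (10, 100.0), 1: (10, 102.0), 2: (20, 200.0)}
example_adjacency = {frozenset({0, 1}), frozenset({1, 2})}
merged = greedy_merge(example_regions, example_adjacency, threshold=100.0)
```

In this toy example, regions 0 and 1 are merged because their Ward cost is low, while the much brighter region 2 stays separate. Note that this loop merges one pair per iteration, whereas an irregular pyramid can merge several regions between two consecutive levels.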

Irregular pyramids represent the image as a stack of graphs with a decreasing number of vertices. Some irregular pyramids use a simple graph (i.e. a region adjacency graph (RAG)) to encode each level of the hierarchy. Region adjacency graphs do not reveal whether two adjacent regions share one or several common boundaries, and they cannot distinguish an adjacency relationship between two regions from an inclusion relationship. Instead of simple graphs, each level of the hierarchy can be represented using a pair of dual graphs or a combinatorial map. Thus, the combinatorial pyramid (Brun and Kropatsch, 2001) is defined by an initial combinatorial map that can be successively reduced using the general scheme proposed by Kropatsch (1995). In the multiscale framework provided by the combinatorial pyramid, this paper presents an approach to perceptual image segmentation that combines information coming from regions and boundaries. Contributions include:

  • A novel, multi-stage algorithm to combine boundary and region information inside the hierarchy of the combinatorial pyramid.

  • Region merging is conducted using two different metrics inside the same hierarchy, generating a representation of the image at different levels of abstraction or scales. At low scales, only region features (size and colour information) are considered in the model. The resulting blobs or superpixels (Ren and Malik, 2003) reduce image complexity while avoiding undersegmentation. These superpixels are then grouped into larger structures using boundary and region properties.

  • The proposed approach has been extensively evaluated using the precision-recall framework introduced by Martin et al. (2004) on the Berkeley Segmentation Data Set (BSDS300). Results show that it can be favorably compared with other leading approaches.

The main advantage of the proposed framework is that the combinatorial pyramid preserves the topological relationships of the original image at all levels of the hierarchy. Thus, the decomposition of the image into regions at each level is represented by a combinatorial map that correctly encodes these relationships (Brun and Kropatsch, 2001, Brun and Kropatsch, 2006). This paper improves a previous version proposed by the authors (Antúnez, 2011a), where only face attributes of the combinatorial maps were used for segmentation. With respect to this first algorithm, the new algorithm directly associates the darts of the combinatorial map with edge information. The use of edges in a hierarchy is not completely new: they were used, for example, by Burge and Kropatsch (1999) in the dual graph-based irregular pyramid framework. In our case, edge information will be the main factor employed to perform the perceptual grouping at high scales of the hierarchy. The rest of the paper is organized as follows: an overview of the approach is presented in Section 2. Section 3 describes it in detail. Experimental results revealing the efficiency of the proposed method are presented in Section 4. Finally, Section 5 concludes the paper with a discussion and future work.

Section snippets

Overview of the proposed approach

The key idea in the proposed approach is to reduce the perceptual grouping computation to an efficiently solvable clustering problem. This clustering process is conducted hierarchically in two sequential stages (Antúnez, 2011a):

  • A pre-segmentation stage that accumulates local evidence from the original image (level 0 of the hierarchy) to a combinatorial map (level lp). This map will encode a decomposition of the image into superpixels. This initial stage of the clustering

The perceptual image segmentation approach

A combinatorial map is a mathematical model describing the subdivision of a space. It encodes all the vertices that compose this subdivision and all the incidence and adjacency relationships among them. A combinatorial pyramid is a hierarchical stack of combinatorial maps successively reduced by a sequence of contraction or removal operations (see Brun and Kropatsch (2001, 2006) for further details). In our implementation, two-dimensional (2D) combinatorial maps are
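To make the notion of a combinatorial map more concrete, the following sketch uses the standard textbook encoding of a 2D map as a set of darts with two permutations. This encoding is an assumption: the excerpt above does not spell out the definition, and conventions vary between authors. Here darts are signed integers, alpha(d) = -d pairs the two darts of an edge, the cycles of sigma are the vertices, and the cycles of phi = sigma after alpha are the faces. The example is a toy map with two vertices joined by two parallel edges, so its two faces share two distinct boundaries, a situation that a simple region adjacency graph would collapse into a single adjacency.

```python
def cycles(perm):
    """Return the orbits of a permutation given as a dict dart -> dart."""
    seen, result = set(), []
    for start in perm:
        if start in seen:
            continue
        orbit, d = [], start
        while d not in seen:
            seen.add(d)
            orbit.append(d)
            d = perm[d]
        result.append(tuple(orbit))
    return result


# Toy map (assumed example): darts 1 and 2 leave vertex A,
# darts -1 and -2 leave vertex B.
sigma = {1: 2, 2: 1, -1: -2, -2: -1}
alpha = {d: -d for d in sigma}              # involution pairing darts into edges
phi = {d: sigma[alpha[d]] for d in sigma}   # face permutation

vertices = cycles(sigma)
faces = cycles(phi)

# An edge is a boundary between two different faces when its two darts
# belong to different phi-cycles.
face_of = {d: k for k, face in enumerate(faces) for d in face}
shared_edges = [d for d in sigma if d > 0 and face_of[d] != face_of[-d]]

# Euler characteristic of a connected planar map: V - E + F = 2.
euler = len(vertices) - len(sigma) // 2 + len(faces)
```

Both edges appear in `shared_edges`, so the map records that the two faces meet along two separate boundaries, information that is lost when the adjacency is stored as a single graph edge.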

Quantitative evaluation of the pre-segmentation stage

In order to evaluate how well superpixel boundaries align with image edges, the Berkeley Segmentation Dataset and Benchmark (BSDS300) (Martin et al., 2001) has been used. The methodology for evaluating the performance of segmentation techniques using this dataset is mainly based on the comparison of machine-detected boundaries with human-marked boundaries (ground truth data) using the precision-recall framework (Martin et
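As a reminder of how precision and recall are usually summarised in such a framework, the sketch below computes both quantities and their harmonic mean (the F-measure) from boundary pixel counts. It assumes that the correspondence between machine and human boundary pixels has already been established by some matching procedure, which is not reproduced here; the function name and its arguments are hypothetical.

```python
def precision_recall_f(matched_machine, total_machine, matched_human, total_human):
    """Summarise boundary agreement from pixel counts.

    matched_machine: machine boundary pixels matched to a human boundary
    total_machine:   all machine boundary pixels
    matched_human:   human boundary pixels matched to a machine boundary
    total_human:     all human boundary pixels
    """
    precision = matched_machine / total_machine if total_machine else 0.0
    recall = matched_human / total_human if total_human else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


p, r, f = precision_recall_f(80, 100, 60, 100)
```

Precision penalises spurious machine boundaries, recall penalises missed human boundaries, and the F-measure gives a single value for comparing algorithms at a given operating point.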

Conclusions and future work

This paper presents a new perception-based segmentation approach which consists of two stages: a pre-segmentation stage and a perceptual grouping stage. In our proposal, both stages are conducted in the framework of a hierarchy of successively reduced combinatorial maps. Experimental results obtained on the BSDS300 show that the performance of the proposed approach is good, although it is still below the values provided by the current state of the art in the literature: the UCM (Arbeláez, 2006)

Acknowledgments

We would like to thank the people at PRIP for providing us with the source code of the MST Pyramid and for their help and useful comments. We would also like to thank the reviewers for their constructive and detailed comments.

References (33)

  • Boruvka, O., 1926. O jistem problemu minimalnim. Prace Mor. Prirodved. Spol. v Brne (Acta Societ. Scienc. Natur....
  • Brox, T., Farin, D., de With, P., 2001. Multi-stage region merging for image segmentation. In: Proc. 22nd Symposium on...
  • L. Brun et al. Introduction to combinatorial pyramids. Lect. Notes Comput. Sci. (2001)
  • L. Brun et al. Hierarchical matching using combinatorial pyramid framework
  • L. Brun et al. Hierarchical watersheds within the combinatorial pyramid framework. Lect. Notes Comput. Sci. (2005)
  • M. Burge et al. A minimal line property preserving representation of line images. Computing (1999)

    This work has been partially supported by the Spanish Ministerio de Ciencia e Innovación (MICINN) and FEDER funds under projects TIN2011-27512-C05-01 and AT2009-0026 and by the Junta de Andalucía under Project P07-TIC-03106.
