Convexity constrained efficient superpixel and supervoxel extraction

https://doi.org/10.1016/j.image.2015.02.005

Highlights

  • An iterative superpixel extraction method is proposed in which only boundary pixels are updated, enabling efficient computation.

  • A color term in the cost function enables strong adaptation to object boundaries, and a distance term enforces a geometric constraint.

  • Extending the method to the spatio-temporal volume yields region adaptation in video footage.

  • Superior or competitive quantitative performance is observed in comparison with state-of-the-art techniques.

Abstract

This paper presents an efficient superpixel (SP) and supervoxel (SV) extraction method that aims to improve on the state-of-the-art in terms of both accuracy and computational complexity. Segmentation performance is improved by a convexity-constraining distance term, whereas computational efficiency is achieved by replacing complete region processing with a boundary adaptation technique. Starting from uniformly distributed rectangular (cubical) superpixels (supervoxels) of equal size (volume), region boundaries are iteratively adapted towards object edges. Adaptation is performed by assigning the boundary pixels to the most similar neighboring SPs (SVs). At each iteration, the SP (SV) regions are updated, so that they progressively converge to compact pixel groups. Detailed experimental comparisons against state-of-the-art competing methods validate the performance of the proposed technique in terms of both accuracy and speed.

Introduction

Pixel representation of an image is often redundant due to the spatial coherence within the image. To reduce this redundancy, a preprocessing stage was pioneered by Ren and Malik [1]. Their method clusters pixels into homogeneous image regions, called superpixels (SPs). Since then, SPs have become important in many image processing applications. Because SP regions possess similar color and texture characteristics, they provide an efficient representation of the image. This property supports the assumption that pixels in the same SP belong to the same semantic object. Stemming from this idea, all pixels in an SP can be assigned to specific models representing motion, depth or segmentation structures. Such a representation can replace the use of pixels in various applications [2], [3], [4]. When SPs are used as the image representation, local inter-pixel relations are captured and preserved. Furthermore, the SP structure is crucial for graph-based approaches. When the graph nodes are constructed from SPs instead of pixels, the graph complexity and computation time are substantially reduced.
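The graph-size argument can be made concrete with a minimal Python sketch. It counts the nodes and edges of a 4-connected pixel graph and builds the corresponding SP adjacency graph from a label map. The function names and the toy 4×4 label map are illustrative assumptions, not part of the method.

```python
def pixel_graph_size(h, w):
    """Nodes and 4-connected edges of the full h x w pixel grid graph."""
    return h * w, h * (w - 1) + (h - 1) * w


def superpixel_adjacency(labels):
    """Undirected edges between 4-connected superpixels of a 2D label map."""
    h, w = len(labels), len(labels[0])
    edges = set()
    for y in range(h):
        for x in range(w):
            a = labels[y][x]
            # Look only right and down so each pixel pair is visited once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w:
                    b = labels[ny][nx]
                    if a != b:
                        edges.add((min(a, b), max(a, b)))
    return edges


if __name__ == "__main__":
    # Hypothetical 4x4 image split into a 2x2 grid of superpixels.
    labels = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 3, 3],
        [2, 2, 3, 3],
    ]
    print(pixel_graph_size(4, 4))                # (16, 24)
    print(sorted(superpixel_adjacency(labels)))  # [(0, 1), (0, 2), (1, 3), (2, 3)]
```

Even in this toy case, the graph shrinks from 16 nodes and 24 edges to 4 nodes and 4 edges. The gap grows with image size for a fixed number of SPs.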

SP extraction involves four main challenges. Firstly, a successful method should preserve local structure by adapting to local object and region boundaries. Secondly, undersegmentation of the regions should be avoided to obtain an expressive image representation. Thirdly, regular regions are targeted, i.e., quasi-uniform SPs. Finally, computational complexity should be kept to a minimum. The first two challenges relate to local information encapsulation, which enforces adaptation of the SP boundaries to the object boundaries. Uniform localization and compactness are required to form a regular grid structure in graph models with unbiased neighbor relations. This property influences the precision and accuracy of graph-based solutions, especially in image segmentation. Computational efficiency is crucial for the practical usability of the method.
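Compactness, one of the properties listed above, can be quantified in several ways, and this excerpt does not state which measure the paper uses. A common choice is the isoperimetric quotient 4πA/P². It is 1 for a disc and decreases for elongated regions. The minimal sketch below assumes the perimeter P is counted as pixel edges shared with other labels or the image border. The function name is hypothetical.

```python
import math


def compactness(labels, target):
    """Isoperimetric quotient 4*pi*A / P**2 of the region labelled `target`.

    A is the pixel count; P counts pixel edges (4-connectivity) that border
    another label or the image border.
    """
    h, w = len(labels), len(labels[0])
    area = perimeter = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] != target:
                continue
            area += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if not (0 <= ny < h and 0 <= nx < w) or labels[ny][nx] != target:
                    perimeter += 1
    if area == 0:
        raise ValueError("label not present in the map")
    return 4 * math.pi * area / perimeter ** 2


if __name__ == "__main__":
    square = [[0] * 4 for _ in range(4)]  # 4x4 block: A = 16, P = 16
    strip = [[0] * 16]                    # 1x16 strip: A = 16, P = 34
    print(round(compactness(square, 0), 3))  # 0.785
    print(round(compactness(strip, 0), 3))   # 0.174
```

A square SP scores π/4, while a strip of the same area scores far lower. This is the kind of irregularity a compactness or convexity constraint penalizes.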

In this study, a novel and efficient SP and SV extraction algorithm is presented that addresses the four fundamental challenges mentioned above. Local structure is preserved by the selected energy function. Adaptation to object boundaries is achieved by a color-based similarity measure, while the proposed distance metric enforces the convexity constraint by penalizing irregularly shaped regions. Computational efficiency is achieved by processing only the pixels at the region boundaries. Following the review of related work on SP extraction in Section 1.1, details of the proposed algorithm are presented in Section 2, which also explains the extension of the method to video. Section 3 is devoted to experimental results, and Section 4 concludes the paper with final remarks and a restatement of the contributions. In the rest of the paper, the term “SP” is used when explaining the algorithmic details of both SP extraction and SV extraction on the spatio-temporal volume.
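The excerpt names a color similarity term and a convexity-enforcing distance term but does not give the energy in closed form. As an assumption, the sketch below uses a SLIC-like cost: squared color distance to the SP mean color plus a weight `lam` times the squared distance to the SP centroid. A boundary pixel is assigned to its cheapest candidate SP. `lam`, the `stats` layout and the function names are hypothetical.

```python
def assignment_cost(color, pos, sp_mean_color, sp_centroid, lam):
    """Hypothetical cost: squared color distance to the SP mean plus
    lam times the squared spatial distance to the SP centroid."""
    dc = sum((a - b) ** 2 for a, b in zip(color, sp_mean_color))
    ds = sum((a - b) ** 2 for a, b in zip(pos, sp_centroid))
    return dc + lam * ds


def best_label(color, pos, candidates, stats, lam):
    """Return the candidate SP with the lowest cost.

    stats maps label -> (mean_color, centroid).
    """
    return min(candidates,
               key=lambda k: assignment_cost(color, pos, stats[k][0], stats[k][1], lam))


if __name__ == "__main__":
    # A reddish pixel at (5, 5): SP "A" matches its color, SP "B" is closer in space.
    stats = {"A": ((210, 0, 0), (0, 0)), "B": ((20, 20, 20), (6, 6))}
    print(best_label((200, 10, 10), (5, 5), ["A", "B"], stats, lam=0))      # A
    print(best_label((200, 10, 10), (5, 5), ["A", "B"], stats, lam=10000))  # B
```

With `lam = 0` the color term alone decides and the pixel follows the object edge. A large `lam` lets the spatial term dominate, trading boundary adherence for compactness.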

Previous work on SP and SV extraction dates back less than a decade. We review the related work in two categories: graph-based and gradient-based methods.

In graph-based approaches, SP extraction is achieved by partitioning a graph whose nodes correspond to individual pixels and whose edge weights are assigned according to a cost function of inter-pixel similarity. In [5], the graph is partitioned recursively, as in Normalized Cuts segmentation [6], to minimize a global cost function based on color and texture cues until the desired number of SPs is reached. This approach satisfies the compactness constraint that SPs need for an efficient graph representation; however, it has high computational complexity. In [7], the complexity of SP extraction is reduced by grouping graph nodes through greedy decisions based on pairwise region comparisons of minimum spanning tree edge weights. This method, on the other hand, provides no control over region compactness or the number of SPs. In [8], a lattice structure is enforced by finding horizontal and vertical seams that cut the image optimally via graph cuts. The seams determine the SP boundaries based on region compactness and the total number of SPs. The work in [9] aims to preserve image topology during SP generation. A recent study [10] proposes a novel method to generate 2D SPs and 3D supervoxels (SVs) in an energy minimization framework using graph cuts. It provides various controls on SP structure and distribution; however, its optimization stage is computationally expensive.
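The greedy merging of [7] can be summarized by its pairwise comparison predicate. Edges are processed in nondecreasing weight order. Two components are merged when the connecting edge weight does not exceed either component's internal variation (its largest minimum-spanning-tree edge) plus `k/|C|`. The minimal union-find sketch below illustrates that rule for this related-work discussion only. The edge-list format and function name are assumptions.

```python
def felzenszwalb_segment(num_nodes, edges, k):
    """Greedy graph merging with the pairwise comparison predicate of [7].

    edges: list of (weight, u, v). Returns a component id for each node.
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    internal = [0.0] * num_nodes  # largest MST edge weight inside each component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for w, u, v in sorted(edges):  # nondecreasing weight order
        a, b = find(u), find(v)
        if a == b:
            continue
        # Merge only if the edge is no heavier than both components'
        # internal variation plus the size-dependent threshold k/|C|.
        if w <= min(internal[a] + k / size[a], internal[b] + k / size[b]):
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a
            size[a] += size[b]
            internal[a] = w  # w is now the largest MST edge in the merged component
    return [find(i) for i in range(num_nodes)]


if __name__ == "__main__":
    # A 4-node chain with one strong edge (weight 50) between nodes 1 and 2.
    edges = [(1, 0, 1), (1, 2, 3), (50, 1, 2)]
    print(felzenszwalb_segment(4, edges, k=10))   # [0, 0, 2, 2]
    print(felzenszwalb_segment(4, edges, k=100))  # [0, 0, 0, 0]
```

The number of resulting regions depends only indirectly on `k`, which illustrates why this family offers no direct control over the SP count or compactness.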

In contrast, gradient-based approaches start from initial seeds of rough SPs, and pixel groups are refined iteratively depending on local similarities. Mean-Shift [11], a well-known image segmentation method, is adapted for SP extraction by applying a recursive smoothing kernel in the pixel feature space. The main weakness of this method is that it provides no control over SP properties such as compactness, distribution and the total number of regions. In [12], the image is treated as a topographic surface and intensity gradient vectors are used to form pixel groups. This watershed approach also lacks control over SP properties. Although recent studies [13], [14] extend watersheds to a graph-based formulation, they generally follow a gradient-based flow for SP generation. The TurboPixels concept [5] introduces geometric flow over initial seeds, which are considered the starting points of the SPs. A level set method is used to update and refine the SPs based on local image gradients. This approach yields a regular distribution of compact SPs with lower complexity than graph-based approaches. In [15], geodesic distance [16] is used to group neighboring pixels iteratively, starting from the initial seeds proposed in TurboPixels [5]. Using geodesic distance yields higher structure sensitivity than geometric flow at almost the same complexity. In [17], the initial seeds of [5], [15] are replaced by rectangular SPs. Instead of geometric flow, boundary pixels are reassigned to SPs iteratively based on color similarity and spatial distance. At each iteration, the SP mean positions and color models are updated, yielding compact and almost regularly distributed pixel groups. Because this approach refines SPs only through boundary pixels, it also significantly decreases the computational complexity.
In [18], a similar method is proposed in which all pixels, rather than only boundary pixels, are updated during refinement. A recent method [19] uses a similar boundary update idea for region segmentation; however, its SPs are not constrained to be convex or regularly distributed. It applies top-down partitioning to initially large rectangular SPs and refines them iteratively based on color similarity and SP histograms.
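The efficiency argument for boundary-only refinement rests on a simple test. A pixel is a boundary pixel if at least one of its 4-neighbors carries a different label. Interior pixels have a single candidate SP and need no re-evaluation. The minimal sketch below performs that test and also returns the candidate labels for each boundary pixel. The 4-connectivity and the function name are assumptions for illustration.

```python
def boundary_pixels(labels):
    """Map each boundary pixel (y, x) to the set of differing neighbor labels.

    A pixel is on a boundary if any 4-neighbor has a different label;
    interior pixels are omitted from the result.
    """
    h, w = len(labels), len(labels[0])
    result = {}
    for y in range(h):
        for x in range(w):
            own = labels[y][x]
            others = set()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != own:
                    others.add(labels[ny][nx])
            if others:
                result[(y, x)] = others
    return result


if __name__ == "__main__":
    labels = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 3, 3],
        [2, 2, 3, 3],
    ]
    b = boundary_pixels(labels)
    print(len(b))             # 12 of the 16 pixels
    print(sorted(b[(1, 1)]))  # [1, 2]
```

In this tiny example most pixels lie on a boundary. For realistic SP sizes, the interior dominates, so restricting updates to boundary pixels skips most of the image at each iteration.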

Extending SP methods to SVs in the temporal domain is a recent problem that addresses the temporal consistency of SPs. SV extraction requires special attention to temporal continuity, which most of the SP extraction literature does not address. A recent paper [20] presents a detailed evaluation of various SV methods.

A broad look at the previous literature helps define the priorities of a successful SP extraction scheme. These can be summarized as follows: (1) adaptation to object boundaries; (2) efficient region representation; (3) quasi-uniform distribution over the image; (4) fast execution; (5) extension to the spatio-temporal volume. To this end, the iterative boundary refinement approach previously presented in [17], [21] is substantially extended in this study into a general framework that uses color and spatial similarity for pixel label assignment. Euclidean and geodesic distances are evaluated under different parameterizations to create structure-sensitive SPs, and an extension to SVs is provided to obtain temporally consistent pixel groups. Hence, the alternative energy metrics provide a trade-off between compactness and edge adaptation, as well as between computational complexity and segmentation performance. Possible use cases and applications are covered in the prior study [22].
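The difference between Euclidean and geodesic distance can be illustrated with a shortest-path computation on the pixel grid. This excerpt does not give the paper's exact geodesic formulation. The sketch below assumes a step cost of `1 + beta * |I(p) - I(q)|` between 4-neighbors p and q of a grey-level image and runs Dijkstra's algorithm from a seed pixel. `beta` and the function name are hypothetical.

```python
import heapq


def geodesic_distance(image, seed, beta):
    """Dijkstra on the 4-connected grid of a grey-level image (2D list).

    Step cost between neighbors p, q: 1 + beta * |I(p) - I(q)| (assumed form).
    Returns a 2D list of distances from `seed` = (y, x).
    """
    h, w = len(image), len(image[0])
    dist = [[float("inf")] * w for _ in range(h)]
    sy, sx = seed
    dist[sy][sx] = 0.0
    heap = [(0.0, sy, sx)]
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y][x]:
            continue  # stale queue entry
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + 1.0 + beta * abs(image[ny][nx] - image[y][x])
                if nd < dist[ny][nx]:
                    dist[ny][nx] = nd
                    heapq.heappush(heap, (nd, ny, nx))
    return dist


if __name__ == "__main__":
    row = [[0, 0, 100, 100]]  # intensity edge between columns 1 and 2
    print(geodesic_distance(row, (0, 0), beta=0.0))  # [[0.0, 1.0, 2.0, 3.0]]
    print(geodesic_distance(row, (0, 0), beta=0.5))  # [[0.0, 1.0, 52.0, 53.0]]
```

With `beta = 0` the result is the plain city-block distance. A positive `beta` makes paths that cross intensity edges expensive, which is the source of the extra structure sensitivity.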

Section snippets

Proposed method

The algorithm consists of four main steps: (1) SP initialization; (2) SP boundary update; (3) SP structure update; (4) termination. These steps are illustrated in Fig. 1.
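A compact sketch of the four steps on a grey-level image follows. It is a toy under stated assumptions, not the authors' implementation. The cost is assumed to be the squared intensity difference to the SP mean plus `lam` times the squared distance to the SP centroid. Pixels are updated in place in raster order, and SP connectivity is not enforced. The function name and parameters are hypothetical.

```python
def extract_superpixels(image, grid, lam, max_iter=10):
    """Toy four-step flow on a grey-level image (2D list).

    grid = (rows, cols) of initial rectangular SPs; lam weights the spatial
    term of the assumed cost. SP connectivity is not enforced.
    """
    h, w = len(image), len(image[0])
    rows, cols = grid
    # (1) Initialization: label each pixel by the rectangular grid cell it falls in.
    labels = [[(y * rows // h) * cols + (x * cols // w) for x in range(w)]
              for y in range(h)]

    def region_stats():
        # Mean intensity and centroid (cy, cx) of every current SP.
        acc = {}
        for y in range(h):
            for x in range(w):
                s = acc.setdefault(labels[y][x], [0.0, 0.0, 0.0, 0])
                s[0] += image[y][x]
                s[1] += y
                s[2] += x
                s[3] += 1
        return {k: (c / n, sy / n, sx / n) for k, (c, sy, sx, n) in acc.items()}

    def cost(y, x, k):
        mean, cy, cx = stats[k]
        return (image[y][x] - mean) ** 2 + lam * ((y - cy) ** 2 + (x - cx) ** 2)

    stats = region_stats()
    for _ in range(max_iter):
        changed = 0
        # (2) Boundary update: only pixels with a differently labelled
        # 4-neighbor are re-evaluated; interior pixels are skipped.
        for y in range(h):
            for x in range(w):
                candidates = {labels[y][x]}
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w:
                        candidates.add(labels[ny][nx])
                if len(candidates) == 1:
                    continue
                best = min(sorted(candidates), key=lambda k: cost(y, x, k))
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed += 1
        # (3) Structure update: recompute SP means and centroids.
        stats = region_stats()
        # (4) Termination: stop once no boundary pixel changes its label.
        if changed == 0:
            break
    return labels


if __name__ == "__main__":
    # Hypothetical 4x4 image with an intensity edge between columns 0 and 1,
    # while the initial 1x2 grid splits it between columns 1 and 2.
    img = [[0, 100, 100, 100] for _ in range(4)]
    for row in extract_superpixels(img, grid=(1, 2), lam=0.01):
        print(row)  # [0, 1, 1, 1] on every row
```

In the usage example, the initial boundary is off by one column. After one boundary update the SP edge moves onto the intensity edge. The second pass changes nothing, so the loop terminates.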

Superpixel initialization: In the first step, the image is divided into equal-sized regions according to the desired number of SPs. Each region initially has a rectangular or hexagonal shape, and the region centroids are equally spaced over the image. In the prior methods,
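For the rectangular case, the grid can be derived from the desired number of SPs `k` by choosing an approximately square cell side S = sqrt(h·w / k). The minimal sketch below returns the initial label map and the cell centroids. The rounding rule and function name are assumptions, and the hexagonal variant is not covered.

```python
import math


def init_grid(h, w, k):
    """Rectangular initialization for roughly k SPs on an h x w image.

    Cell side S = sqrt(h*w/k); the grid has round(h/S) x round(w/S) cells.
    Returns (labels, centroids) with centroids[label] = (cy, cx).
    """
    step = math.sqrt(h * w / k)
    rows = max(1, round(h / step))
    cols = max(1, round(w / step))
    labels = [[(y * rows // h) * cols + (x * cols // w) for x in range(w)]
              for y in range(h)]
    # Accumulate coordinates per label to obtain the cell centroids.
    sums = {}
    for y in range(h):
        for x in range(w):
            s = sums.setdefault(labels[y][x], [0, 0, 0])
            s[0] += y
            s[1] += x
            s[2] += 1
    centroids = {lab: (sy / n, sx / n) for lab, (sy, sx, n) in sums.items()}
    return labels, centroids


if __name__ == "__main__":
    labels, cents = init_grid(6, 9, 6)  # S = 3, so a 2 x 3 grid of 3x3 cells
    print(len(cents), cents[0], cents[5])  # 6 (1.0, 1.0) (4.0, 7.0)
```

When h·w is not divisible into k square cells, the rounding gives only approximately k SPs of approximately equal size. Methods of this kind usually accept that approximation.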

Experimental results

This section presents quantitative and qualitative results for the two proposed SP extraction methods (using Euclidean and geodesic distance) in comparison with the state-of-the-art. Well-known SP extraction methods from the literature, namely Graph-based [7], TurboPixels [5], Structure Sensitive Geo [15], SLIC [18] and SEEDS [19], are evaluated in terms of accuracy and computation time. SV evaluation is performed using the supervoxel benchmark presented in [20].
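This excerpt does not list the accuracy metrics used. One metric widely used in SP benchmarks is undersegmentation error, which measures how much SPs leak across ground-truth segment boundaries. The minimal sketch below implements a common variant without the small overlap threshold that some benchmarks apply. The function name is hypothetical.

```python
from collections import defaultdict


def undersegmentation_error(sp_labels, gt_labels):
    """(sum over GT segments of the sizes of all SPs touching them - N) / N.

    sp_labels and gt_labels are equally sized 2D label maps; N is the pixel count.
    """
    sp_size = defaultdict(int)
    touching = defaultdict(set)  # GT segment -> SPs overlapping it
    n = 0
    for sp_row, gt_row in zip(sp_labels, gt_labels):
        for s, g in zip(sp_row, gt_row):
            sp_size[s] += 1
            touching[g].add(s)
            n += 1
    leak = sum(sp_size[s] for sps in touching.values() for s in sps)
    return (leak - n) / n


if __name__ == "__main__":
    gt = [[0, 0, 1, 1], [0, 0, 1, 1]]
    aligned = [[0, 0, 1, 1], [0, 0, 1, 1]]
    straddling = [[0, 0, 0, 1], [0, 0, 0, 1]]
    print(undersegmentation_error(aligned, gt))     # 0.0
    print(undersegmentation_error(straddling, gt))  # 0.75
```

SPs that follow the ground-truth boundary score 0. An SP that straddles a boundary is counted once for every segment it touches, which raises the error.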

The performance of the

Conclusion and discussion

This paper presents a novel superpixel and supervoxel extraction method that contributes in terms of both computational efficiency and segmentation performance. In the proposed technique, SPs and SVs are updated iteratively through the boundary pixels based on color and spatial similarity. The boundary adaptation idea and the energy function selection are the two main contributions of the proposed method, enabling an efficient implementation. The experiments are conducted for different

References (31)

  • X. Ren, J. Malik, Learning a classification model for segmentation, in: International Conference on Computer Vision,...
  • H.E. Tasli, A.A. Alatan, User assisted stereo image segmentation, in: 3DTV-Conference: The True Vision-Capture,...
  • S.S. Ayvaci, A., Motion segmentation with occlusions on the superpixel graph, in: Proceedings of the Workshop on...
  • H.E. Tasli, K.Ugur, Interactive 2D 3D image conversion method for mobile devices, in: 3DTV-Conference: The True...
  • A. Levinshtein et al., TurboPixels: fast superpixels using geometric flows (2009)
  • J. Shi, J. Malik, Normalized cuts and image segmentation, in: IEEE Computer Vision and Pattern Recognition,...
  • P. Felzenszwalb, D. Huttenlocher, Efficient graph based image segmentation, Int. J. Comput. Vis....
  • A. Moore, S. Prince, J. Warrell, U. Mohammed, G. Jones, Superpixel lattices, in: International Conference on Computer...
  • D. Tang, H. Fu, X. Cao, Topology preserved regular superpixel, in: IEEE International Conference on Multimedia and...
  • O. Veksler, Y. Boykov, P. Mehrani, Superpixels and supervoxels in an energy optimization framework, in: Perspectives in...
  • D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Trans. Pattern...
  • L. Vincent, P. Soille, Watersheds in digital spaces: an efficient algorithm based on immersion simulations, IEEE Trans....
  • F. Meyer, An overview of morphological segmentation, Int. J. Pattern Recognit. Artif. Intell....
  • C. Couprie, L. Grady, L. Najman, H. Talbot, Power watershed: a unifying graph-based optimization framework, IEEE Trans....
  • G. Zeng, P. Wang, J. Wang, R. Gan, H. Zha, Structure-sensitive superpixels via geodesic distance, in: International...