Convexity constrained efficient superpixel and supervoxel extraction
Introduction
The pixel representation of an image is often redundant owing to the spatial coherence within the image. To reduce this redundancy, Ren and Malik [1] pioneered a preprocessing stage that clusters pixels into homogeneous image regions, called superpixels (SPs). Since then, SPs have become important in many image processing applications. Because the pixels within an SP possess similar color and texture characteristics, SPs provide an efficient representation of the image. This property supports the assumption that pixels in the same SP belong to the same semantic object. Stemming from this idea, all pixels in an SP can be assigned to a specific model representing motion, depth or segmentation structure. Such a representation can replace the use of pixels in various applications [2], [3], [4]. When SPs are used as the image representation, the inter-pixel details are captured and preserved. Furthermore, the SP structure is also crucial for graph-based approaches: when the graph nodes are constructed from SPs instead of pixels, graph complexity and computation time are substantially reduced.
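To give a rough sense of the reduction in graph size, the following minimal sketch compares a 4-connected pixel graph with the region adjacency graph of a label map. The function names and the square-block label map are hypothetical stand-ins chosen for illustration; real SPs are irregular, so the exact counts will differ.

```python
def pixel_graph_size(height, width):
    """Nodes and edges of a 4-connected pixel grid graph."""
    nodes = height * width
    edges = height * (width - 1) + width * (height - 1)
    return nodes, edges


def region_graph_size(labels):
    """Nodes and edges of the region adjacency graph of a label map.

    `labels` is a list of rows; labels[y][x] is the SP id of pixel (y, x).
    """
    height, width = len(labels), len(labels[0])
    nodes = {lab for row in labels for lab in row}
    edges = set()
    for y in range(height):
        for x in range(width):
            a = labels[y][x]
            # Look right and down so each neighbouring pixel pair is visited once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < height and nx < width and labels[ny][nx] != a:
                    b = labels[ny][nx]
                    edges.add((min(a, b), max(a, b)))
    return len(nodes), len(edges)


def block_labels(height, width, block):
    """Toy label map of square blocks (hypothetical stand-in for real SPs)."""
    per_row = (width + block - 1) // block
    return [[(y // block) * per_row + (x // block) for x in range(width)]
            for y in range(height)]
```

Under these assumptions, a 480 x 640 image has 307200 nodes and 613280 edges at the pixel level, whereas a label map of 20 x 20 blocks yields 768 nodes and 1480 edges.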
SP extraction involves four main challenges. Firstly, a successful method should preserve local structure by adapting to the local object and region boundaries. Secondly, undersegmentation of the regions should be avoided in order to obtain an expressive image representation. Thirdly, regular, quasi-uniform SP regions are targeted. Finally, computational complexity should be kept to a minimum. The first two challenges relate to the encapsulation of local information, which requires SP boundaries to adapt to object boundaries. Uniform localization and compactness are required to form a regular grid structure in graph models with unbiased neighbor relations. This property influences the precision and accuracy of graph-based solutions, especially in the image segmentation problem. Computational efficiency is crucial for the practical usability of the method.
In this study, a novel and efficient SP and supervoxel (SV) extraction algorithm is presented that addresses the four fundamental constraints mentioned above. Local structure is preserved by the selected energy function. Adaptation to object boundaries is achieved by a color-based similarity measure, while the proposed distance metric enforces the convexity constraint by penalizing irregularly shaped regions. Computational efficiency is achieved by processing only the pixels at the region boundaries. Following the review of related work on SP extraction in Section 1.1, the details of the proposed algorithm are presented in Section 2. The extension of the method to video is also explained in Section 2. Section 3 is devoted to experimental results, and Section 4 concludes the paper with final remarks and a restatement of the contributions. In the rest of the paper, the term “SP” is used when explaining the algorithmic details of both SP extraction and SV extraction on the spatio-temporal volume.
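The efficiency argument rests on visiting only the pixels at region boundaries. A minimal sketch of how such pixels could be identified is given below; it assumes a label map stored as a list of rows and 4-connectivity, and the function name is hypothetical rather than taken from the paper.

```python
def boundary_pixels(labels):
    """Return the (y, x) positions that have a 4-neighbour with a different label."""
    height, width = len(labels), len(labels[0])
    found = []
    for y in range(height):
        for x in range(width):
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and labels[ny][nx] != labels[y][x]:
                    found.append((y, x))
                    break  # one differing neighbour is enough
    return found
```

For compact regions the number of boundary pixels grows roughly with the region perimeter rather than the region area, which is why restricting updates to them reduces the work per iteration.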
Previous work on SP and SV extraction dates back less than a decade. We review the related work in two categories: graph-based and gradient-based methods.
In graph-based approaches, SP extraction is achieved by partitioning a graph whose nodes correspond to individual pixels and whose edge weights are assigned according to a cost function based on inter-pixel similarities. In [5], the graph is partitioned recursively, as in Normalized Cuts segmentation [6], in order to minimize a global cost function based on color and texture cues until the desired number of SPs is reached. This approach satisfies the compactness constraint that SPs require in order to provide an efficient graph representation. However, it suffers from high computational complexity. In [7], the complexity of SP extraction is reduced by grouping nodes of the graph via greedy decisions, using pairwise region comparisons on edge measures of minimum spanning trees. This method, on the other hand, provides no control over region compactness or the number of SPs. In [8], a lattice structure is enforced by finding horizontal and vertical seams that cut the image optimally via graph cuts. The seams determine the SP boundaries based on region compactness and the total number of SPs. The work in [9] aims to preserve the image topology during SP generation. A recent study [10] proposes a novel method to generate 2D SPs and 3D supervoxels (SVs) in an energy minimization framework utilizing graph cuts. It provides various controls on the SP structure and distribution; however, it suffers from high computational complexity during the optimization stage.
In contrast, gradient-based approaches start from initial seeds of rough SPs. Pixel groups are refined iteratively, depending on the local similarities. Mean-Shift [11], one of the well-known methods in image segmentation, is adapted for SP extraction by applying a recursive smoothing kernel over the pixel feature space. The main weakness of this method is that it offers no control over SP properties such as compactness, distribution and total number of regions. In [12], an image is considered as a topographic structure and intensity gradient vectors are utilized to form pixel groups. This watershed approach also lacks control over SP properties. Even though the recent studies [13], [14] extend watersheds to a graph-based approach, in general they still follow a gradient-based flow for SP generation. The TurboPixels concept [5] introduces a geometric flow over initial seeds, which are considered as the starting points of the SPs. A level set method is exploited to update and refine the SPs based on local image gradients. This approach enables a regular distribution of compact SPs with lower complexity than graph-based approaches. In [15], geodesic distance [16] is exploited to group neighboring pixels iteratively, starting from the initial seeds as proposed in TurboPixels [5]. The use of geodesic distance provides higher structure sensitivity than geometric flow at comparable complexity. The initial seed placement of [5], [15] is refined in [17] by using rectangular SPs. Instead of geometric flow, boundary pixels are re-assigned to SPs iteratively based on color similarity and spatial distance. At each iteration, the mean intensity locations and color models of the SPs are updated, which yields compact and almost regularly distributed pixel groups. This approach refines SPs only through boundary pixels, which also significantly decreases the computational complexity.
In [18], a similar method is proposed, where all pixels are updated during the refinement rather than only the boundary pixels. A recent method [19] proposes a similar boundary update idea for region segmentation; however, in this method the SPs are not constrained to be convex and regularly distributed. A top-down partitioning is applied to initially large rectangular SPs, and an iterative process refines the SPs based on color similarity and the SP histogram.
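The boundary re-assignment idea described for [17], where a boundary pixel is given to the SP that is closest in color and in space, can be sketched as follows. This is an illustrative assumption rather than the energy function of any cited method: the cost is a plain sum of Euclidean color distance and weighted Euclidean spatial distance, and the names `assignment_cost`, `best_label`, `models` and `spatial_weight` are hypothetical.

```python
import math


def assignment_cost(color, pos, sp_color, sp_center, spatial_weight=0.5):
    """Cost of assigning a pixel to an SP (assumed form, not the paper's energy)."""
    color_term = math.dist(color, sp_color)      # color dissimilarity
    spatial_term = math.dist(pos, sp_center)     # distance to the SP centre
    return color_term + spatial_weight * spatial_term


def best_label(color, pos, candidates, models, spatial_weight=0.5):
    """Pick the candidate SP with the lowest cost for one boundary pixel.

    `candidates` would typically be the labels of the pixel and its neighbours;
    `models[label]` holds the SP mean color and centre as (y, x).
    """
    return min(
        candidates,
        key=lambda lab: assignment_cost(
            color, pos, models[lab]["color"], models[lab]["center"], spatial_weight
        ),
    )
```

A larger `spatial_weight` favors compact, regular regions, whereas a smaller one lets the boundaries follow color edges more closely.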
The extension of SP methods to SVs in the temporal domain is a recent problem that addresses the consistency of SPs over time. SV extraction requires special attention to temporal continuity, which is not addressed in most of the SP extraction literature. A recent paper [20] presents a detailed evaluation of various SV methods.
A broad look at the previous literature is useful for defining the priorities of a successful SP extraction scheme. These priorities can be summarized as follows: (1) adaptation to object boundaries; (2) efficient region representation; (3) quasi-uniform distribution over the image; (4) fast execution capability; (5) extension to the spatio-temporal volume. For this purpose, the iterative boundary refinement approach previously presented in [17], [21] is substantially extended in this study by constructing a general framework that utilizes color and locational similarity for pixel label assignment. Different parametric evaluations of Euclidean and geodesic distances are performed in order to create structure-sensitive SPs, and an extension to SVs is also given to obtain temporally consistent pixel groups. Hence, through the utilization of alternative energy metrics, a trade-off between compactness and edge adaptation, as well as between computational complexity and segmentation performance, is achieved. Possible use cases and applications are also covered in the prior study [22].
Section snippets
Proposed method
The algorithmic flow of the proposed method consists of four main steps: (1) initialization of the SPs; (2) SP boundary update; (3) SP structure update; (4) termination. These steps are illustrated in Fig. 1.
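Steps (1) and (3) lend themselves to a short illustration. The sketch below is a minimal version under stated assumptions: it covers only a rectangular grid initialization, represents the image as a list of rows of color tuples, and uses the hypothetical names `init_grid_labels` and `update_models`; it is not the authors' implementation.

```python
import math


def init_grid_labels(height, width, n_superpixels):
    """Step (1): split the image into roughly equal square cells."""
    # Choose a cell side so that about n_superpixels cells cover the image.
    side = max(1, round(math.sqrt(height * width / n_superpixels)))
    cols = math.ceil(width / side)
    return [[(y // side) * cols + (x // side) for x in range(width)]
            for y in range(height)]


def update_models(image, labels):
    """Step (3): recompute each SP's mean color and centroid as (y, x)."""
    acc = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            pixel = image[y][x]
            entry = acc.setdefault(
                lab, {"n": 0, "color": [0.0] * len(pixel), "pos": [0.0, 0.0]}
            )
            entry["n"] += 1
            for c, value in enumerate(pixel):
                entry["color"][c] += value
            entry["pos"][0] += y
            entry["pos"][1] += x
    return {
        lab: {
            "color": tuple(v / e["n"] for v in e["color"]),
            "center": (e["pos"][0] / e["n"], e["pos"][1] / e["n"]),
        }
        for lab, e in acc.items()
    }
```

In an iterative scheme of this kind, the models would be recomputed after each boundary update, and the loop would stop once no pixel changes its label or an iteration limit is reached; both stopping rules are assumptions here.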
Superpixel initialization: In the first step, the image is divided into equal-sized regions according to the desired number of SPs. Each region initially has a rectangular or hexagonal shape, and the centers are equally spaced across the image at the region centroids. In the prior methods,
Experimental results
This section presents quantitative and qualitative results for the two proposed SP extraction methods (using Euclidean and geodesic distance) in comparison with the state of the art. Well-known SP extraction methods from the literature, namely Graph-based [7], TurboPixels [5], Structure Sensitive Geo [15], SLIC [18] and SEEDS [19], are evaluated in terms of accuracy and computation time. SV evaluation is performed using the supervoxel benchmark presented in [20].
The performance of the
Conclusion and discussion
This paper presents a novel superpixel and supervoxel extraction method with contributions in terms of both computational efficiency and segmentation performance. In the proposed technique, SPs and SVs are updated iteratively through the boundary pixels based on color and spatial similarity. The boundary adaptation idea and the selection of the energy function are the two main contributions of the proposed method, enabling an efficient implementation. The experiments are conducted for different
References (31)
- X. Ren, J. Malik, Learning a classification model for segmentation, in: International Conference on Computer Vision,...
- H.E. Tasli, A.A. Alatan, User assisted stereo image segmentation, in: 3DTV-Conference: The True Vision-Capture,...
- A. Ayvaci, S. Soatto, Motion segmentation with occlusions on the superpixel graph, in: Proceedings of the Workshop on...
- H.E. Tasli, K. Ugur, Interactive 2D 3D image conversion method for mobile devices, in: 3DTV-Conference: The True...
- et al., TurboPixels: Fast Superpixels Using Geometric Flows (2009)
- J. Shi, J. Malik, Normalized cuts and image segmentation, in: IEEE Computer Vision and Pattern Recognition,...
- P. Felzenszwalb, D. Huttenlocher, Efficient graph based image segmentation, Int. J. Comput. Vis....
- A. Moore, S. Prince, J. Warrell, U. Mohammed, G. Jones, Superpixel lattices, in: International Conference on Computer...
- D. Tang, H. Fu, X. Cao, Topology preserved regular superpixel, in: IEEE International Conference on Multimedia and...
- O. Veksler, Y. Boykov, P. Mehrani, Superpixels and supervoxels in an energy optimization framework, in: Perspectives in...