Elsevier

Pattern Recognition

Volume 92, August 2019, Pages 52-63
SQL: Superpixels via quaternary labeling

https://doi.org/10.1016/j.patcog.2019.03.012

Highlights

  • This paper proposes the SQL algorithm to generate a superpixel lattice. The number of labels is reduced from the number of superpixels to 4, and this small number ensures efficient labeling.

  • Extensive experiments demonstrate that SQL outperforms the other superpixel lattice methods and is competitive with state-of-the-art methods that do not guarantee a lattice.

  • SQL allows either MRFs or CNNs to be built upon superpixels in subsequent image analysis tasks.

Abstract

This paper formulates superpixel segmentation as a pixel labeling problem and proposes a quaternary labeling algorithm to generate a superpixel lattice. Segmentation is achieved by seaming overlapping patches placed regularly on the image plane. Patch seaming is formulated as a pixel labeling problem in which each label indexes one patch. Once the optimal seaming is completed, all pixels covered by one retained patch constitute one superpixel. Further, four kinds of patches are distinguished and assembled into four corresponding layers, and the patch indexes are mapped to the quaternary layer indexes. This mapping significantly reduces the number of labels and greatly improves labeling efficiency. Furthermore, an objective function is developed to achieve optimal segmentation. The lattice structure is guaranteed by fixing patch centers to be superpixel centers, compact superpixels are ensured by horizontal and vertical constraints enforced on the smooth terms, and coherent superpixels are achieved by iteratively refining the data terms. Extensive experiments on the BSDS data set demonstrate that the SQL algorithm significantly improves labeling efficiency, outperforms the other superpixel lattice methods, and is competitive with state-of-the-art methods that do not guarantee a lattice. A superpixel lattice allows contextual relationships among superpixels to be modeled easily by either MRFs or CNNs.

Introduction

Superpixels have become an effective alternative to pixels since their introduction [1]. They have two prime advantages over pixels: perceptual meaning and reduced complexity. In contrast with raw pixels generated by digital sampling, superpixels are formed by pixel grouping, whose principles are based on classical Gestalt theory [2] and give superpixels enhanced perceptual meaning. Since many pixels are grouped into one superpixel, the number of superpixels is much smaller than that of pixels. When superpixels instead of pixels serve as atoms, the size of an image with respect to the atoms is reduced greatly. The size reduction can accelerate processing in subsequent tasks and, in turn, makes it possible to employ advanced methods that might be computationally infeasible for the huge number of pixels. For example, compared with a pixel-based convolutional neural network (CNN), a superpixel-based CNN (SuperCNN) enables efficient analysis of large context and is much more effective for salient region detection [3]. A variety of computer vision and pattern recognition problems have benefited from the above advantages [4]: feature extraction [5], clustering [6], classification [7], segmentation [8], [9], [10], saliency detection [11], contour detection [12], stereo computation [13], [14], [15], objectness measure [16], proposal generation [17], object localization [18] and object tracking [19], [20], [21], to name a few. Applications also cover domain-specific areas such as remotely sensed image analysis [22], [23] and medical image analysis [24], [25].

Few approaches produce superpixels that conform to a regular lattice [26], [27], [28]. A lattice gives superpixels the same neighborhood system as pixels, which makes it much more convenient to establish their contextual relationships in Markov random field (MRF) modeling [29]. Moreover, a lattice is a prerequisite for some models such as CNNs. For example, without a lattice structure, SuperCNN has to treat the segmented image as a 1D array of superpixels rather than a 2D array. As a result, the structure information of the image is destroyed in SuperCNN [3]. On the other hand, segmentation performance may be sacrificed to some extent in order to maintain the grid structure. The impaired performance prevents superpixel lattices from being widely used. It is therefore important to improve segmentation performance while maintaining the lattice structure.

Pixel labeling has been widely studied in the image analysis community [29]. The label of each pixel can denote quantities such as gray level, disparity, category and so on. The label field of an image can be elegantly expressed as a Markov random field (MRF). In a Bayesian framework, pixel labeling is formulated as maximum a posteriori estimation of the MRF (MAP-MRF), which results in an energy minimization problem. In the past two decades, graph cut algorithms and belief propagation algorithms have been developed to solve this energy minimization problem [30]. Thanks to these effective energy minimizers, pixel labeling has become an effective formulation for many problems such as image restoration [31], stereo matching [32] and image segmentation [33].
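
As an illustration of the kind of energy that MAP-MRF labeling minimizes, the sketch below evaluates a generic energy made of a data term plus a weighted smoothness term on a 4-connected pixel grid. The Potts smoothness penalty, the function name `labeling_energy` and the toy costs are assumptions chosen for illustration; this is not the objective function of the paper.

```python
def labeling_energy(labels, data_cost, lam=1.0):
    """Energy of a labeling on a 4-connected pixel grid.

    labels[y][x]        -- integer label of pixel (y, x)
    data_cost[y][x][l]  -- cost of giving pixel (y, x) the label l
    lam                 -- weight of the smoothness term
    """
    height, width = len(labels), len(labels[0])
    data = 0.0
    smooth = 0.0
    for y in range(height):
        for x in range(width):
            data += data_cost[y][x][labels[y][x]]
            # Potts penalty (assumed): each neighboring pair is counted once.
            if x + 1 < width and labels[y][x] != labels[y][x + 1]:
                smooth += 1.0
            if y + 1 < height and labels[y][x] != labels[y + 1][x]:
                smooth += 1.0
    return data + lam * smooth


# Toy 2 x 2 image with two labels: the top row prefers label 0, the bottom row label 1.
toy_cost = [[[0, 1], [0, 1]],
            [[1, 0], [1, 0]]]
toy_labels = [[0, 0],
              [1, 1]]
toy_energy = labeling_energy(toy_labels, toy_cost, lam=1.0)
```

In the toy case the data term is zero and only the two vertical neighbor pairs disagree, so the energy equals twice the smoothness weight. Graph cut and belief propagation search for the labeling that minimizes such an energy instead of evaluating a single candidate.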

The power of Markov random fields (MRFs) comes from the contextual relationships among sites. These relationships depend on a neighborhood system, which is easy to define on a regular grid but not on a general structure. Pixel labeling can be easily extended to superpixel labeling if superpixels conform to a regular lattice [26], [27], [28], [34]. Superpixel labeling can serve as a natural framework for superpixel-based image analysis.
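
The convenience of a lattice can be sketched as follows: when superpixels form a grid of `rows` by `cols`, the 4-neighborhood follows from index arithmetic exactly as it does for pixels. The function name and the row-major indexing convention are assumptions made for this illustration.

```python
def lattice_neighbors(index, rows, cols):
    """4-neighbors of a superpixel on a rows x cols lattice (row-major index)."""
    r, c = divmod(index, cols)
    neighbors = []
    if r > 0:
        neighbors.append((r - 1) * cols + c)  # up
    if r + 1 < rows:
        neighbors.append((r + 1) * cols + c)  # down
    if c > 0:
        neighbors.append(r * cols + c - 1)    # left
    if c + 1 < cols:
        neighbors.append(r * cols + c + 1)    # right
    return neighbors
```

For superpixels without a lattice, the same query would typically require building an explicit adjacency graph from the segmentation.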

Superpixel segmentation methods can be grouped into different categories based on different criteria. By formulation, graph partition, boundary evolution and data clustering can be distinguished. The first type includes Normalized Cut (NC) [1], Graph-based Superpixels (GS) [35], Lattice Cut (LC) [27], Entropy Rate Superpixels (ERS) [36], Compact Superpixels (CS), Variable Patch Superpixels (VPS) and Constant Intensity Superpixels (CIS) [37], etc. The second formulation covers Turbopixel (TP) [38], Structure-sensitive Superpixels (SS) [39] and Superpixels Extracted via Energy-Driven Sampling (SEEDS) [40]. Simple Linear Iterative Clustering (SLIC) [41] and Vcells [42] belong to the last formulation. SLIC, Vcells, Mean Shift (MS) [43] and Quick Shift (QS) [44] work in the feature space, while most methods, like TP, work in the original image space. SLIC and SEEDS need region centers and boundaries, respectively, to initialize the segmentation, while NC and GS need no initialization. Most methods such as GS place no explicit constraints on the spatial extent of superpixels, while SLIC, CIS and Superpixels via Pseudo-Boolean Optimization (SPBO) [28] prevent each superpixel from extending outside a predefined rectangle. Superpixel Lattice (SL) [26], LC and SPBO guarantee a lattice structure of superpixels, while the others produce general superpixels.

Although the initial states of some methods are regular grids, the lattice structure is not maintained during segmentation and postprocessing. For example, the seeds for region growing in TP are distributed as a regular grid [38], the centers of clusters in SLIC are also selected regularly [41], the initial boundary to be adjusted by SEEDS is a regular lattice [40], and the initial patches to be seamed by CS, VPS and CIS are the cells of a lattice [37]. However, no constraints are enforced to maintain a lattice structure of the superpixels during segmentation. Such regularity may be further impaired by postprocessing, as in SLIC.

Superpixel segmentation conforming to a lattice structure can be viewed as a strip seaming problem, which is described in Section 2.1. An image is covered by overlapping horizontal and vertical strips as shown in Fig. 1a. Strip seaming stitches the strips such that the seams are encouraged to align with image edges. The seams determine the strip borders and generate nonoverlapping strips. All pixels covered by the same nonoverlapping strips constitute one superpixel. SL employs dynamic programming or an s-t min-cut method to find an optimal vertical or horizontal seam with respect to a boundary map of the image [26]. These vertical and horizontal seams are found alternately and progressively. As an improved version, LC finds all the vertical seams or all the horizontal seams as a whole by assigning each pixel a label corresponding to a vertical strip or a horizontal strip [27]. The basic graph construction for this multi-label MRF is similar to that of Ishikawa [45], and three extra constraints are introduced to maintain lattice regularity. Superpixels are expected to be coherent based on local color models, and they are encouraged to align with image edges computed as a boundary map. These two requirements enter the objective function as data and smooth terms respectively. The constructed objective function is optimized via a single graph cut computation. These methods usually produce superpixels with nonuniform sizes since they are dominated by the pre-computed boundary map. SPBO is a different formulation of the aforementioned pixel labeling problem [28]. All strips have a fixed width and neighboring strips overlap with each other. Both horizontal strips and vertical strips are indexed by consecutive numbers and are separated into an odd group and an even group. Both horizontal seaming and vertical seaming are achieved by binary labeling, that is, assigning each pixel to either an odd strip or an even strip. The horizontal seams and vertical seams are found independently. The constructed objective function is a pseudo-Boolean function without a data term, and its smooth terms are either submodular or nonsubmodular. Instead of finding whole horizontal or vertical seams, Iterative Refining Superpixel Lattice (IRSL) refines local seams iteratively [34]. The seams are refined to align with image edges indicated by large gradients. All the above methods find horizontal and vertical seams separately and independently, which works against global optimality.
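
To make the dynamic-programming option concrete, the sketch below finds one vertical seam through a strip so that it passes over strong boundaries. It is a generic minimum-cost seam recursion under assumed conventions (cost is the negated boundary strength, and the seam moves at most one column per row), not the exact procedure of SL [26]; `vertical_seam` and `boundary` are hypothetical names.

```python
def vertical_seam(boundary):
    """Return one column index per row forming a top-to-bottom seam.

    boundary[y][x] is the boundary strength at pixel (y, x); the seam
    maximizes the summed strength, moving at most one column per row.
    """
    height, width = len(boundary), len(boundary[0])
    # cost[y][x]: minimal accumulated cost of a seam ending at (y, x).
    cost = [[-b for b in boundary[0]]]
    back = [[0] * width]
    for y in range(1, height):
        row_cost, row_back = [], []
        for x in range(width):
            candidates = range(max(0, x - 1), min(width, x + 2))
            best = min(candidates, key=lambda px: cost[y - 1][px])
            row_cost.append(cost[y - 1][best] - boundary[y][x])
            row_back.append(best)
        cost.append(row_cost)
        back.append(row_back)
    # Backtrack from the cheapest end point in the last row.
    x = min(range(width), key=lambda px: cost[-1][px])
    seam = [x]
    for y in range(height - 1, 0, -1):
        x = back[y][x]
        seam.append(x)
    seam.reverse()
    return seam
```

Each seam found this way is optimal only for its own strip, which illustrates why finding horizontal and vertical seams separately does not yield a jointly optimal segmentation.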

Patch seaming is an alternative to strip seaming. As depicted in Fig. 1b, an image is covered by many overlapping patches instead of strips. CS, VPS and CIS find all seams as a whole [37]. However, the seams are neither horizontal nor vertical, so they do not guarantee a lattice structure. Moreover, the number of labels equals the number of patches (superpixels), which is quite large in practice. The large number of labels prevents efficient labeling since the optimization algorithms take either linear or quadratic time with respect to the number of labels [46].

This paper proposes superpixels via quaternary labeling (SQL) to generate a superpixel lattice. The basic formulation is patch seaming. Four kinds of patches are distinguished and assembled into four corresponding layers as illustrated in Fig. 1c. Superpixel segmentation is achieved via layer seaming instead of patch seaming, that is, each pixel is assigned to one layer instead of one patch. In this way, only 4 labels are involved. The complexity of optimization is reduced greatly and the efficiency of segmentation is improved significantly. Further, superpixel centers are fixed and seams are enforced to be either horizontal or vertical. A lattice structure is guaranteed by such centers and seams. Furthermore, the color of each patch is measured by its average color rather than the color of its center, and it is iteratively refined during segmentation, which yields more coherent superpixels. Finally, horizontal and vertical constraints are developed to produce more compact superpixels.
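
The reduction to four labels can be sketched as follows. The code assumes ideal square superpixels of side `s` pixels on a grid, patches of side `2 * s` sharing the same centers, and a layer index built from the parities of the patch row and column. The function names and the particular numbering `2 * (row % 2) + (col % 2)` are assumptions for illustration, and image borders are ignored.

```python
def layer_of_patch(row, col):
    """Map a patch at lattice position (row, col) to one of four layers."""
    return 2 * (row % 2) + (col % 2)


def patch_of_pixel(y, x, layer, s):
    """Recover the patch (row, col) a pixel is assigned to from its layer label.

    A patch of side 2*s centered on cell (row, col) covers the pixel rows
    [row*s - s/2, row*s + 3*s/2), so every pixel is covered by two
    consecutive patch rows (and two patch columns) of opposite parity.
    The parity bits of the layer select one of them.
    """
    row_parity, col_parity = divmod(layer, 2)
    first_row = (2 * y - s) // (2 * s)  # lower of the two covering rows
    first_col = (2 * x - s) // (2 * s)  # lower of the two covering columns
    row = first_row if first_row % 2 == row_parity else first_row + 1
    col = first_col if first_col % 2 == col_parity else first_col + 1
    return row, col
```

Under these assumptions, patches of the same layer are spaced `2 * s` apart and do not overlap, so a layer label together with the pixel position identifies exactly one patch. The labeling problem therefore needs 4 labels regardless of how many superpixels the image contains.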

Section snippets

Segmentation via seaming

Without loss of generality, suppose that the image shown on the right of Fig. 2 needs to be segmented into upper and lower parts as indicated by their colors. Two overlapping patches are selected to cover the upper and lower parts as shown on the left of Fig. 2. These patches are stitched to eliminate their overlap. Seaming is the procedure of finding the optimal dividing line between two parts and clipping the patches along the dividing line to discard the areas under and above seam in up and

Initial patch placement

A regular and even partition of the image plane is depicted in Fig. 3a, where each small square represents one ideal superpixel and its center is depicted as a circle. When their centers are fixed and their side lengths are multiplied by 2, these squares serve as the initial patches as depicted in Fig. 3b, where each patch is filled with a specific color. Since the side lengths of the squares are multiplied by 2, the enlarged squares overlap with each other. The pixels in the center area are covered by four

Experiment setup

All experiments are carried out on a computer with an Intel Core i7-4770 CPU @ 3.40 GHz and 8 GB of RAM running Windows 7. Both the SQL algorithm and the superpixel evaluation method are implemented in C++, and no parallelization is employed.

For simplicity, λ in Eq. (1) is set to 1. To adapt to variable patch sizes, k in Eq. (6) is set to 20N/K, where N is the total number of pixels in the image and K is the expected number of superpixels.
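
As a worked instance of this setting, the snippet below evaluates k = 20N/K. The 481 × 321 image size and K = 200 are example values chosen here, not settings reported in the paper, and `k_parameter` is a hypothetical helper name.

```python
def k_parameter(num_pixels, num_superpixels):
    """k = 20 * N / K, i.e. 20 times the average superpixel area in pixels."""
    return 20.0 * num_pixels / num_superpixels


n = 481 * 321                 # example image size (assumed)
k_example = k_parameter(n, 200)
print(k_example)
```

Because N/K is the average superpixel area, k grows in proportion to that area, which is how the setting adapts to the patch size.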

Conclusion

This paper formulates superpixel segmentation as a patch seaming and pixel labeling problem. It assembles all patches into four layers according to the parities of their row indexes and column indexes, and generates superpixels via layer seaming. With such patch assembling, the number of labels is reduced from the number of patches to four. All original labels indexing patches are mapped to quaternary labels indexing layers. The original labels can be recovered from quaternary labels by inverse

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 41571335).

References (50)

  • A. Lucchi et al., Supervoxel-based segmentation of mitochondria in EM image stacks with learned shape features, IEEE Trans. Med. Imaging (2012)
  • A.P. Moore et al., Superpixel lattices, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008 (2008)
  • F.Y. Wu, The Potts model, Rev. Mod. Phys. (1982)
  • X. Ren et al., Learning a classification model for segmentation, Proceedings of the Ninth IEEE International Conference on Computer Vision (2003)
  • M. Wertheimer, Laws of organization in perceptual forms, A Sourcebook of Gestalt Psychology (1938)
  • S. He et al., SuperCNN: a superpixelwise convolutional neural network for salient object detection, Int. J. Comput. Vis. (2015)
  • D. Stutz et al., Superpixels: an evaluation of the state-of-the-art, Comput. Vis. Image Underst. (2018)
  • X. Boix et al., Harmony potentials, Int. J. Comput. Vis. (2012)
  • S. Yin et al., Unsupervised hierarchical image segmentation through fuzzy entropy maximization, Pattern Recognit. (2017)
  • C.L. Zitnick et al., Stereo for image-based rendering using image over-segmentation, Int. J. Comput. Vis. (2007)
  • B. Mičušík et al., Multi-view superpixel stereo in urban environments, Int. J. Comput. Vis. (2010)
  • F. Cheng et al., Cross-trees, edge and superpixel priors-based cost aggregation for stereo matching, Pattern Recognit. (2015)
  • J. Hosang et al., What makes for effective detection proposals?, IEEE Trans. Pattern Anal. Mach. Intell. (2016)
  • B. Fulkerson et al., Class segmentation and object localization with superpixel neighborhoods, Proceedings of the IEEE Twelfth International Conference on Computer Vision (2009)
  • S. Wang et al., Superpixel tracking, Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2011)

Dengfeng Chai received the Bachelor's degree in surveying engineering from Wuhan University, Wuhan, China, the Master's degree in photogrammetry and remote sensing from the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, and the Doctor's degree in applied mathematics from the State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou, China, in 1997, 2000, and 2006, respectively. He is an associate professor in the Institute of Spatial Information Technique, Zhejiang University. From 2010 to 2011, he was a postdoctoral fellow in the Department of Photogrammetry, University of Bonn, Bonn, Germany. His research interests include many topics in computer vision and pattern recognition, photogrammetry and remote sensing, such as image segmentation, object recognition, object extraction and stereo matching. He has published many papers in top-level journals and conferences, including Pattern Recognition, ICCV and CVPR.