Elsevier

Pattern Recognition Letters

Volume 140, December 2020, Pages 135-142

End-to-end trainable network for superpixel and image segmentation

https://doi.org/10.1016/j.patrec.2020.09.016

Highlights

  • Our network is end-to-end trainable and can be easily assembled into other deep networks.

  • Our network can generate superpixels and get segmentation results simultaneously.

  • Our algorithm performs well and achieves higher precision than existing methods.

  • Our model is smaller and less computationally intensive.

Abstract

Image segmentation and superpixel generation have been studied for many years and remain active research topics in computer vision. Although many advanced computer vision algorithms have been applied to image segmentation and superpixel generation, there is no end-to-end trainable algorithm that generates superpixels and segments images simultaneously. We propose an end-to-end trainable network to solve this problem. We train a differentiable clustering module to produce accurate superpixels. Based on the generated superpixels, a superpixel pooling operation is performed to obtain superpixel features, and we then calculate the similarity of each pair of adjacent superpixels. If the similarity is greater than a preset threshold, we merge the two superpixels. Finally, we obtain the segmented image. We conduct experiments on the BSDS500 dataset and obtain good results.

Introduction

As a key component of many different computer vision tasks, image segmentation aims to partition an image into large perceptual regions, where the pixels within each region typically belong to the same visual object and exhibit only minor feature differences. It has been widely used in object proposal generation [2], [3], object detection/recognition [4], [5], and other fields.

Different from the large perceptual regions resulting from image segmentation, superpixel segmentation performs an over-segmentation of the input image. It segments an image into small, regular, and compact regions, which are typically composed of pixels with similar spatial position, texture, color, and brightness. Meanwhile, it preserves the salient features of a pixel-based representation. Compared to image segmentation, superpixels usually exhibit strong boundary coherence, and the produced segments are easy to control. Due to their efficiency in both representation and computation, superpixels have been widely used in computer vision algorithms such as object detection [6], [7], semantic segmentation [8], [9], [10], [11], tracking [12], [13], [14], and saliency estimation [15], [16], [17].

In recent years, researchers have attempted to use superpixels as a reasonable starting point for image segmentation, e.g. [18], to reduce computational complexity. However, the interaction between these two kinds of segmentation is seldom considered. On one hand, superpixels can indeed serve as a good starting point for segmentation due to their good adherence to strong boundaries; on the other hand, the results of image segmentation can also provide clues for superpixel generation. Consider, for example, performing superpixel segmentation on an image whose foreground and background have similar colors: it is difficult for an algorithm to determine the boundary using only local, low-level pixel features. By integrating object-level features into superpixel segmentation, better results should be achievable.

In this paper, we explore the mutual promotion between image segmentation and superpixel segmentation. Specifically, we propose an end-to-end trainable network which can generate superpixels and perform image segmentation simultaneously. By combining these two tasks under the same neural network framework, we aim to demonstrate that they promote each other. The architecture of our network is shown in Fig. 1. We use a fully convolutional network and the iterative differentiable clustering algorithm [19] to obtain superpixels. Next, we adopt the superpixel pooling layer [20] to obtain superpixel features, with which the similarity between adjacent superpixels can be calculated. If the similarity is greater than a preset threshold, we merge them according to a simple procedure to obtain object segments. Note that since superpixels do not contain any semantic information while object segments do, we derive them from different features (shown in Fig. 1 using light blue and light pink rectangles). We train the network on the BSDS500 segmentation benchmark dataset [21] and compare our results with the state of the art. Experimental results show that our network can produce boundary-adherent superpixels and semantically meaningful objects.
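The iterative differentiable clustering of [19] replaces hard nearest-center assignment with a soft, differentiable one. As a rough illustration only (a minimal numpy sketch of the general idea, not the paper's implementation; the function names and the temperature parameter are our own assumptions):

```python
import numpy as np

def soft_assign(pixel_feats, centroids, temperature=1.0):
    """Soft pixel-to-superpixel assignment: a softmax over negative
    squared feature distances, which keeps the operation differentiable."""
    # pixel_feats: (N, D); centroids: (K, D)
    d2 = ((pixel_feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (N, K)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)      # each row sums to 1

def update_centroids(pixel_feats, assign):
    """One clustering iteration: weighted mean of pixel features per cluster."""
    # assign: (N, K) soft assignment weights from soft_assign
    return (assign.T @ pixel_feats) / assign.sum(axis=0)[:, None]
```

Alternating `soft_assign` and `update_centroids` for a few iterations mimics an iterative clustering module; hard superpixel labels are then recovered with an argmax over the assignment weights.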

The proposed network has the following characteristics:

  • Our network is end-to-end trainable and can be easily assembled into other deep network structures for subsequent applications.

  • Our network can generate superpixels and obtain segmentation results simultaneously, which is more efficient. The proposed algorithm performs well and achieves higher precision than existing algorithms.

Section snippets

Related work

Since Ren and Malik proposed the concept of the superpixel in 2003 [22], researchers have contributed to the field. The simple linear iterative clustering (SLIC) algorithm [23] converts the color image into the CIELAB color space, forms 5-dimensional feature vectors from the color channels and the XY coordinates, and then constructs a distance metric on these vectors to cluster the pixels locally. The idea is simple and easy to implement. Compared with other superpixel segmentation methods, the SLIC is
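For concreteness, the SLIC distance and its local nearest-center search can be sketched as follows (an illustrative numpy reading of [23] with compactness `m` and grid interval `S`, not the original authors' code):

```python
import numpy as np

def slic_distance(px, center, m=10.0, S=20.0):
    """SLIC distance between a pixel and a cluster center, both given as
    5-D vectors [l, a, b, x, y]: CIELAB color distance combined with the
    spatial distance, weighted by compactness m and grid interval S."""
    d_lab = np.linalg.norm(px[:3] - center[:3])
    d_xy = np.linalg.norm(px[3:] - center[3:])
    return np.sqrt(d_lab ** 2 + (d_xy / S) ** 2 * m ** 2)

def assign_pixels(pixels, centers, m=10.0, S=20.0):
    """Assign each 5-D pixel to its nearest center, searching only centers
    within a 2S x 2S spatial window (SLIC's local clustering step)."""
    labels = np.full(len(pixels), -1)
    best = np.full(len(pixels), np.inf)
    for k, c in enumerate(centers):
        for i, p in enumerate(pixels):
            if np.all(np.abs(p[3:] - c[3:]) <= 2 * S):
                d = slic_distance(p, c, m, S)
                if d < best[i]:
                    best[i], labels[i] = d, k
    return labels
```

In the full algorithm this assignment step alternates with recomputing each center as the mean of its assigned pixels until convergence.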

Our approach

As shown in Fig. 1, our approach first learns image features with a deep CNN [27]; we then use an iterative differentiable clustering module to obtain superpixels. Next, we compute superpixel feature vectors with a superpixel pooling layer and learn the similarity between adjacent superpixels. Finally, we decide whether two adjacent superpixels should be merged according to this similarity. In this section, we introduce our approach in detail.
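The superpixel pooling and threshold-based merging steps can be sketched as follows (our own numpy illustration of the idea, not the paper's implementation; the cosine similarity, the union-find merge, and all names are our assumptions):

```python
import numpy as np

def superpixel_pool(feat_map, labels, n_sp):
    """Superpixel pooling: average the feature map over each superpixel."""
    # feat_map: (H, W, C); labels: (H, W) superpixel ids in [0, n_sp)
    C = feat_map.shape[-1]
    flat = labels.ravel()
    counts = np.bincount(flat, minlength=n_sp)
    pooled = np.zeros((n_sp, C))
    for c in range(C):
        pooled[:, c] = np.bincount(flat, weights=feat_map[..., c].ravel(),
                                   minlength=n_sp)
    return pooled / np.maximum(counts, 1)[:, None]

def merge_by_similarity(pooled, adjacency, threshold=0.9):
    """Union-find merge: join adjacent superpixels whose cosine similarity
    exceeds the threshold; returns a segment id per superpixel."""
    parent = list(range(len(pooled)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    norm = pooled / np.maximum(np.linalg.norm(pooled, axis=1, keepdims=True), 1e-12)
    for i, j in adjacency:
        if norm[i] @ norm[j] > threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(pooled))]
```

In the actual network the similarity would be learned rather than a fixed cosine; the sketch only shows the data flow from pooled features to merged segments.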

Experiments

In this section, we conduct experiments on the widely used BSDS500 dataset [21] to evaluate our approach. The BSDS500 dataset consists of 500 images: 200 for training, 100 for validation, and 200 for testing. It has become a standard benchmark for image segmentation, over-segmentation, and edge detection.
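One standard superpixel measure reported on BSDS500 is achievable segmentation accuracy (ASA): the fraction of pixels that would be labeled correctly if every superpixel were assigned to the ground-truth segment it overlaps most. A minimal numpy sketch (our own illustration, not the paper's evaluation code):

```python
import numpy as np

def achievable_segmentation_accuracy(sp_labels, gt_labels):
    """ASA: for each superpixel, count the pixels of its best-matching
    ground-truth segment, then normalize by the total pixel count."""
    sp = np.asarray(sp_labels).ravel()
    gt = np.asarray(gt_labels).ravel()
    total = 0
    for s in np.unique(sp):
        overlap = np.bincount(gt[sp == s])  # pixels per gt segment inside s
        total += overlap.max()
    return total / sp.size
```

An ASA of 1.0 means every superpixel lies entirely inside a single ground-truth segment, i.e. a perfect over-segmentation.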

Image segmentation and superpixel segmentation are important directions in the field of computer vision, and there are already many publicly available evaluation benchmarks. In our

Conclusion

In this paper, we have proposed an end-to-end trainable network that can generate both superpixels and image segmentation results. Specifically, we use a fully convolutional network to extract image features and then use the differentiable clustering module to produce accurate superpixels. Superpixel features are obtained via the superpixel pooling operation, and the similarity of two adjacent superpixels is calculated to determine whether to merge them to obtain the sensing

Declaration of Competing Interest

The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in

References (37)

  • P.-H. Conze et al., Unsupervised learning-based long-term superpixel tracking, Image Vis. Comput. (2019)
  • Y. Wu et al., Group normalization, Proc. of ECCV (2018)
  • J. Pont-Tuset et al., Multiscale combinatorial grouping for image segmentation and object proposal generation, IEEE Trans. Pattern Anal. Mach. Intell. (2016)
  • J.R. Uijlings et al., Selective search for object recognition, Int. J. Comput. Vis. (2013)
  • M. Juneja et al., Blocks that shout: distinctive parts for scene classification, Proc. of IEEE CVPR (2013)
  • P. Kohli et al., Robust higher order potentials for enforcing label consistency, Int. J. Comput. Vis. (2009)
  • G. Shu et al., Improving an object detector and extracting regions using superpixels, Proc. of IEEE CVPR (2013)
  • J. Yan et al., Object detection by labeling superpixels, Proc. of IEEE CVPR (2015)
  • S. Gould et al., Multi-class segmentation with relative location prior, Int. J. Comput. Vis. (2008)
  • R. Gadde et al., Superpixel convolutional networks using bilateral inceptions, Proc. of ECCV (2016)
  • A. Sharma et al., Recursive context propagation network for semantic scene labeling, Proc. of NeurIPS (2014)
  • X. Xie et al., Automatic image segmentation with superpixels and image-level labels, IEEE Access (2019)
  • S. Wang et al., Superpixel tracking, Proc. of ICCV (2011)
  • F. Yang et al., Robust superpixel tracking, IEEE Trans. Image Process. (2014)
  • S. He et al., SuperCNN: a superpixelwise convolutional neural network for salient object detection, Int. J. Comput. Vis. (2015)
  • C. Yang et al., Saliency detection via graph-based manifold ranking, Proc. of IEEE CVPR (2013)
  • W. Zhu et al., Saliency optimization from robust background detection, Proc. of IEEE CVPR (2014)
  • Y. Liu et al., DEL: deep embedding learning for efficient image segmentation, Proc. of IJCAI (2018)