End-to-end trainable network for superpixel and image segmentation
Introduction
As a key component of many computer vision tasks, image segmentation aims to partition an image into large perceptual regions, where the pixels within each region typically belong to the same visual object and exhibit only minor feature differences. It has been widely used in object proposal generation [2], [3], object detection/recognition [4], [5], and other fields.
Different from the large perceptual regions produced by image segmentation, superpixel segmentation performs an over-segmentation of the input image. It divides an image into small, regular, and compact regions, each typically composed of pixels with similar spatial position, texture, color, and brightness, while preserving the salient features of a pixel-based representation. Compared to image segmentation, superpixels usually adhere well to object boundaries, and the produced segments are easy to control. Owing to their efficiency in both representation and computation, superpixels have been widely used in computer vision algorithms such as object detection [6], [7], semantic segmentation [8], [9], [10], [11], tracking [12], [13], [14], and saliency estimation [15], [16], [17].
In recent years, researchers have attempted to use superpixels as a reasonable starting point for image segmentation, e.g. [18], to reduce computational complexity. However, the interaction between these two kinds of segmentation is seldom considered. On the one hand, superpixels can indeed serve as a good starting point for segmentation owing to their adherence to strong boundaries; on the other hand, image segmentation results can also provide cues for superpixel generation. Consider, for example, performing superpixel segmentation on an image whose foreground and background have similar colors: it is difficult for an algorithm to determine the boundary using only local, low-level pixel features. By integrating object-level features into superpixel segmentation, better results can be expected.
In this paper, we explore the mutual promotion between image segmentation and superpixel segmentation. Specifically, we propose an end-to-end trainable network that generates superpixels and performs image segmentation simultaneously. By combining the two tasks within the same neural network framework, we demonstrate that they promote each other. The architecture of our network is shown in Fig. 1. We use a fully convolutional network together with the iterative differentiable clustering algorithm [19] to obtain superpixels. Next, we adopt the superpixel pooling layer [20] to extract superpixel features, from which the similarity between adjacent superpixels can be calculated. If the similarity is greater than a preset threshold, we merge the superpixels according to a simple procedure to obtain object segments. Note that since superpixels carry no semantic information while object segments do, we compute them from different features (shown in Fig. 1 as the light blue and light pink rectangles). We train the network on the BSDS500 segmentation benchmark dataset [21] and compare our results with state-of-the-art methods. Experimental results show that our network can produce boundary-adherent superpixels and semantically meaningful object segments.
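The iterative differentiable clustering algorithm [19] replaces hard pixel-to-superpixel assignments with soft ones, so that gradients can flow through the clustering step during end-to-end training. Below is a minimal NumPy sketch of one such soft iteration; it is illustrative only, and the actual module of [19] restricts each pixel's candidate superpixels to a local window and operates on learned CNN feature maps:

```python
import numpy as np

def soft_clustering_step(pixel_feats, centroids):
    """One iteration of differentiable (soft) clustering:
    pixels are softly assigned to superpixel centroids, and
    centroids are updated as assignment-weighted feature means.

    pixel_feats: (N, D) array of per-pixel features
    centroids:   (K, D) array of superpixel centers
    """
    # Squared Euclidean distance from every pixel to every centroid
    dists = ((pixel_feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    # Soft assignment: softmax over negative distances
    logits = -dists
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    q = np.exp(logits)
    q /= q.sum(axis=1, keepdims=True)            # (N, K), rows sum to 1
    # Update centroids as the assignment-weighted mean of pixel features
    new_centroids = (q.T @ pixel_feats) / q.sum(axis=0)[:, None]
    return q, new_centroids
```

Because every step is differentiable, the same update can be written with autograd tensors and trained jointly with the feature extractor.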
The proposed network has the following characteristics:
- Our network is end-to-end trainable and can easily be assembled into other deep network structures for subsequent applications.
- Our network generates superpixels and image segmentation results simultaneously, which is more efficient than performing the two tasks separately. The proposed algorithm performs well and achieves higher precision than existing algorithms.
Related work
Since Ren and Malik proposed the concept of superpixels in 2003 [22], researchers have contributed to the field. The simple linear iterative clustering (SLIC) algorithm [23] converts the color image into the CIELAB color space, forms a 5-dimensional feature vector from the three color components and the XY coordinates of each pixel, and then constructs a distance metric over this feature vector to cluster pixels locally. The idea is simple and easy to implement. Compared with other superpixel segmentation methods, the SLIC is
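The SLIC distance described above can be sketched as follows, assuming the standard formulation with sampling interval S (the expected superpixel spacing) and compactness weight m; the default m = 10 is a common choice in the SLIC literature, not a value taken from this paper:

```python
import numpy as np

def slic_distance(p, q, S, m=10.0):
    """SLIC distance between two 5-D feature vectors [l, a, b, x, y].

    S is the sampling interval (grid spacing of initial cluster
    centers) and m trades off spatial compactness against color
    similarity: larger m yields more regular, grid-like superpixels.
    """
    d_color = np.linalg.norm(p[:3] - q[:3])  # CIELAB color distance
    d_space = np.linalg.norm(p[3:] - q[3:])  # XY spatial distance
    return np.sqrt(d_color ** 2 + (d_space / S) ** 2 * m ** 2)
```

Normalizing the spatial term by S makes the metric comparable across superpixel sizes, which is one reason the algorithm is simple to tune.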
Our approach
As shown in Fig. 1, our approach first learns image features with a deep CNN [27], then uses an iterative differentiable clustering module to obtain superpixels. Next, we compute superpixel feature vectors via the superpixel pooling layer and learn the similarity between adjacent superpixels. Finally, we decide whether two adjacent superpixels should be merged according to their similarity. In this section, we introduce our approach in detail.
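The pooling and merging steps above can be sketched as follows. The cosine similarity and the fixed threshold are illustrative stand-ins for the learned similarity used in the paper, and the union-find merge is one simple realization of the merging procedure:

```python
import numpy as np

def superpixel_pool(feats, labels, K):
    """Average-pool per-pixel features (H, W, D) into K superpixel
    feature vectors, one per superpixel label."""
    D = feats.shape[-1]
    pooled = np.zeros((K, D))
    counts = np.zeros(K)
    flat_l = labels.ravel()
    flat_f = feats.reshape(-1, D)
    np.add.at(pooled, flat_l, flat_f)   # sum features per superpixel
    np.add.at(counts, flat_l, 1)        # count pixels per superpixel
    return pooled / np.maximum(counts, 1)[:, None]

def merge_superpixels(pooled, adjacency, threshold=0.9):
    """Union-find merge of adjacent superpixels whose cosine
    similarity exceeds `threshold` (an illustrative value)."""
    parent = list(range(len(pooled)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    unit = pooled / np.maximum(norms, 1e-12)
    for i, j in adjacency:
        if unit[i] @ unit[j] > threshold:
            parent[find(i)] = find(j)
    # Each superpixel's root identifies its merged segment
    return [find(i) for i in range(len(pooled))]
```

In practice the pooling step would run on the network's feature maps and the similarity would come from a learned predictor, but the merge logic is the same.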
Experiments
In this section, we conduct experiments on the widely used BSDS500 dataset [21] to evaluate our approach. The BSDS500 dataset consists of 500 images: 200 for training, 100 for validation, and 200 for testing. It has become a standard benchmark for image segmentation, over-segmentation, and edge detection.
Image segmentation and superpixel segmentation are important directions in the field of computer vision, and there are already many publicly available evaluation benchmarks. In our
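Among the common benchmarks for superpixel quality on BSDS500 is boundary recall, the fraction of ground-truth boundary pixels that fall within a small tolerance of a predicted boundary. A plain NumPy sketch of the metric (not the benchmark's reference implementation) is:

```python
import numpy as np

def boundary_recall(pred_boundary, gt_boundary, tol=2):
    """Boundary Recall (BR): fraction of ground-truth boundary
    pixels lying within `tol` pixels of a predicted boundary.

    pred_boundary, gt_boundary: boolean (H, W) boundary maps.
    """
    H, W = gt_boundary.shape
    gt_pts = np.argwhere(gt_boundary)
    if len(gt_pts) == 0:
        return 1.0          # nothing to recall
    if not pred_boundary.any():
        return 0.0          # no predicted boundary at all
    hits = 0
    for y, x in gt_pts:
        # Check a (2*tol+1)^2 window around each GT boundary pixel
        y0, y1 = max(0, y - tol), min(H, y + tol + 1)
        x0, x1 = max(0, x - tol), min(W, x + tol + 1)
        if pred_boundary[y0:y1, x0:x1].any():
            hits += 1
    return hits / len(gt_pts)
```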
Conclusion
In this paper, we have proposed an end-to-end trainable network that can generate both superpixels and image segmentation. Specifically, we use a fully convolutional network to extract image features and then use the differentiable clustering module to produce accurate superpixels. Superpixel features are obtained with the superpixel pooling operation, and the similarity of two adjacent superpixels is calculated to decide whether to merge them to obtain the sensing
Declaration of Competing Interest
The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in
References (37)
- et al., Unsupervised learning-based long-term superpixel tracking, Image Vis. Comput. (2019)
- et al., Group normalization, Proc. of ECCV (2018)
- et al., Multiscale combinatorial grouping for image segmentation and object proposal generation, IEEE Trans. Pattern Anal. Mach. Intell. (2016)
- et al., Selective search for object recognition, Int. J. Comput. Vis. (2013)
- et al., Blocks that shout: distinctive parts for scene classification, Proc. of IEEE CVPR (2013)
- et al., Robust higher order potentials for enforcing label consistency, Int. J. Comput. Vis. (2009)
- et al., Improving an object detector and extracting regions using superpixels, Proc. of IEEE CVPR (2013)
- et al., Object detection by labeling superpixels, Proc. of IEEE CVPR (2015)
- et al., Multi-class segmentation with relative location prior, Int. J. Comput. Vis. (2008)
- et al., Superpixel convolutional networks using bilateral inceptions, Proc. of ECCV (2016)
- Recursive context propagation network for semantic scene labeling, Proc. of NeurIPS
- Automatic image segmentation with superpixels and image-level labels, IEEE Access
- Superpixel tracking, Proc. of ICCV
- Robust superpixel tracking, IEEE Trans. Image Process.
- SuperCNN: a superpixelwise convolutional neural network for salient object detection, Int. J. Comput. Vis.
- Saliency detection via graph-based manifold ranking, Proc. of IEEE CVPR
- Saliency optimization from robust background detection, Proc. of IEEE CVPR
- DEL: deep embedding learning for efficient image segmentation, Proc. of IJCAI