Elsevier

Pattern Recognition Letters

Volume 153, January 2022, Pages 159-167
A robust image segmentation framework based on total variation spectral transform

https://doi.org/10.1016/j.patrec.2021.12.001

Abstract

Image segmentation is a critical process that influences subsequent tasks in computer vision. Despite years of research, popular segmentation methods often over-segment or under-segment noisy images. To address this issue, this paper proposes an image segmentation framework based on the total variation (TV) spectral transform. Firstly, we apply the TV spectral transform to a noisy image to obtain basic elements that preserve shape structure. Secondly, the object structure is roughly separated from the background by a separation surface, which is fitted using the maximal response time of each pixel. Thirdly, the roughly generated object structure is refined with a guided filter whose guidance image is calculated through a band-pass filter and an inverse transform in the TV spectral domain. Finally, binary segmentation and morphological operations are conducted on the refined object structure to obtain the segmentation result. Experimental results indicate that our method achieves high segmentation accuracy and is robust to noise compared with other classical approaches.

Introduction

Image segmentation aims to partition an image into several regions. In each region, pixels share similar characteristics, such as intensity, color, texture, or other features. It plays an important role in various computer vision tasks including object detection [1], image fusion [2] and image retrieval [3,4]. Existing segmentation methods can be classified into two categories: deep learning-based methods [5], [6], [7], [8], [9] and model-based methods [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22]. The learning-based methods employ deep neural networks to extract semantic features, from which a semantic label is assigned to each pixel to achieve segmentation. Although the segmentation performance of a well-trained network is usually satisfactory, its high requirements for data volume, running time and storage space, together with its limited interpretability, restrict its practical application. Hence, this paper mainly focuses on the model-based methods.

Model-based methods mainly consist of boundary-based methods [10], [11], [12], region-based approaches [15], [16], [17], hybrid methods [20,21] and transform-based approaches [22]. Boundary-based methods utilize the intensity discontinuity of transitional image structures such as edges to separate objects from the background. Edge detectors [10], [11], [12] are representative: they calculate the gradient at every pixel to locate the boundary between the object and the background. A key challenge is to determine a reasonable threshold for classifying each pixel by its gradient information. Moreover, the extraction of gradient information is sensitive to noise [13].

Region-based methods mainly focus on the statistical feature uniformity of image regions. The difference in uniformity between object regions and background regions is employed to perform segmentation. Classical works include the Chan-Vese (C-V) model [14] and Fuzzy C-Means (FCM) [15]. The former requires an initial contour to be specified manually. The initial contour then gradually evolves until it approaches the object boundary, driven by energy minimization over the regions inside and outside the evolving curve. Nevertheless, the C-V model tends to produce unsatisfactory segmentation results under intensity inhomogeneity. The latter combines fuzzy set theory with clustering to separate different objects from the background. Nonetheless, FCM and its variants [16,17] are sensitive to noise, outliers, and imaging artifacts because they ignore the spatial information of pixels in the image.
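
As an illustration of the clustering idea behind FCM, the following minimal Python sketch alternates the standard center and membership updates on gray levels only. The function name `fuzzy_c_means`, its parameters, and the toy data are illustrative assumptions and are not taken from [15]. Because each pixel is clustered by its intensity alone, the sketch also makes visible why such a scheme has no spatial context with which to reject noisy pixels.

```python
import numpy as np

def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=50, seed=0):
    """Cluster gray levels with fuzzy c-means (fuzzifier m > 1). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float).ravel()
    # Random initial memberships; each column (pixel) sums to 1.
    u = rng.random((n_clusters, x.size))
    u /= u.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        w = u ** m
        # Cluster centers are membership-weighted means of the gray levels.
        centers = (w @ x) / w.sum(axis=1)
        # Distance of every pixel to every center (small offset avoids division by zero).
        dist = np.abs(x[None, :] - centers[:, None]) + 1e-12
        # Membership update: inversely related to the relative distance.
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)
    return centers, u

# Toy example (assumed data): 30 dark pixels and 10 bright pixels.
pixels = np.array([0.1] * 30 + [0.9] * 10)
centers, memberships = fuzzy_c_means(pixels, n_clusters=2)
labels = memberships.argmax(axis=0)  # hard labels from the fuzzy memberships
```

A single noisy pixel with an outlying gray level would be assigned purely by that gray level, whatever its neighbors look like.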

Hybrid methods simultaneously employ boundary and region features to segment images. One representative work is transition region (TR)-based thresholding [18,19], which exploits a transition region as a boundary structure and uses the average gray level of the transition region as the threshold for image segmentation. TR-based thresholding methods take advantage of boundary features to detect objects and employ region features to get segmentation results. However, the extraction of the transition region tends to fail when the gray levels of the objects and the background overlap.

The abovementioned methods segment images with the help of gray level, gradient information, and other spatial features, which are often sensitive to noise. Unlike approaches conducted in the spatial domain, transform-based methods decompose an original image into various components in the transform domain. Then, the desired region is calculated by band-pass filters and an inverse transform. Finally, the segmentation result is obtained using post-processing methods such as morphological operators and Otsu thresholding. Representative examples are wavelet transform-based approaches [20], which use wavelet basis functions to decompose the original image into many complex planes in the wavelet transform domain. These complex planes make up the multiscale representations of the original image. At every scale, the fine details in the horizontal, vertical, and diagonal directions are separated from the coarse-resolution image, whose histogram is calculated to achieve threshold segmentation. Nonetheless, the wavelet transform is limited by contrast. Furthermore, the choice of wavelet basis functions has a great effect on the segmentation results.
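
The Otsu thresholding step mentioned above can be sketched as follows. This is a minimal NumPy version of the standard method, which picks the histogram cut that maximizes the between-class variance. The function name `otsu_threshold`, the bin count, and the synthetic test image are assumptions made for illustration; they do not come from the paper.

```python
import numpy as np

def otsu_threshold(image, n_bins=256):
    """Return the cut that maximizes between-class variance. Hypothetical helper."""
    values = np.asarray(image, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    prob = hist / hist.sum()
    w0 = np.cumsum(prob)              # probability of the class at or below each bin
    w1 = 1.0 - w0                     # probability of the class above each bin
    mu_cum = np.cumsum(prob * centers)
    mu_total = mu_cum[-1]
    between = np.zeros(n_bins)
    valid = (w0 > 0) & (w1 > 0)       # skip cuts that leave one class empty
    between[valid] = (mu_total * w0[valid] - mu_cum[valid]) ** 2 / (w0[valid] * w1[valid])
    # Use the right edge of the best bin, so that class 1 is `image >= threshold`.
    return edges[1:][np.argmax(between)]

# Synthetic example (assumed data): a bright square on a dark background, mild noise.
rng = np.random.default_rng(0)
truth = np.zeros((32, 32), dtype=bool)
truth[8:24, 8:24] = True
image = np.where(truth, 0.8, 0.2) + rng.normal(0.0, 0.02, truth.shape)
threshold = otsu_threshold(image)
mask = image >= threshold
```

With well-separated gray levels the mask recovers the square. When the noise makes the two histogram modes overlap, a single global threshold can no longer separate them, which is the weakness of purely gray-level criteria noted above.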

Recently, a TV spectral framework [21] has been presented as a new spectral analysis technique. The TV spectral transform decomposes an image into components at different time scales, where objects of different sizes and contrast produce distinguishable responses. Unlike the wavelet transform, the TV spectral transform is more robust to intensity inhomogeneity and eliminates the need to choose basis functions. As a result, objects with different sizes and local contrast can be separated at different time scales. Specifically, the time scale corresponding to the maximal spectral response of an object becomes larger as the size of the object or its local contrast increases. To this end, this paper uses the TV spectral transform to decompose an image into elements at different time scales, whereby basic elements that preserve shape structure are obtained. The basic strategy is to extract the object structure without noise and then refine the edge of the desired object. The proposed segmentation framework consists of three procedures, as shown in Fig. 1. Firstly, the TV flow of the original image is iteratively solved to obtain a series of smoothed images at different time scales. The difference between adjacent smoothed images is calculated to realize the TV spectral transform, whereby the basic elements that preserve the shape structures are separated from the original image. Secondly, the structure of the desired object is roughly separated from the background by a separation surface, which is fitted using the maximal response time of every pixel. Thirdly, post-processing techniques and binary segmentation are applied to get the refined segmentation result.
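
The notion of a per-pixel maximal response time can be sketched numerically. The snippet below assumes that the spectral response is defined as φ(t) = t · ∂²u/∂t², as in the spectral TV framework [21] (the definition is not restated in this excerpt), and approximates it by second differences of a stack of smoothed images sampled with a uniform step `dt`. The function names and the synthetic stack are hypothetical. The two test pixels decay linearly and vanish at t = 2 and t = 4, mimicking a smaller and a larger object.

```python
import numpy as np

def spectral_response(stack, dt):
    """Approximate phi(t_k) = t_k * u_tt(t_k) from smoothed images u(t_0), u(t_1), ..."""
    stack = np.asarray(stack, dtype=float)
    # Second difference in time: difference of the differences of adjacent images.
    u_tt = (stack[2:] - 2.0 * stack[1:-1] + stack[:-2]) / dt ** 2
    times = dt * np.arange(1, stack.shape[0] - 1)
    # Broadcast the time axis over the spatial axes.
    phi = times.reshape((-1,) + (1,) * (stack.ndim - 1)) * u_tt
    return times, phi

def max_response_time(stack, dt):
    """Time scale at which each pixel's spectral response is largest."""
    times, phi = spectral_response(stack, dt)
    return times[np.argmax(phi, axis=0)]

# Synthetic stack (assumed data): a 1x2 image whose pixels vanish at t = 2 and t = 4.
dt = 0.5
t = dt * np.arange(13)
stack = np.stack([np.maximum(1 - 0.5 * t, 0), np.maximum(1 - 0.25 * t, 0)], axis=-1)
stack = stack[:, None, :]               # shape (time, height, width) = (13, 1, 2)
t_max = max_response_time(stack, dt)    # -> [[2. 4.]]
```

The pixel that survives longer in the flow receives the larger response time, which is the cue a separation surface over these times can exploit.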

The main contributions of this paper are listed as follows:

We analyze the properties of the TV spectral transform on images with and without noise. The experimental analysis illustrates that components with different sizes and contrast can be distinguished by the TV spectral transform, which is an important clue for image segmentation. Our work is inspired by these properties.

We present a robust segmentation framework based on the TV spectral transform. In the proposed framework, basic elements that preserve shape structure are decomposed from the noisy image using the TV spectral transform. A separation surface is fitted as a soft threshold of band-pass filters to filter images in the TV spectral domain. Compared with state-of-the-art methods, our method achieves high segmentation accuracy and is robust to noise.

The remainder of this paper is organized as follows. Section 2 reviews the TV spectral transform briefly. Section 3 describes the motivation for our work. The proposed segmentation framework is described in detail in Section 4. Section 5 illustrates experimental results of the proposed method along with existing state-of-the-art methods. Finally, the study is concluded in Section 6.

Section snippets

Total variation spectral transform

This section summarizes the essential theory of the TV spectral transform framework [21]. The scale-space approach is a natural way to define scale:

$$
\begin{cases}
\dfrac{\partial u}{\partial t}(t;X) = -p(t;X), \\
u(0;X) = f(X), \\
p(t;X) \in \partial_u J(u),
\end{cases}
$$

where $f(X)$ is the original image and $u(t;X)$ is the family of solutions at increasing scales ($t \ge 0$ is the scale parameter). $\partial_u J(u)$ denotes the subdifferential of a regularizing functional $J(u)$, and $p(t;X)$ is a particular element of $\partial_u J(u)$.
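
A minimal numerical sketch of this flow for the case where J is the total variation is given below. It is an assumption-laden illustration, not the solver used in [21]. It uses explicit Euler time stepping, forward differences with Neumann boundaries, and a smoothed gradient magnitude sqrt(|∇u|² + ε²) so that p is well defined in flat regions. The function names, step size `dt` and regularization `eps` are hypothetical choices. The step is kept below ε/4 so that each update is a convex combination of neighboring pixels.

```python
import numpy as np

def grad(u):
    """Forward differences with zero gradient at the last row/column (Neumann)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy

def total_variation(u):
    gx, gy = grad(u)
    return np.sqrt(gx ** 2 + gy ** 2).sum()

def tv_flow(f, n_steps=200, dt=0.02, eps=0.1):
    """Return the stack u(0), u(dt), ..., u(n_steps * dt) of smoothed images."""
    u = np.asarray(f, dtype=float).copy()
    states = [u.copy()]
    for _ in range(n_steps):
        gx, gy = grad(u)
        norm = np.sqrt(gx ** 2 + gy ** 2 + eps ** 2)
        p = -div(gx / norm, gy / norm)   # smoothed stand-in for an element of the subdifferential
        u = u - dt * p                   # explicit Euler step of du/dt = -p
        states.append(u.copy())
    return np.stack(states)

# Example (assumed data): a bright square that the flow gradually flattens.
f = np.zeros((16, 16))
f[4:12, 4:12] = 1.0
states = tv_flow(f)
```

The mean gray level is preserved by the flow while the total variation decreases, so the stack moves from the original image toward a constant image as t grows.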

The TV spectral transform method is based on TV

Motivation

The properties of the TV spectral transform are explored in this section. To analyze the performance of the TV spectral transform on noisy images, we first analyze the following properties on clean images without noise: sensitivity to size and sensitivity to contrast [21]. Then Gaussian noise (variance 30%), salt-and-pepper noise (density 30%) and speckle noise (variance 30%) are added to corrupt the clean images. Finally, experimental analysis shows that the TV spectral transform is

Proposed method

This section illustrates the image segmentation framework based on total variation spectral transform. The main procedure is divided into three parts as shown in Fig. 4. Each part is introduced in the following subsections.

Experimental results and discussions

We conduct experiments from three perspectives. Firstly, ablation experiments are performed to explore the role of the post-processing techniques in our proposed framework. Secondly, we evaluate the segmentation performance of six methods, including our proposed method and other state-of-the-art methods, on clean natural images. Thirdly, noisy natural images and medical images are chosen to assess the robustness of our proposed method.

Conclusion

In this work, we demonstrated that the TV spectral transform is sensitive to object size and contrast in images corrupted by various types of noise. Based on this analysis, we proposed an image segmentation framework based on the TV spectral transform to segment noisy images. The framework first transforms the image into the TV spectral domain to get basic elements. Then the object and background are roughly separated by a separation surface, which is fitted using the maximal response time of each pixel.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study was partly supported by the National Natural Science Foundation of China (No. 62076137, No. 61972206, No. 61971233 and No. 62011540407) and was also partly supported under the framework of the international cooperation program managed by the National Research Foundation of Korea (No. NRF-2020K2A9A2A06036255, No. FY2020).

References (29)

  • J. Long, Fully convolutional networks for semantic segmentation, IEEE Conf. Comput. Vis. Pattern Recognit. (2015)
  • F. Yan, Triplet adversarial domain adaptation for pixel-level classification of VHR remote sensing images, IEEE Transactions on Geoscience and Remote Sensing (2019)
  • Z. Zhou, Unet++: redesigning skip connections to exploit multiscale features in image segmentation, IEEE Trans Med Imaging (2019)