Pattern Recognition Letters

Volume 79, 1 August 2016, Pages 60-67

Geometric divergence based fuzzy clustering with strong resilience to noise features

https://doi.org/10.1016/j.patrec.2016.04.013

Highlights

  • Development of FCM with a geometric divergence measure.

  • Investigation of the sub-optimization problems constituting the Alternating Optimization.

  • Performance comparison with FCM and weighted FCM.

  • Detailed theoretical proof of strong resilience to noise features.

  • Experimental proof of strong resilience to noise features.

Abstract

In this article, we consider the problem of fuzzy partitional clustering using a separable multi-dimensional version of the geometric distance, which includes f-divergences as special cases. We propose an iterative relocation algorithm for Fuzzy C Means (FCM) clustering that is guaranteed to converge to a local minimum. We also demonstrate, through theoretical analysis, that FCM clustering with the proposed divergence based similarity measure is more robust to the perturbation of noise features than the standard FCM with the Euclidean distance based similarity measure. In addition, we show that FCM with the suggested geometric divergence measure has clustering performance better than or comparable to that of FCM with squared Euclidean distance on real world and synthetic datasets (even in the absence of noise features).

Introduction

Clustering is the unsupervised partitioning of a collection of data based on some prefixed similarity measure, such that data belonging to the same cluster (same label) are more similar to each other than to data belonging to other clusters. Clustering has been extensively studied across varied disciplines like decision-making, exploratory pattern-analysis, taxonomy, and machine-learning problems, including data mining, information retrieval, image segmentation, and scene understanding [13], [14].

Data clustering algorithms come in two primary categories: hierarchical and partitional [14]. Here, we focus on a special type of partitional clustering called Fuzzy C Means (FCM) [5]. In FCM, the clusters are treated as fuzzy sets and each data-point is assigned a membership degree in [0, 1] for each cluster, based on its “distance” from the corresponding cluster center (the most representative point of the cluster) in the designated feature space. FCM locally minimizes an objective function consisting of a weighted sum of the dissimilarities (distances) between the data-points and the cluster centers. The weight of the dissimilarity between the ith data-point and the center of the jth cluster is a “suitable” function of the membership of the ith point in the jth cluster.
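To make this weighting concrete, here is a minimal sketch of one alternating pass of standard FCM with squared Euclidean distance (NumPy; the function name `fcm`, the fuzzifier default m = 2, and the fixed iteration count are illustrative choices, not taken from the paper):

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, eps=1e-12):
    """Minimal FCM sketch. X: (n, d) data, c: number of clusters, m > 1: fuzzifier."""
    n, _ = X.shape
    rng = np.random.default_rng(0)
    V = X[rng.choice(n, size=c, replace=False)]   # initial centers: c random points
    for _ in range(n_iter):
        # d_ij = ||x_i - v_j||^2, shape (n, c); eps guards against division by zero
        D = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + eps
        # membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(1/(m-1))
        U = 1.0 / ((D[:, :, None] / D[:, None, :]) ** (1.0 / (m - 1.0))).sum(-1)
        # center update: each center is the u^m-weighted arithmetic mean of the data
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, V
```

For example, `U, V = fcm(np.random.rand(200, 2), c=3)` fuzzily partitions 200 random 2-D points into three clusters; each row of U sums to 1.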

To quantify the homogeneity/heterogeneity between two data-points, a similarity measure is required. The choice of a suitable similarity measure plays a pivotal role in recovering the cluster structure of the data under consideration. Though squared Euclidean distance has been the most popular choice of distance measure for clustering, in the recent past there has been increased interest in replacing conventional metrics with statistical divergence measures [3], [18], [19]. Such divergence measures introduce additional non-linearity into the dissimilarity measure without the need for kernel-space mapping and, in general, retain acceptably good clustering performance.

FCM was first introduced by Dunn [8] and subsequently generalized by Bezdek [5]. This algorithm also used Euclidean distance as the dissimilarity measure, and an Alternating Optimization (AO) heuristic [6] was used to locally minimize the criterion function. The algorithm has undergone several notable changes over the years in terms of the similarity measures used. To mitigate the bias of Euclidean distance based clustering algorithms towards hyper-spherical clusters, the Mahalanobis Distance (MD) was used in FCM; the Gustafson-Kessel (GK) [11] and Gath-Geva (GG) [10] clustering algorithms were developed for this purpose. However, Krishnapuram and Kim [15] proved that MD is not directly usable in a fuzzy clustering algorithm. Liu et al. [16] proposed a modification to MD by imposing a restriction on the covariance matrix. The distance function for FCM with the arithmetic mean as the cluster center was generalized in [21], where it was found that using divergence measures belonging to the class of Point-to-Centroid Distances (P2C-D) guarantees convergence of FCM when the cluster centers are obtained as arithmetic means. This class comprises Bregman divergences and some other divergences. Teboulle [18] developed a unified continuous optimization framework for center-based clustering methods with general distance-like functions. Further generalizations of FCM with other divergence-based similarity measures can be found in [23].

We first present an alternative formulation of the FCM clustering algorithm with the family of geometric divergence measures [4]. This broader class of distance measures is defined on $\mathbb{R}_+^d$ instead of the standard probability simplex (as in the case of f-divergences). The update rules and optimality conditions for geometric divergence based FCM are presented in Theorems 1 and 2 of Section 2. Experimental comparison on synthetic and real datasets shows that FCM with the geometric divergence can yield results better than or comparable to those of the conventional FCM. We finally demonstrate an interesting aspect of noise resilience of the proposed FCM algorithm. In a clustering task, noise features are those which do not contribute to the discrimination of the naturally occurring clusters and can mislead a clustering algorithm. If we keep multiplying a noise feature by a constant a > 1, both the mean and the variance of that feature increase; hence, its contribution to the distance between any two data-points also increases, which further deteriorates the clustering performance. A similarity measure that can handle such specific kinds of noise features, and is affected little or not at all by noise feature perturbation, would therefore mitigate the problem. We show that FCM equipped with the geometric divergence measure (instead of squared Euclidean distance) can indeed counterbalance the effect of noise feature perturbation by preserving the cluster structure very well. This holds true even when the perturbing scale factor is different for different data-points.
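For orientation, the objective minimized by the AO heuristic has the standard FCM form with the geometric divergence $d_\varphi$ in place of the squared Euclidean distance (a generic sketch; the exact update rules and optimality conditions are those of Theorems 1 and 2):

```latex
% Generic divergence-based FCM objective (standard form; a sketch, not the
% paper's exact statement)
J_m(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{\,m}\, d_\varphi(\mathbf{x}_i, \mathbf{v}_j),
\qquad \text{subject to} \quad \sum_{j=1}^{c} u_{ij} = 1, \quad u_{ij} \ge 0,
```

where $m > 1$ is the fuzzifier, $u_{ij}$ is the membership of the $i$th point in the $j$th cluster, and $\mathbf{v}_j$ is the $j$th cluster center.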

Section snippets

Geometric divergence based fuzzy clustering

In this section, we provide a brief description of the divergence measure of choice and develop an FCM algorithm with the specific divergence measure.

Intuitive justification and empirical observation

We investigate the question of resilience to noise features in the following way. It is known that if we multiply a noise feature by a positive constant a > 1, the performance of FCM is likely to deteriorate, as this results in an increased share of the noise feature in the distance computation. From the very definition of the geometric distance measure, we see that it scales linearly, i.e. $d_\varphi(a\mathbf{x}, a\mathbf{y}) = a\, d_\varphi(\mathbf{x}, \mathbf{y})$, whereas squared Euclidean distance scales quadratically, i.e. $d(a\mathbf{x}, a\mathbf{y}) = a^2\, d(\mathbf{x}, \mathbf{y})$.
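This contrast is easy to verify numerically. The sketch below uses the generalized I-divergence, $d(\mathbf{x}, \mathbf{y}) = \sum_i \big(x_i \log(x_i/y_i) - x_i + y_i\big)$, as one concrete degree-1 homogeneous divergence on $\mathbb{R}_+^d$ (an illustrative stand-in chosen here; the paper's geometric divergence family [4] is broader):

```python
import numpy as np

def i_divergence(x, y):
    """Generalized I-divergence on R_+^d; positively homogeneous of degree 1."""
    return np.sum(x * np.log(x / y) - x + y)

def sq_euclidean(x, y):
    """Squared Euclidean distance; homogeneous of degree 2."""
    return np.sum((x - y) ** 2)

rng = np.random.default_rng(1)
x, y = rng.uniform(0.5, 2.0, size=5), rng.uniform(0.5, 2.0, size=5)
a = 10.0
print(i_divergence(a * x, a * y) / i_divergence(x, y))   # ~10  (= a, linear)
print(sq_euclidean(a * x, a * y) / sq_euclidean(x, y))   # ~100 (= a^2, quadratic)
```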

Theoretical analysis

To support the empirical observations and the intuitive justifications stated in the last section, we carry out a theoretical study of the effect of increasing the variance of the noise feature(s) on the membership matrix, in the context of the FCM algorithm with both the Euclidean distance and any general geometric divergence. As the membership vector of each pattern is independent of those of the other patterns, we study the change in membership of a single pattern. We investigate the …
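One step of the mechanism is worth spelling out here: under the standard FCM membership update, memberships depend only on ratios of distances, so a perturbation that rescales all of a pattern's distances by a common factor leaves its membership vector intact (a hedged sketch; the paper's full proof treats per-feature perturbations):

```latex
u_{ij} = \left[ \sum_{k=1}^{c}
  \left( \frac{d(\mathbf{x}_i, \mathbf{v}_j)}{d(\mathbf{x}_i, \mathbf{v}_k)} \right)^{\frac{1}{m-1}} \right]^{-1}
\quad\Longrightarrow\quad
d \mapsto a\,d \ \text{ leaves every } u_{ij} \text{ unchanged.}
```

Because the geometric divergence is homogeneous of degree 1, scaling a noise feature by a inflates that feature's share of each distance only linearly, whereas under squared Euclidean distance the inflation is quadratic, so the memberships drift less in the former case.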

Experimental results

Here, we provide detailed experimental results to validate our claim and highlight some other characteristics of FCM with the geometric divergence measure.

Conclusion

We developed an alternating optimization based FCM algorithm with geometric divergence measures. The theoretical treatment undertaken in the paper is applicable to any divergence measure in general. We provided a mathematical proof of the greater resilience (for a specific kind of noise feature and perturbation) of FCM with geometric divergences compared to FCM with the conventional squared Euclidean distance. We validated our claim through a detailed comparative discussion with …

References (24)

  • I. Gath et al., Unsupervised optimal fuzzy clustering, IEEE Trans. Pattern Anal. Mach. Intell. (1989)

  • D. Gustafson et al., Fuzzy clustering with a fuzzy covariance matrix, Proceedings of the IEEE Conference on Decision and Control Including the 17th Symposium on Adaptive Processes, 17 (1978)

    This paper has been recommended for acceptance by Y. Liu.
