Geometric divergence based fuzzy clustering with strong resilience to noise features☆
Introduction
Clustering is the method of unsupervised partitioning of a collection of data based on some prefixed similarity measure, such that the data belonging to the same cluster (same label) are more similar to each other than they are to the data belonging to the other clusters. Clustering has been extensively studied across varied disciplines like decision-making, exploratory pattern analysis, taxonomy, and machine learning problems, including data mining, information retrieval, image segmentation, and scene understanding [13], [14].
Data clustering algorithms come in two primary categories: hierarchical and partitional [14]. Here, we focus on a special type of partitional clustering called Fuzzy C Means (FCM) [5]. In FCM, the clusters are treated as fuzzy sets and each data-point is assigned a membership (in [0, 1]) in each cluster, based on its "distance" from the corresponding cluster center (the most representative point in a cluster) in the designated feature space. FCM locally minimizes an objective function consisting of a weighted sum of the dissimilarities (distances) between each data-point and the cluster centers. The weight of the dissimilarity between the ith data-point and the center of the jth cluster is a "suitable" function of the membership of the ith point in the jth cluster.
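The weighted objective and its alternating updates can be sketched as follows. This is a minimal, generic FCM with the squared Euclidean distance; the fuzzifier m = 2, the fixed iteration count, and the random initialization are illustrative choices, not the paper's exact setup:

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, seed=0):
    """Alternating-optimization FCM with squared Euclidean distance (sketch)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)               # memberships sum to 1 per point
    for _ in range(n_iter):
        W = U ** m                                  # weights u_ij^m on each dissimilarity
        V = (W.T @ X) / W.sum(axis=0)[:, None]      # centers: weighted arithmetic means
        D = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)  # squared distances d_ij
        P = np.fmax(D, 1e-12) ** (-1.0 / (m - 1.0))
        U = P / P.sum(axis=1, keepdims=True)        # u_ij proportional to d_ij^{-1/(m-1)}
    return U, V
```

Each iteration alternately fixes the memberships to update the centers, then fixes the centers to update the memberships, which is the AO heuristic referred to below.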
In order to quantify the homogeneity/heterogeneity between two data-points, a similarity measure is required. The choice of a suitable similarity measure plays a pivotal role in recovering the cluster structure of the data under consideration. Though the squared Euclidean distance has been the most popular choice of distance measure for clustering, in the recent past there has been an increased interest in replacing conventional metrics with statistical divergence measures [3], [18], [19]. Such divergence measures introduce additional non-linearity into the dissimilarity measure without the need for kernel-space mapping, and in general retain acceptably good clustering performance.
FCM was first introduced by Dunn [8] and subsequently generalized by Bezdek [5]. The algorithm used the Euclidean distance as the dissimilarity measure, and an Alternating Optimization (AO) heuristic [6] was used to locally minimize the criterion function. Over the years, the algorithm has undergone several notable changes in terms of the similarity measures used. To mitigate the bias of Euclidean distance based clustering algorithms towards hyper-spherical clusters, the Mahalanobis Distance (MD) was used in FCM; the Gustafson-Kessel (GK) [11] and Gath-Geva (GG) [10] clustering algorithms were developed for this purpose. However, Krishnapuram and Kim [15] proved that MD is not directly usable in a fuzzy clustering algorithm. Liu et al. [16] proposed a modification to MD by imposing a restriction on the covariance matrix. The distance function for FCM with the arithmetic mean as the cluster center was generalized in [21], where it was shown that using divergence measures belonging to the class of Point-to-Centroid Distances (P2C-D) guarantees convergence of FCM when the cluster centers are obtained as arithmetic means. This class consists of the Bregman divergences and some other divergences. Teboulle [18] developed a unified continuous optimization framework for center-based clustering methods with general distance-like functions. Further generalizations of FCM with other divergence-based similarity measures can be found in [23].
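A quick numerical check of the P2C-D/Bregman property mentioned above: for a Bregman divergence such as the generalized KL (I-)divergence, the total divergence from a set of points to a candidate center is minimized at their arithmetic mean. A sketch (the grid search and the specific data values are purely for illustration):

```python
import numpy as np

def gen_kl(x, c):
    # Generalized KL (I-)divergence, a Bregman divergence on the positive reals.
    return x * np.log(x / c) - x + c

x = np.array([0.5, 1.0, 2.0, 4.0])
cands = np.linspace(0.1, 5.0, 4901)                     # candidate centers
totals = np.array([gen_kl(x, c).sum() for c in cands])  # total divergence to each
best = cands[totals.argmin()]
print(best, x.mean())   # the minimizing center coincides with the arithmetic mean
```

The same grid search with any other Bregman divergence (e.g. the squared Euclidean distance itself) lands on the mean as well, which is what makes the arithmetic-mean center update valid for the whole class.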
We first present an alternative formulation of the FCM clustering algorithm with the family of geometric divergence measures [4]. This broader class of distance measures is defined on the positive orthant instead of the standard probability simplex (as in the case of f-divergences). The update rules and optimality conditions for geometric divergence based FCM are presented in Theorems 1 and 2 of Section 2. Experimental comparison on synthetic and real datasets shows that FCM with a geometric divergence can yield better or comparable results with respect to the conventional FCM. We finally demonstrate an interesting noise-resilience property of the proposed FCM algorithm. In a clustering task, noise features are those which do not contribute to the discrimination of the naturally occurring clusters and can mislead a clustering algorithm. If we multiply a noise feature by a constant a > 1, both the mean and the variance of that feature increase. Hence, its contribution towards the distance between any two data-points also increases, which further deteriorates the clustering performance. A similarity measure that can handle such noise features, and is little affected by noise feature perturbation, would therefore mitigate the problem. We show that FCM equipped with a geometric divergence measure (instead of the squared Euclidean distance) can indeed counterbalance the effect of noise feature perturbation by preserving the cluster structure very well. This holds even when the perturbing scale factor differs across data-points.
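The growing share of a scaled noise feature in the squared Euclidean distance can be seen with a toy pair of points (the feature values here are arbitrary illustrations):

```python
import numpy as np

x = np.array([1.0, 0.3])   # [informative feature, noise feature]
y = np.array([5.0, 0.9])
for a in [1, 10, 100]:
    # scale only the noise feature by a and split the squared distance by feature
    d = np.array([(x[0] - y[0]) ** 2, (a * x[1] - a * y[1]) ** 2])
    print(a, d[1] / d.sum())   # noise feature's share of the squared distance grows
```

At a = 100 the noise feature accounts for essentially all of the squared Euclidean distance, even though it carries no cluster information.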
Section snippets
Geometric divergence based fuzzy clustering
In this section, we provide a brief description of the divergence measure of choice and develop an FCM algorithm with the specific divergence measure.
Intuitive justification and empirical observation
We investigate the question of resilience to noise features in the following way. It is known that if we multiply a noise feature by a positive constant a > 1, the performance of FCM is likely to deteriorate, as this results in an increased share of the noise feature in the distance computation. From the very definition of the geometric distance measure, we see that it scales linearly, i.e., d(ax, ay) = a d(x, y), whereas the squared Euclidean distance scales quadratically, i.e., ‖ax − ay‖² = a²‖x − y‖².
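To make the scaling contrast concrete, here is a small check using, for illustration, a squared-Hellinger-type distance, sum over i of (sqrt(x_i) − sqrt(y_i))², as an assumed stand-in for a member of the geometric family (the actual measures are defined in Section 2):

```python
import numpy as np

def sq_hellinger(x, y):
    # illustrative distance with the linear-scaling property: sum (sqrt(x)-sqrt(y))^2
    return ((np.sqrt(x) - np.sqrt(y)) ** 2).sum()

x = np.array([1.0, 4.0])
y = np.array([2.0, 1.0])
a = 9.0
geo_ratio = sq_hellinger(a * x, a * y) / sq_hellinger(x, y)
euc_ratio = ((a * x - a * y) ** 2).sum() / ((x - y) ** 2).sum()
print(geo_ratio, euc_ratio)   # 9.0 vs 81.0: linear vs quadratic growth under scaling
```

Under a common positive scaling a, the geometric-type distance grows by a factor of a while the squared Euclidean distance grows by a², which is why scaled noise features dominate the latter much faster.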
Theoretical analysis
In order to support the empirical observations and the intuitive justifications stated in the last section, we carry out a theoretical study of the effect of an increase in the variance of the noise feature(s) on the membership matrix, in the context of the FCM algorithm with both the Euclidean distance and any general geometric divergence. Since the membership vector of each pattern has no relation to those of the others, we study the change in membership of a single pattern. We investigate the
Experimental results
Here, we provide detailed experimental results to validate our claim and highlight some other characteristics of FCM with geometric divergence measure.
Conclusion
We developed an alternating optimization based FCM algorithm with geometric divergence measures. The theoretical treatment undertaken in the paper is applicable to any divergence measure in general. We provided a mathematical proof of the greater resilience (for a specific kind of noise feature and perturbation) of FCM with geometric divergences compared to FCM with the conventional squared Euclidean distance. We validated our claim through detailed comparative discussion with
References (24)
- et al., Genetic clustering for automatic evolution of clusters and application to image classification, Pattern Recognit. (2002)
- et al., Entropic means, J. Math. Anal. Appl. (1989)
- et al., Iterative shrinking method for clustering problems, Pattern Recognit. (2006)
- Data clustering: 50 years beyond k-means, Pattern Recognit. Lett. (2010)
- et al., Classification and Learning Using Genetic Algorithms: Applications in Bioinformatics and Web Intelligence (2007)
- et al., Clustering with Bregman divergences, J. Mach. Learn. Res. (2005)
- Pattern Recognition with Fuzzy Objective Function Algorithms (1981)
- et al., Convergence of alternating optimization, Neural, Parallel Sci. Comput. (2003)
- I-divergence geometry of probability distributions and minimization problems, Ann. Probab. (1975)
- A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybernet. (1973)
- Unsupervised optimal fuzzy clustering, IEEE Trans. Pattern Anal. Mach. Intell.
- Fuzzy clustering with a fuzzy covariance matrix, Proceedings of the IEEE Conference on Decision and Control Including the 17th Symposium on Adaptive Processes, Vol. 17
Cited by (21)
- A novel kernelized total Bregman divergence-based fuzzy clustering with local information for image segmentation, Int. J. Approx. Reason. (2021)
- Spectral embedded generalized mean based k-nearest neighbors clustering with S-distance, Expert Syst. Appl. (2021)
- Noise distance driven fuzzy clustering based on adaptive weighted local information and entropy-like divergence kernel for robust image segmentation, Digit. Signal Process. (2021)
- Clustering analysis using an adaptive fused distance, Eng. Appl. Artif. Intell. (2020)
- Total Bregman divergence-based fuzzy local information C-means clustering for robust image segmentation, Appl. Soft Comput. (2020)
- Fuzzy c-means clustering using Jeffreys-divergence based similarity measure, Appl. Soft Comput. (2020)
☆ This paper has been recommended for acceptance by Y. Liu.