Pattern Recognition

Volume 39, Issue 1, January 2006, Pages 35-46

Analytically tractable case of fuzzy c-means clustering

https://doi.org/10.1016/j.patcog.2005.06.005

Abstract

In this paper, we offer a simple and accurate clustering algorithm, derived as a closed-form analytical solution to a cluster fit function minimization problem. As a result, the algorithm finds the global minimum of the fit function and combines exceptional efficiency with optimal clustering results.

Introduction

The problem of clustering can be viewed as the problem of dividing a potentially large set of $d$-dimensional feature points $P=\{p_i\}_{i=1}^{n}$ into a few ($k<n$) compact subsets. Since these subsets are often associated with their centroids $c_j$, one faces the problem of finding $k$ $d$-dimensional cluster centers $C=\{c_j\}_{j=1}^{k}$, such that grouping the $p_i$ around them will be optimal in some preferred distance-related metric. In this respect, the majority of known clustering techniques [1], [2], [3], [4], [5], [6], [7], [8], [9], [10] can be subdivided into optimal and suboptimal.

Optimal clustering algorithms [10] (such as many variants of hierarchical clustering) find the best possible dataset partition with respect to the chosen clustering metric. This optimality also ensures algorithm predictability. Unfortunately, most of the optimal techniques are based on an exhaustive and extremely time-consuming feature set breakdown analysis, with typical $O(n^2)$ complexity at a single algorithm step. Moreover, most of the optimal clustering algorithms process each $p_i$ individually, which makes them very sensitive to outliers, often leading to counterintuitive results.

Suboptimal clustering algorithms (such as the well-known fuzzy c-means, k-means, neural-network clustering, etc.) [11], [6], [7], [8] sacrifice global optimality for improved numerical efficiency and flexibility of the clustering process. Clustering is performed as a suboptimal minimization of a certain cluster fit function $F(c_1,c_2,\dots,c_k)$, measuring how closely the unknown set of cluster centers $C=\{c_j\}_{j=1}^{k}$ matches the set $P=\{p_i\}_{i=1}^{n}$. Since finding the exact analytical solution for the $F(c_1,c_2,\dots,c_k)$ extrema is usually impossible, the function is minimized with some numerical, iterative approach, typically converging to a reasonably good solution. However,

  1. Unlike the optimal clustering, the iterated suboptimal minimum is not guaranteed to be global (optimal).

  2. The starting iteration point is often guessed or set randomly, which, combined with the non-optimal minimum, makes the whole outcome guess-dependent.

  3. The complexity of the suboptimal techniques still depends heavily on the number $n$ of feature points $p_i$ that need to be processed in each iteration.

  4. The number of iterations required for convergence is not bounded.

In this paper, we offer a simple clustering algorithm that was derived as a closed-form analytical solution to the 2D fit function minimization problem. As a result, the algorithm finds the global minimum of the cluster fit function, does not require any exhaustive searches or unbounded iterations, and produces the exact solution after a well-defined, finite number of computational steps.

The paper can be broken into four parts: the choice of the cluster fit function, the solution for its global extrema, the numerical analysis of the algorithm's performance, and the mathematical proofs of the algorithm's most important properties.


Cluster fit function

Our clustering equations are based on a particular form of the cluster fit function, which we define as

$$F(c_1,c_2,\dots,c_k)=\sum_{i=1}^{n}\prod_{j=1}^{k}\|c_j-p_i\|^2,\qquad c_j,p_i\in\mathbb{R}^d.$$

In this definition, $d\geq 1$ is the dimensionality of the feature (data) space, the $p_i$ are the feature points, the $c_j$ are the cluster centers that need to be found, and $\|c_j-p_i\|$ denotes the Euclidean distance. The proposed definition of $F(c_1,c_2,\dots,c_k)$ may be justified in a number of ways. Intuitively, the $\prod_{j}\|c_j-p_i\|^2$ term quantifies how well the $i$th feature point $p_i$ is matched by the cluster centers: the product is small exactly when $p_i$ lies close to at least one $c_j$, and vanishes when $p_i$ coincides with one of them.
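To make the definition concrete, here is a minimal NumPy sketch of $F$, evaluated directly from the formula above; the function name cluster_fit and the array layout are illustrative assumptions, not from the paper:

```python
import numpy as np

def cluster_fit(points, centers):
    """Cluster fit function F(c_1, ..., c_k) = sum_i prod_j ||c_j - p_i||^2.

    points  : (n, d) array of feature points p_i
    centers : (k, d) array of candidate cluster centers c_j
    """
    # Squared Euclidean distances ||c_j - p_i||^2, shape (n, k)
    sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Product over the k clusters, then sum over the n points
    return sq_dist.prod(axis=1).sum()

# Two tight groups: F is near zero when each point lies near some center
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(cluster_fit(pts, np.array([[0.05, 0.0], [5.05, 5.0]])))  # small
print(cluster_fit(pts, np.array([[2.0, 2.0], [3.0, 3.0]])))    # much larger
```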

Optimizing cluster fit function in 2D

Consider the proposed cluster fit function in 2D, where it can be rewritten in complex form as

$$F(c_1,c_2,\dots,c_k)=\sum_{i=1}^{n}\prod_{j=1}^{k}|c_j-p_i|^2=\sum_{i=1}^{n}\prod_{j=1}^{k}(c_j-p_i)(\bar{c}_j-\bar{p}_i)=F(a_1,a_2,\dots,a_k,b_1,b_2,\dots,b_k),$$

$$c_j=a_j+Ib_j,\qquad \bar{c}_j=a_j-Ib_j,\qquad a_j,b_j\in\mathbb{R},\qquad I=\sqrt{-1}.$$

The function $F(a_1,a_2,\dots,a_k,b_1,b_2,\dots,b_k)$ is real-valued, smooth, and positive; therefore it has a global minimum satisfying

$$\frac{\partial F}{\partial a_r}=\sum_i\left[(\bar{c}_r-\bar{p}_i)\prod_{j\neq r}(c_j-p_i)(\bar{c}_j-\bar{p}_i)+(c_r-p_i)\prod_{j\neq r}(c_j-p_i)(\bar{c}_j-\bar{p}_i)\right]=0,$$

$$\frac{\partial F}{\partial b_r}=I\sum_i\left[(\bar{c}_r-\bar{p}_i)\prod_{j\neq r}(c_j-p_i)(\bar{c}_j-\bar{p}_i)-(c_r-p_i)\prod_{j\neq r}(c_j-p_i)(\bar{c}_j-\bar{p}_i)\right]=0.$$

In particular, one can define

$$F_r=\frac{1}{2}\left(\frac{\partial F}{\partial a_r}+I\,\frac{\partial F}{\partial b_r}\right)=\sum_i(c_r-p_i)\prod_{j\neq r}(c_j-p_i)(\bar{c}_j-\bar{p}_i),$$

so that the global minimum satisfies the $k$ complex equations $F_r=0$, $r=1,\dots,k$.
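These complex-form derivatives are easy to sanity-check numerically. The following sketch (our own illustration, not the paper's code; all function names are assumptions) evaluates $F_r$ as defined above and compares it with a central finite-difference estimate of $\frac{1}{2}(\partial F/\partial a_r+I\,\partial F/\partial b_r)$:

```python
import numpy as np

def F(points, centers):
    """F = sum_i prod_j |c_j - p_i|^2 for 1-D complex arrays of points and centers."""
    sq = np.abs(centers[None, :] - points[:, None]) ** 2    # |c_j - p_i|^2, shape (n, k)
    return sq.prod(axis=1).sum()

def F_r(points, centers, r):
    """F_r = sum_i (c_r - p_i) * prod_{j != r} |c_j - p_i|^2."""
    sq = np.abs(centers[None, :] - points[:, None]) ** 2
    prod_not_r = np.delete(sq, r, axis=1).prod(axis=1)      # product over j != r
    return ((centers[r] - points) * prod_not_r).sum()

rng = np.random.default_rng(0)
pts = rng.normal(size=20) + 1j * rng.normal(size=20)        # points p_i as complex numbers
ctr = rng.normal(size=3) + 1j * rng.normal(size=3)          # centers c_j, k = 3
h, r = 1e-6, 1
e_r = np.eye(3)[r]                                          # perturb only center r
dF_da = (F(pts, ctr + h * e_r) - F(pts, ctr - h * e_r)) / (2 * h)            # dF/da_r
dF_db = (F(pts, ctr + 1j * h * e_r) - F(pts, ctr - 1j * h * e_r)) / (2 * h)  # dF/db_r
print(np.allclose(F_r(pts, ctr, r), (dF_da + 1j * dF_db) / 2))  # True up to FD error
```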

Numerical results

We coded our globally optimal direct clustering technique (further referred to as DC-clustering) in Matlab®, and compared its performance to two other major Matlab clustering functions:

  1. clusterdata(Data, Nclus), which employs hierarchical clustering; we will briefly refer to this algorithm as H-clustering;

  2. fcm(Data, Nclus), which implements fuzzy c-means clustering; we will refer to this function as FCM-clustering.

The clusterdata and fcm functions were used with their default Matlab options.
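The Matlab experiments themselves are not reproduced here. As a self-contained stand-in for the FCM baseline, the sketch below implements standard fuzzy c-means (fuzzifier $m=2$, random initialization, the usual alternating center/membership updates) and reports the iteration count that the speed comparison turns on; all names and parameter choices are our own assumptions, not the paper's code or Matlab's fcm:

```python
import numpy as np

def fcm(data, n_clusters, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Minimal fuzzy c-means; returns the cluster centers and the iteration count."""
    rng = np.random.default_rng(seed)
    u = rng.random((data.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)                       # random initial memberships
    for it in range(1, max_iter + 1):
        um = u ** m
        centers = (um.T @ data) / um.sum(axis=0)[:, None]   # membership-weighted means
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                         # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)        # standard membership update
        if np.abs(u_new - u).max() < tol:
            return centers, it
        u = u_new
    return centers, max_iter

# Two well-separated Gaussian blobs in 2D
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 1.0, (500, 2)), rng.normal(8.0, 1.0, (500, 2))])
centers, iters = fcm(data, 2)
print(centers)
print("iterations:", iters)   # DC, by contrast, scans the data exactly once
```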

Conclusions

Undoubtedly, in terms of computational speed, our DC approach is the fastest of the three clustering methods: the dataset points in the DC algorithm are scanned only once, to compute the A and B matrices in the $Av=B$ equation (7). This roughly corresponds to a single iteration of the FCM algorithm, but in our experiments FCM took up to 100 iterations to converge. Moreover, the number of FCM iterations was observed to increase with the size $n$ of the data to be clustered, making FCM even less competitive on large datasets.

Mathematical properties of the proposed DC algorithm

In this section we will prove the most important properties of the DC algorithm.

Summary

We introduced a new clustering approach particularly suitable for fast and accurate 2D clustering. The DC algorithm is very straightforward and globally optimal with respect to its cluster fit measure. Computationally, it has low, linear complexity with respect to the size of the dataset. As we demonstrated, DC can be viewed as an analytically tractable subcase of the more general FCM approach.


References (21)

  • R.J. Hathaway et al., Local convergence of the fuzzy c-means algorithms, IEEE Trans. Pattern Anal. Mach. Intell. (1986)

  • J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybern. (1973)

  • J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms (1981)

  • B.S. Everitt, Cluster Analysis (1993)

  • H. Hellendoorn et al., Fuzzy Model Identification (1997)

  • U. Maulik et al., Performance evaluation of some clustering algorithms and validity indices, IEEE Trans. Pattern Anal. Mach. Intell. (2002)

  • J.A. Hartigan, Clustering Algorithms (1975)

  • L. Kaufman et al., Finding Groups in Data: An Introduction to Cluster Analysis (1990)

  • B. Mirkin, Mathematical Classification and Clustering (1996)

  • A. Baraldi et al., A survey of fuzzy clustering algorithms for pattern recognition, IEEE Trans. Syst. Man Cybern. (1999)


About the Author—OLEG PIANYKH received his M.Sc. degree in applied mathematics from Moscow State University, Russia, in June 1994. He earned a Ph.D. degree in computer science from Louisiana State University, USA, in 1998. He is currently an Assistant Professor in the Department of Radiology, LSU Health Sciences Center, and a Researcher at the Central Cardio Institute, Moscow, Russia. His research interests include medical image analysis and processing, teleradiology, PACS, data compression, and visualization.
