Analytically tractable case of fuzzy c-means clustering
Introduction
The problem of clustering can be viewed as the problem of dividing a potentially large set of d-dimensional feature points into a few compact subsets. Since these subsets are often associated with their centroids, one faces the problem of finding d-dimensional cluster centers such that grouping around them is optimal in some preferred distance-related metric. In this respect, the majority of known clustering techniques [1], [2], [3], [4], [5], [6], [7], [8], [9], [10] can be subdivided into optimal and suboptimal.
Optimal clustering algorithms [10] (such as many variants of hierarchical clustering) find the best possible dataset partition with respect to the chosen clustering metric. This optimality also ensures algorithm predictability. Unfortunately, most optimal techniques are based on an exhaustive and extremely time-consuming breakdown analysis of the feature set, with high computational complexity at every single algorithm step. Moreover, most optimal clustering algorithms process each feature point individually, which makes them very sensitive to outliers and often leads to counterintuitive results.
Suboptimal clustering algorithms (such as the well-known fuzzy c-means, k-means, neural-network clustering, etc.) [11], [6], [7], [8] sacrifice global optimality for improved numerical efficiency and flexibility of the clustering process. Clustering is performed as a suboptimal minimization of a certain cluster fit function, measuring how closely the unknown set of cluster centers matches the set of feature points. Since finding the exact analytical solution for the extrema is usually impossible, the function is minimized with some numerical, iterative approach, typically converging to a reasonably good solution. However,
1. Unlike optimal clustering, the iterated suboptimal minimum is not guaranteed to be global (optimal).
2. The starting iteration point is often guessed or set randomly, which, combined with the non-optimal minimum, makes the whole outcome guess-dependent.
3. The complexity of suboptimal techniques still depends heavily on the number n of feature points that need to be processed in each iteration.
4. The number of iterations sufficient for convergence is not bounded.
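The iterative scheme these four points describe can be sketched with the standard fuzzy c-means updates. The following Python sketch is our illustration, not the paper's code (the paper's experiments use Matlab's fcm); it makes visible the random starting guess, the full pass over all n points in every iteration, and the open-ended iteration count:

```python
import numpy as np

def fcm(X, k, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Standard fuzzy c-means: alternate membership and center updates
    from a random start until the memberships stop moving. Illustrates
    the issues above: a guessed starting point, O(n*k) work per
    iteration, and an iteration count not known in advance."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Random initial memberships -- the "guess" the outcome depends on.
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)
    for it in range(max_iter):
        Um = U ** m
        # Centers are membership-weighted centroids: a full pass over X.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)              # guard against zero distance
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new, it + 1
        U = U_new
    return centers, U, max_iter
```

On two well-separated Gaussian blobs this typically recovers both centers, but the number of iterations taken varies with the random seed and the data size, exactly the behavior criticized in points 2 and 4.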
The paper can be broken into four parts: the choice of the cluster fit function, the solution for its global extrema, the numerical analysis of the algorithm's performance, and the mathematical proofs of the algorithm's most important properties.
Section snippets
Cluster fit function
Our clustering equations are based on a particular form of the cluster fit function. In this definition, d is the dimensionality of the feature (data) space, x_i are the feature points, c_j are the cluster centers that need to be found, and the distances involved are Euclidean. The proposed definition may be justified in a number of ways. Intuitively, each term quantifies how well the ith feature point …
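The excerpt cuts off before the fit function itself. As a familiar point of reference only (an assumption on our part, not the paper's actual definition), the classical fuzzy c-means fit function weighs each squared point-to-center distance by a fuzzy membership:

```python
import numpy as np

def fcm_objective(X, centers, U, m=2.0):
    """Classical FCM cluster-fit function
        J_m = sum_i sum_j (u_ij ** m) * ||x_i - c_j||**2,
    shown as the standard objective that fuzzy c-means minimizes;
    the paper's own fit function is a related but distinct definition."""
    # Squared Euclidean distances, shape (n_points, n_clusters).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(((U ** m) * d2).sum())
```

For a crisp partition (each u_ij either 0 or 1) this reduces to the usual within-cluster sum of squared distances.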
Optimizing cluster fit function in 2D
Consider the proposed cluster fit function in 2D, where it can be rewritten in complex form. The function is real-valued, smooth, and positive; therefore it has a global minimum satisfying the corresponding stationarity conditions. In particular, one can define …
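The complex-form rewriting rests on identifying each 2D point (x, y) with the complex number z = x + iy, under which the Euclidean distance between two points becomes the modulus of the difference of their complex encodings. A minimal check of that identification:

```python
import numpy as np

# Encode 2D points (x, y) as complex numbers z = x + iy; the Euclidean
# distance |p - q| then equals the complex modulus |z_p - z_q|. This is
# the identification that lets the 2D fit function be rewritten in
# complex form (the function itself is not reproduced in this excerpt).
P = np.array([[1.0, 2.0], [4.0, 6.0]])
Z = P[:, 0] + 1j * P[:, 1]                  # complex encoding of the points
dist_euclid = np.linalg.norm(P[0] - P[1])   # plain 2D Euclidean distance
dist_complex = abs(Z[0] - Z[1])             # same distance via the modulus
```

Working over complex numbers gives access to analytic machinery (e.g., treating the stationarity conditions as polynomial equations in z) that has no direct analogue in higher dimensions, which is why the analytically tractable case is specifically 2D.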
Numerical results
We coded our globally optimal direct clustering technique (referred to below as DC-clustering) in Matlab, and compared its performance to two other major Matlab clustering functions:
1. clusterdata(Data, Nclus), which employs hierarchical clustering; we will briefly refer to this algorithm as H-clustering;
2. fcm(Data, Nclus), which implements fuzzy c-means clustering; we will refer to this function as FCM-clustering.
Conclusions
Undoubtedly, in terms of computational speed, our DC approach is the fastest of the three clustering methods: the dataset points in the DC algorithm are scanned only once, to compute the A and B matrices in Eq. (7). This roughly corresponds to a single iteration of the FCM algorithm, yet in our experiments FCM took up to 100 iterations to converge. Moreover, the number of FCM iterations was observed to increase with the size of the data to be clustered, making FCM even …
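The single-scan argument can be illustrated schematically. The actual A and B matrices come from Eq. (7) of the paper, which this excerpt does not reproduce; the fixed-size accumulated statistics below are only a stand-in for "one pass over the data", contrasted with the repeated full passes an iterative method like FCM performs:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100_000, 2))

# Direct-style method: touch every point once, accumulating a few
# fixed-size statistics (a stand-in for the paper's A and B matrices).
t0 = time.perf_counter()
stats = (X.sum(axis=0), (X ** 2).sum(axis=0))   # single O(n) scan
t_single = time.perf_counter() - t0

# Iterative-style method: every iteration re-scans the whole dataset;
# with ~100 iterations the total cost grows accordingly.
t0 = time.perf_counter()
for _ in range(100):
    _ = X.sum(axis=0)                            # one full pass per iteration
t_iter = time.perf_counter() - t0
```

The direct method's cost is a single scan regardless of how hard the optimization is, whereas the iterative method's total cost is (passes per iteration) × (iterations), and the iteration count itself was observed to grow with the dataset size.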
Mathematical properties of the proposed DC algorithm
In this section we will prove the most important properties of the DC algorithm.
Summary
We introduced a new clustering approach particularly suitable for fast and accurate 2D clustering. The DC algorithm is very straightforward and globally optimal with respect to its cluster fit measure. Computationally, it has low, linear complexity with respect to the size of the dataset. As we demonstrated, DC can be viewed as an analytically tractable subcase of the more general FCM approach.
About the Author—OLEG PIANYKH received his M.Sc. degree in applied mathematics from Moscow State University, Russia, in June 1994. He earned a Ph.D. degree in Computer Science from Louisiana State University, USA, in 1998. He is currently an Assistant Professor in the Department of Radiology, LSU Health Sciences Center, and Researcher at the Central Cardio Institute, Moscow, Russia. His research interests include medical image analysis and processing, teleradiology, PACS, data compression and visualization.
References (21)
- et al., Local convergence of the fuzzy c-means algorithms, IEEE Trans. Pattern Anal. Mach. Intell. (1986)
- A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybern. (1973)
- Pattern Recognition with Fuzzy Objective Function Algorithms (1981)
- Cluster Analysis (1993)
- et al., Fuzzy Model Identification (1997)
- et al., Performance evaluation of some clustering algorithms and validity indices, IEEE Trans. Pattern Anal. Mach. Intell. (2002)
- Clustering Algorithms (1975)
- et al., Finding Groups in Data: An Introduction to Cluster Analysis (1990)
- Mathematical Classification and Clustering (1996)
- et al., A survey of fuzzy clustering algorithms for pattern recognition, IEEE Trans. Syst. Man Cybern. (1999)