Validation criteria for enhanced fuzzy clustering
Introduction
Since Zadeh’s introduction of the concept of fuzzy sets (1965), numerous fuzzy set-based approaches have been developed to model systems with uncertainties. The principle of these theories is to identify uncertainties in a given system by means of linguistic terms represented with membership functions. Fuzzy clustering is one of the strategies used to identify these membership functions: it organizes patterns into clusters such that data samples within a cluster are more similar to each other than to samples in other clusters. The most commonly used fuzzy clustering method is the fuzzy C-means (FCM) algorithm (Bezdek, 1981). Numerous variations of the FCM algorithm have since been developed for different purposes, e.g. (Hathaway and Bezdek, 1993; Höppner and Klawonn, 2003; Pedrycz, 2004; Cimino et al., 2006).
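As a rough illustration of how FCM turns distances into membership values, the sketch below computes the standard FCM membership update for fixed cluster centers. The function name `fcm_memberships`, the fuzzifier default `m = 2.0`, the Euclidean distance, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][k], the membership of point k in cluster i.

    Standard FCM membership update for fixed centers.
    Assumes m > 1 and Euclidean distance (illustrative choices).
    """
    exponent = 2.0 / (m - 1.0)
    memberships = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        dists = [math.dist(x, v) for v in centers]
        # A point that coincides with one or more centers is shared equally among them.
        zero = [i for i, d in enumerate(dists) if d == 0.0]
        if zero:
            for i in zero:
                memberships[i][k] = 1.0 / len(zero)
            continue
        for i, d_i in enumerate(dists):
            memberships[i][k] = 1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
    return memberships


# Toy data (hypothetical): three points on a line, two centers.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships(points, centers)
```

In this toy case the middle point lies three times closer to the first center than to the second, so most of its membership goes to the first cluster, while the memberships of each point still sum to one across clusters.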
Recently, the authors developed an improved fuzzy clustering (IFC) algorithm (Celikyilmaz and Türkşen, in press; Celikyilmaz and Türkşen, submitted for publication) for regression and classification domains. IFC combines standard fuzzy clustering, i.e. FCM (Bezdek, 1981), with fuzzy C-regression, i.e. FCRM (Hathaway and Bezdek, 1993), to identify the structure of fuzzy system models (FSMs) with improved fuzzy functions (IFFs) (Türkşen, in press; Türkşen and Celikyilmaz, 2006), which use membership values as additional predictors of the system model. The authors have shown that FSMs with IFFs can provide better estimations. The extension of IFC to classification problems, IFC-C (Celikyilmaz and Türkşen, submitted for publication), explores a given classification dataset to find local fuzzy partitions and simultaneously builds c fuzzy classifiers (functions).
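The idea of using membership values as additional predictors can be pictured as augmenting each input vector with that sample's membership in a given cluster before fitting that cluster's function. The sketch below only builds such an augmented design matrix. The helper name `augment_with_membership` and the row layout are hypothetical, and any transformations of the membership values that the IFF approach may apply are not reproduced here.

```python
def augment_with_membership(inputs, cluster_memberships):
    """Build design-matrix rows [1, mu_k, x_k1, ..., x_kp] for one cluster.

    inputs: list of input vectors x_k.
    cluster_memberships: membership mu_k of each x_k in this cluster.
    The layout (intercept, membership, original inputs) is an illustrative assumption.
    """
    if len(inputs) != len(cluster_memberships):
        raise ValueError("one membership value is required per input vector")
    return [[1.0, mu, *x] for x, mu in zip(inputs, cluster_memberships)]


# Hypothetical toy data: two samples with two input variables each.
rows = augment_with_membership([(2.0, 3.0), (5.0, 1.0)], [0.9, 0.2])
```

A separate local function would then be fitted on such rows for each of the c clusters, for example by least squares in the regression case.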
Every fuzzy clustering approach, including the IFC algorithms (Celikyilmaz and Türkşen, in press; Celikyilmaz and Türkşen, submitted for publication), assumes that some initialization parameters are known. Cluster validity index (CVI) measures (Fukuyama and Sugeno, 1989; Xie and Beni, 1991; Pal and Bezdek, 1995; Bezdek, 1976) have been proposed to validate the assumed number of clusters, mainly for the FCM (Bezdek, 1981) clustering approach. Later, many variations and extensions of these functions were introduced, e.g. (Bouguessa et al., 2006; Dave, 1996; Kim et al., 2003; Kim and Ramakrishna, 2005; Wu and Yang, 2005a; Wu et al., 2005b). The main characteristic of these CVIs is that they all use within-cluster distances (compactness), between-cluster distances (separability), or both to assess the clustering schema (Kim et al., 2003). Based on the way compactness and separability are coupled, CVI measures are generally classified as ratio-type or summation-type measures.
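To make the ratio-type idea concrete, the sketch below computes the Xie and Beni (1991) index in its commonly stated form: the membership-weighted within-cluster scatter (compactness) divided by the number of samples times the smallest squared distance between cluster centers (separation). The function name `xie_beni`, the fuzzifier default `m = 2.0`, and the toy data are assumptions for illustration only. Smaller values indicate a more compact and better separated partition.

```python
import math


def xie_beni(points, centers, memberships, m=2.0):
    """Xie-Beni index: fuzzy within-cluster scatter / (n * min squared center distance).

    memberships[i][k] is the membership of point k in cluster i.
    Requires at least two distinct centers.
    """
    compactness = sum(
        (memberships[i][k] ** m) * math.dist(x, v) ** 2
        for i, v in enumerate(centers)
        for k, x in enumerate(points)
    )
    separation = min(
        math.dist(centers[i], centers[j]) ** 2
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / (len(points) * separation)


# Hypothetical toy partition: three points on a line, two clusters.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
memberships = [[1.0, 0.9, 0.0], [0.0, 0.1, 1.0]]
xb = xie_beni(points, centers, memberships)
```

In practice the index is evaluated for several candidate numbers of clusters, and the value of c that yields the smallest index is taken as the estimate of the number of clusters.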
Most of the CVIs listed above are designed to validate the FCM (Bezdek, 1981) clustering algorithm, and they may not be suitable for other variations of fuzzy clustering algorithms that are designed for different purposes, e.g. the fuzzy C-regression (switching regression) algorithm, FCRM (Hathaway and Bezdek, 1993). For these types of FCM variations, different validity measures have been created. For instance, Kung and Lin (2002, 2004) introduce a new CVI, a modification of the Xie and Beni (1991) ratio-type validity function, to determine the optimum number of clusters in FCRM applications. It accounts for the similarity between regression models using the standard inner product of their unit normal vectors, instead of the distance between the cluster centers.
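The inner product of unit normal vectors can be illustrated for linear regression models, each of which defines a hyperplane. The sketch below is a minimal example under stated assumptions: a model y = b0 + b1*x1 + ... + bp*xp is given as a coefficient list, its normal direction is taken as (b1, ..., bp, -1), and the absolute value of the inner product is used so that the sign of the normal does not matter. The function names and the use of the absolute value are illustrative choices, and the exact separation term of the Kung and Lin index is not reproduced here.

```python
import math


def unit_normal(coeffs):
    """Unit normal of the hyperplane y = b0 + b1*x1 + ... + bp*xp.

    coeffs = [b0, b1, ..., bp]; the normal direction is (b1, ..., bp, -1).
    """
    normal = list(coeffs[1:]) + [-1.0]
    norm = math.sqrt(sum(c * c for c in normal))
    return [c / norm for c in normal]


def model_similarity(coeffs_a, coeffs_b):
    """Absolute inner product of two unit normals (1 = parallel, 0 = orthogonal)."""
    normal_a = unit_normal(coeffs_a)
    normal_b = unit_normal(coeffs_b)
    return abs(sum(a * b for a, b in zip(normal_a, normal_b)))


# Hypothetical models: two parallel lines, and two perpendicular lines.
parallel = model_similarity([0.0, 1.0], [5.0, 1.0])
perpendicular = model_similarity([0.0, 1.0], [0.0, -1.0])
```

Two regression models with nearly parallel surfaces give a similarity close to one even when their intercepts differ, which is the kind of closeness that a distance between cluster centers does not capture.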
In this paper, two new validity criteria are introduced to determine the optimum number of clusters, denoted by c∗, for two different versions of the IFC algorithm (Celikyilmaz and Türkşen, in press; Celikyilmaz and Türkşen, submitted for publication). The new ratio-type validity criteria measure the ratio between the compactness and separability of the clusters. IFC is a new type of hybrid clustering method: during optimization it uses structures from two separate fuzzy clustering algorithms, FCM (Bezdek, 1981) and FCRM (Hathaway and Bezdek, 1993), and it utilizes fuzzy functions (FFs). The new CVI is therefore designed to validate two different concepts. The compactness couples within-cluster distances with the errors of the c regression/classification functions, i.e. the errors between the actual and estimated output values/class labels. The separability, on the other hand, determines the structure of the clusters by measuring the ratio between the cluster center distances and the angle between their fuzzy decision surfaces.
The paper is organized as follows. In Section 2, the IFC algorithms are briefly reviewed. In Section 3, the new CVIs designed for the IFC algorithms are introduced. In Section 4, we present simulation results from applying the new CVIs to different dataset domains, using artificial and real-life datasets, and compare the results to three other well-known cluster validity measures that are closely related to the proposed validity measures.
Section snippets
Improved fuzzy clustering (IFC) algorithm
In the earlier FSM with FFs modeling strategies (Celikyilmaz and Türkşen, 2007; Türkşen, in press; Türkşen and Celikyilmaz, 2006), the standard FCM clustering algorithm is used to find membership values, which are supposed to represent good partitions of the given system domain. In (Celikyilmaz and Türkşen, in press; Celikyilmaz and Türkşen, submitted for publication), two new fuzzy clustering methods are presented: improved fuzzy clustering, IFC (Celikyilmaz and Türkşen, in press), for regression
Validity measures for improved fuzzy clustering (IFC)
Because IFC couples point-wise clustering, e.g. FCM, with regression/classification-type clustering, e.g. fuzzy C-regression (FCRM), we hypothesize that the new validity index should include concepts from two different types of CVIs. Hence, we investigate both types in this section before we present the new CVI.
Analysis of experiments
This section presents simulation results from applying cviFF and cviFF-C to the results of IFC and IFC-C, respectively, using artificial and real datasets with different structures.
Discussions
We draw the following conclusions from applying the new cluster validity function cviFF, as well as the XB, XB∗, and Kung–Lin CVI measures, to the outcome of IFC on four different artificial datasets and one real dataset, and from applying cviFF-C to the outcome of IFC-C on a real classification dataset.
Conclusion
In this paper, two new cluster validity criteria are introduced for the validation of a previously proposed improved fuzzy clustering (IFC) algorithm. Given fuzzy partitions with different input–output relations, the proposed validity index, cviFF, computes two different clustering (dis)similarities: compactness and separability. The best CVI is obtained by the ratio between the maximum compactness and the minimum separability measures. The new index gradually asymptotes to its minimum after
Acknowledgements
We gratefully thank the anonymous reviewers for their many helpful and constructive comments and suggestions. This work has been supported by research grants from the Natural Sciences and Engineering Research Council of Canada (NSERC).
References (24)
- et al. An objective approach to cluster validation. Pattern Recognition Lett. (2006)
- et al. Fuzzy functions with support vector machines. Inform. Sciences (2007)
- et al. A novel approach to fuzzy clustering based on a dissimilarity relation extracted from data using TS system. Pattern Recognition (2006)
- Validating fuzzy partition obtained through C-shells clustering. Pattern Recognition Lett. (1996)
- et al. Improved fuzzy partitions for fuzzy regression models. Internat. J. Approx. Reason. (2003)
- et al. New indices for cluster validity assessment. Pattern Recognition Lett. (2005)
- et al. Fuzzy cluster validation index based on inter-cluster proximity. Pattern Recognition Lett. (2003)
- Fuzzy clustering with knowledge-based guidance. Pattern Recognition Lett. (2004)
- et al. A cluster validity index for fuzzy clustering. Pattern Recognition Lett. (2005)
- et al. A novel fuzzy clustering algorithm based on a fuzzy scatter matrix with optimality tests. Pattern Recognition Lett. (2005)
- Fuzzy sets. Inform. Control
- Cluster validity with fuzzy sets. J. Cybernetics