Validation criteria for enhanced fuzzy clustering

doi:10.1016/j.patrec.2007.08.017

Pattern Recognition Letters

Volume 29, Issue 2, 15 January 2008, Pages 97-108

Abstract

We introduce two new criteria for validating the results of a recently proposed clustering algorithm, improved fuzzy clustering (IFC), which is used to find patterns in regression-type and classification-type datasets, separately. The IFC algorithm calculates membership values that are used as additional predictors to form fuzzy decision functions for each cluster. The proposed validity criteria are based on the ratio of the compactness to the separability of the clusters. The optimum compactness of a cluster is represented by the average distance between every object and the cluster centers, together with the total estimation error of the fuzzy decision functions. The separability is based on a conditional ratio between the similarities of the cluster representatives and the similarities of the fuzzy decision surfaces of each cluster. The performance of the proposed validity criteria is compared with that of other structurally similar cluster validity indexes using datasets from different domains. The results indicate that the new cluster validity functions are useful criteria for selecting the parameters of IFC models.

Introduction

Since Zadeh’s introduction of the concept of fuzzy sets (1965), numerous fuzzy set-based approaches have been developed to model systems with uncertainties. The principle of these approaches is to identify uncertainties in a given system by means of linguistic terms represented with membership functions. Fuzzy clustering methods are one of the strategies used to identify these membership functions: they organize patterns into clusters such that data samples within a cluster are more similar to each other than to samples in other clusters. The most commonly used fuzzy clustering method is the fuzzy C-means (FCM) algorithm (Bezdek, 1981). Numerous variations of the FCM algorithm have since been developed for different purposes, e.g. (Hathaway and Bezdek, 1993, Höppner and Klawonn, 2003, Pedrycz, 2004, Cimino et al., 2006).
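As a concrete illustration of how FCM turns distances into membership values, the following minimal sketch applies the standard FCM membership update for fixed cluster centers. The function name `fcm_memberships`, the toy data, and the fuzzifier value m = 2 are assumptions chosen for illustration; the sketch covers only the membership step, not the full alternating optimization, and it is not the IFC algorithm.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Return u[i][k], the membership of point k in cluster i.

    Standard FCM update for fixed centers:
    u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)), with m > 1.
    """
    exponent = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        dists = [math.dist(x, v) for v in centers]
        # A point that coincides with one or more centers is shared
        # equally among those centers, which avoids division by zero.
        zero = [i for i, d in enumerate(dists) if d == 0.0]
        if zero:
            for i in zero:
                u[i][k] = 1.0 / len(zero)
            continue
        for i, d_i in enumerate(dists):
            u[i][k] = 1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
    return u

# Hypothetical toy data: two well-separated groups on a line.
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
centers = [(0.5, 0.0), (9.5, 0.0)]
u = fcm_memberships(points, centers)
```

For every point, the memberships across clusters sum to one, and points close to a center receive a membership near one for that cluster.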

Recently, the authors have developed a new improved fuzzy clustering (IFC) algorithm (Celikyilmaz and Türkşen, in press, Celikyilmaz and Türkşen, submitted for publication) for regression and classification type domains. In its original form, IFC combines the standard fuzzy clustering algorithm, i.e. FCM (Bezdek, 1981), and the fuzzy C-regression algorithm, i.e. FCRM (Hathaway and Bezdek, 1993), to identify the structures of fuzzy system models (FSM) with improved fuzzy functions (IFFs) (Türkşen, in press, Türkşen and Celikyilmaz, 2006), which use membership values as additional predictors of the system model. They have shown that FSMs with IFFs can provide better estimations. The extension of IFC to classification problems, IFC-C (Celikyilmaz and Türkşen, submitted for publication), explores a given classification dataset to find local fuzzy partitions and simultaneously builds c fuzzy classifiers (functions).

Every fuzzy clustering approach, including the latest IFC algorithms (Celikyilmaz and Türkşen, in press, Celikyilmaz and Türkşen, submitted for publication), assumes that some initialization parameters are known. Cluster validity index (CVI) measures (Fukuyama and Sugeno, 1989, Xie and Beni, 1991, Pal and Bezdek, 1995, Bezdek, 1976) have been proposed to validate the underlying assumptions about the number of clusters, mainly for the FCM (Bezdek, 1981) clustering approach. Later, many variations of these functions were introduced, e.g. (Bouguessa et al., 2006, Dave, 1996, Kim et al., 2003, Kim and Ramakrishna, 2005, Wu and Yang, 2005a, Wu et al., 2005b). The main characteristic of these CVIs is that they all use within-cluster distances, viz. compactness, between-cluster distances, viz. separability, or both to assess the clustering schema (Kim et al., 2003). Based on the way compactness and separability are coupled, CVI measures are generally classified as ratio-type or summation-type measures.
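To make the idea of a ratio-type CVI concrete, the sketch below computes the well-known Xie and Beni (1991) index in its commonly used generalized form: the membership-weighted mean squared distance to the cluster centers (compactness) divided by the smallest squared distance between two centers (separability). The function name `xie_beni`, the toy data, and the crisp membership matrix are assumptions for illustration; this is not the cviFF criterion proposed in the paper.

```python
import math

def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni ratio-type index: compactness / separability.

    u[i][k] is the membership of point k in cluster i.
    Needs at least two centers; smaller values indicate a better partition.
    """
    n = len(points)
    # Compactness: membership-weighted mean squared distance to the centers.
    compactness = sum(
        (u[i][k] ** m) * math.dist(x, v) ** 2
        for i, v in enumerate(centers)
        for k, x in enumerate(points)
    ) / n
    # Separability: smallest squared distance between two distinct centers.
    separability = min(
        math.dist(centers[i], centers[j]) ** 2
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / separability

# Hypothetical toy data with crisp memberships for readability.
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
u = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
xb_good = xie_beni(points, [(0.5, 0.0), (9.5, 0.0)], u)
xb_bad = xie_beni(points, [(4.0, 0.0), (6.0, 0.0)], u)
```

In this toy case the well-placed centers give a much smaller index than the poorly placed ones, which is how such an index is used to compare candidate partitions, for example across different numbers of clusters.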

Most of the CVIs listed above are designed to validate the FCM (Bezdek, 1981) clustering algorithm, and they may not be suitable for other variations of fuzzy clustering algorithms that are designed for different purposes, e.g. the fuzzy C-regression (switching regression) algorithm (FCRM) (Hathaway and Bezdek, 1993). For these types of FCM variations, different validity measures have been created. For instance, in (Kung and Lin, 2002, Kung and Lin, 2004), a new CVI, which is a modification of the Xie and Beni (1991) ratio-type validity function, is introduced to determine the optimum number of clusters in FCRM applications. It accounts for the similarity between regression models using the standard inner product of their unit normal vectors, instead of the distance between the cluster centers.
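The geometric idea behind comparing regression models through their unit normal vectors can be sketched as follows. A linear model y = b0 + b1 x1 + ... + bp xp defines a hyperplane whose normal in (x, y) space is (b1, ..., bp, -1). The helper names, the use of the absolute value, and the decision to ignore the intercept are assumptions made for this illustration; the exact formulation used by Kung and Lin is not reproduced here.

```python
import math

def unit_normal(slopes):
    """Unit normal of the hyperplane y = b0 + sum(b_j * x_j) in (x, y) space.

    The intercept b0 does not affect the direction of the normal.
    """
    normal = list(slopes) + [-1.0]
    norm = math.sqrt(sum(c * c for c in normal))
    return [c / norm for c in normal]

def model_similarity(slopes_a, slopes_b):
    """Absolute inner product of the two unit normals.

    1.0 means parallel regression surfaces, 0.0 means orthogonal ones.
    Taking the absolute value is an assumption of this sketch.
    """
    na, nb = unit_normal(slopes_a), unit_normal(slopes_b)
    return abs(sum(a * b for a, b in zip(na, nb)))

# Hypothetical one-dimensional regression lines.
same_direction = model_similarity([2.0], [2.0])   # parallel lines
perpendicular = model_similarity([1.0], [-1.0])   # slopes +1 and -1
```

Two parallel regression lines are maximally similar under this measure even when their intercepts differ, which is why such a term is typically combined with a distance-based term rather than used alone.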

In this paper, two new validity criteria are introduced to determine the optimum number of clusters, denoted by c^∗, for two different versions of the IFC algorithm (Celikyilmaz and Türkşen, in press, Celikyilmaz and Türkşen, submitted for publication). The new ratio-type validity criteria measure the ratio between the compactness and the separability of the clusters. Since IFC is a new type of hybrid clustering method, which uses structures from two separate fuzzy clustering algorithms during optimization, viz. FCM (Bezdek, 1981) and FCRM (Hathaway and Bezdek, 1993), and utilizes fuzzy functions (FFs), the new CVI is designed to validate two different concepts. The compactness couples the within-cluster distances with the errors of the c regression/classification functions, i.e. the errors between the actual and estimated output values/class labels. The separability, on the other hand, determines the structure of the clusters by measuring the ratio between the cluster center distances and the angle between their fuzzy decision surfaces.

The organization of this paper is as follows. In Section 2, IFC algorithms are briefly reviewed. In Section 3, the new CVIs designed for IFC algorithm are introduced. In Section 4, we present simulation results of the application of the new CVI on different dataset domains using artificial and real life datasets and compare the results to three other well-known cluster validity measures, which are closely related to the proposed validity measures.

Section snippets

Improved fuzzy clustering (IFC) algorithm

In the earlier FSM with FFs modeling strategies (Celikyilmaz and Türkşen, 2007, Türkşen, in press, Türkşen and Celikyilmaz, 2006), the standard FCM clustering algorithm is used to find membership values, which are supposed to represent good partitions of the given system domain. In (Celikyilmaz and Türkşen, in press, Celikyilmaz and Türkşen, submitted for publication), two new fuzzy clustering methods are presented, improved fuzzy clustering, IFC (Celikyilmaz and Türkşen, in press) for regression

Validity measures for improved fuzzy clustering (IFC)

Because IFC couples point-wise clustering, e.g. FCM, with regression/classification-type clustering, e.g. fuzzy C-regression (FCRM), we hypothesize that the new validity index should include concepts from two different types of CVIs. Hence, we investigate both types in this section before presenting the new CVI.

Analysis of experiments

This section presents the simulation results from the application of the cviFF and cviFF-C on the results of the IFC and IFC-C, respectively, using artificial and real datasets with different structures.

Discussions

We draw the following conclusions from applying the new cluster validity function cviFF, together with the XB, XB^∗, and Kung–Lin CVI measures, to the outcome of IFC on four different artificial datasets and a real dataset, and from applying cviFF-C to the outcome of IFC-C on a real classification dataset.

Conclusion

In this paper, two new cluster validity criteria are introduced for the validation of a previously proposed improved fuzzy clustering (IFC) algorithm. Given fuzzy partitions with different input–output relations, the proposed validity index, cviFF, computes two different clustering (dis)similarities, compactness and separability. The best CVI is obtained by the ratio between the maximum compactness and the minimum separability measures. The new index gradually asymptotes to its minimum after

Acknowledgements

We gratefully thank anonymous reviewers for their many helpful and constructive comments and suggestions. This work has been supported by research grants from the Natural Sciences and Engineering Research Council of Canada (NSERC).

References (24)

M. Bouguessa et al., An objective approach to cluster validation, Pattern Recognition Lett. (2006)

A. Celikyilmaz et al., Fuzzy functions with support vector machines, Inform. Sciences (2007)

M.G.C.A. Cimino et al., A novel approach to fuzzy clustering based on a dissimilarity relation extracted from data using TS system, Pattern Recognition (2006)

R.N. Dave, Validating fuzzy partition obtained through C-shells clustering, Pattern Recognition Lett. (1996)

F. Höppner et al., Improved fuzzy partitions for fuzzy regression models, Internat. J. Approx. Reason. (2003)

M. Kim et al., New indices for cluster validity assessment, Pattern Recognition Lett. (2005)

D.-W. Kim et al., Fuzzy cluster validation index based on inter-cluster proximity, Pattern Recognition Lett. (2003)

W. Pedrycz, Fuzzy clustering with knowledge-based guidance, Pattern Recognition Lett. (2004)

K.-L. Wu et al., A cluster validity index for fuzzy clustering, Pattern Recognition Lett. (2005)

K.-L. Wu et al., A novel fuzzy clustering algorithm based on a fuzzy scatter matrix with optimality tests, Pattern Recognition Lett. (2005)

L.A. Zadeh, Fuzzy sets, Inform. Control (1965)

J.C. Bezdek, Cluster validity with fuzzy sets, J. Cybernetics (1976)
