Elsevier

Pattern Recognition Letters

Volume 29, Issue 2, 15 January 2008, Pages 97-108
Validation criteria for enhanced fuzzy clustering

https://doi.org/10.1016/j.patrec.2007.08.017

Abstract

We introduce two new criteria for validating the results of a recently proposed clustering algorithm, improved fuzzy clustering (IFC), which is used to find patterns in regression-type and classification-type datasets, separately. The IFC algorithm calculates membership values that are used as additional predictors to form fuzzy decision functions for each cluster. The proposed validity criteria are based on the ratio of the compactness to the separability of clusters. The optimum compactness of a cluster is represented by the average distances between every object and the cluster centers, together with the total estimation error from their fuzzy decision functions. The separability is based on a conditional ratio between the similarities between cluster representatives and the similarities between the fuzzy decision surfaces of each cluster. The performance of the proposed validity criteria is compared to that of other structurally similar cluster validity indexes using datasets from different domains. The results indicate that the new cluster validity functions are useful criteria when selecting parameters of IFC models.

Introduction

Since Zadeh’s initial introduction of the concept of fuzzy sets (1965), numerous fuzzy set-based approaches have been developed to model systems with uncertainties. The principle behind these approaches is to identify uncertainties in a given system by means of linguistic terms represented with membership functions. Fuzzy clustering methods are one of the strategies used to identify these membership functions: they organize patterns into clusters such that data samples within a cluster are more similar to each other than to samples in other clusters. The most commonly used fuzzy clustering method is the fuzzy C-means (FCM) (Bezdek, 1981) algorithm. Numerous variations of the FCM algorithm have since been developed for different purposes, e.g. (Hathaway and Bezdek, 1993, Höppner and Klawonn, 2003, Pedrycz, 2004, Cimino et al., 2006).
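
As an illustration of how FCM (Bezdek, 1981) arrives at the membership values mentioned above, the following is a minimal pure-Python sketch of its standard alternating updates of cluster centers and memberships. The function name `fcm`, its parameters (fuzzifier `m`, tolerance `tol`, `seed`) and the random initialization are assumptions made for this example and are not taken from the paper.

```python
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Minimal fuzzy C-means sketch. Returns (centers, memberships).

    points: list of equal-length numeric tuples; c: number of clusters;
    m: fuzzifier (> 1). All names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ik^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[k] * points[k][j] for k in range(n)) / w_sum
                            for j in range(dim)])

        # Membership update from squared Euclidean distances.
        shift = 0.0
        for k in range(n):
            dist2 = [max(sum((points[k][j] - centers[i][j]) ** 2
                             for j in range(dim)), 1e-12)
                     for i in range(c)]
            inv = [d ** (-1.0 / (m - 1.0)) for d in dist2]
            inv_sum = sum(inv)
            new_row = [v / inv_sum for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[k])))
            u[k] = new_row

        # Stop once no membership value changes by more than tol.
        if shift < tol:
            break
    return centers, u
```

Each row of the returned membership matrix sums to one, so every sample distributes a unit of membership across the `c` clusters.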

Recently, the authors have developed a new improved fuzzy clustering (IFC) (Celikyilmaz and Türkşen, in press, Celikyilmaz and Türkşen, submitted for publication) algorithm for regression-type and classification-type domains. Initially, IFC combines the standard fuzzy clustering algorithm, i.e. FCM (Bezdek, 1981), and the fuzzy C-regression algorithm, i.e. FCRM (Hathaway and Bezdek, 1993), to identify the structures of fuzzy system models (FSM) with improved fuzzy functions (IFFs) (Türkşen, in press, Türkşen and Celikyilmaz, 2006), which use membership values as additional predictors of the system model. It has been shown that FSMs with IFFs can provide better estimations. The extension of the novel IFC to classification problems, IFC-C (Celikyilmaz and Türkşen, submitted for publication), also explores any given classification dataset to find local fuzzy partitions and simultaneously builds c fuzzy classifiers (functions).
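
The idea of using membership values as additional predictors can be sketched as follows: for one cluster, the input vector of each sample is augmented with that sample's membership value in the cluster, and a linear function is fitted by least squares. This is only the simplest possible form, written under our own assumptions; the helper names `solve` and `fit_fuzzy_function` are hypothetical, and the actual IFFs may use transformations of the membership values or other estimators.

```python
def solve(a, b):
    """Solve the square system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_fuzzy_function(xs, ys, memberships):
    """Least-squares fit of y ~ b0 + b.x + b_u*u for a single cluster.

    xs: list of input vectors; ys: list of outputs;
    memberships: this cluster's membership value for each sample.
    Returns [b0, b_1, ..., b_p, b_u].
    """
    # Design matrix rows: intercept, original inputs, membership value.
    rows = [[1.0] + list(x) + [u] for x, u in zip(xs, memberships)]
    p = len(rows[0])
    # Normal equations (Z^T Z) beta = Z^T y.
    ztz = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    zty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(ztz, zty)
```

Fitting one such function per cluster yields c local models, which matches the description above of building c fuzzy functions simultaneously with the partition.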

Every fuzzy clustering approach, including the latest IFC (Celikyilmaz and Türkşen, in press, Celikyilmaz and Türkşen, submitted for publication) algorithms, assumes that some initialization parameters are known. Cluster validity index (CVI) measures (Fukuyama and Sugeno, 1989, Xie and Beni, 1991, Pal and Bezdek, 1995, Bezdek, 1976) have been proposed to validate the underlying assumptions about the number of clusters, mainly for the FCM (Bezdek, 1981) clustering approach. Later, many variations and extensions of these functions were introduced, e.g., (Bouguessa et al., 2006, Dave, 1996, Kim et al., 2003, Kim and Ramakrishna, 2005, Wu and Yang, 2005a, Wu et al., 2005b). The main characteristic of these CVIs is that they all use either within-cluster distances, viz., compactness, or between-cluster distances, viz., separability, or both, as a way of assessing the clustering schema (Kim et al., 2003). Based on the way compactness and separability are coupled, CVI measures are generally classified into ratio-type or summation-type measures.
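
As a concrete instance of a ratio-type measure, the Xie and Beni (1991) index divides a compactness term by a separability term. The sketch below follows its commonly cited form, with the fuzzifier `m` exposed as a parameter (m = 2 in the original formulation); the function name and argument layout are assumptions made for illustration.

```python
def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni ratio-type validity index (smaller is better).

    points: list of n vectors; centers: list of c cluster centers;
    u: n-by-c membership matrix, u[k][i] = membership of point k in cluster i.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n, c = len(points), len(centers)
    # Compactness: membership-weighted within-cluster squared distances.
    compactness = sum(u[k][i] ** m * dist2(points[k], centers[i])
                      for k in range(n) for i in range(c))
    # Separability: smallest squared distance between two distinct centers.
    separability = min(dist2(centers[i], centers[j])
                       for i in range(c) for j in range(i + 1, c))
    return compactness / (n * separability)
```

In a typical use, the index is evaluated on partitions obtained for several candidate values of c, and the value of c with the smallest index is preferred.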

Most of the CVIs listed above are designed to validate the FCM (Bezdek, 1981) clustering algorithm, and they may not be suitable for other variations of fuzzy clustering algorithms that are designed for different purposes, e.g. the fuzzy C-regression (switching regression) algorithm (FCRM) (Hathaway and Bezdek, 1993). For these types of FCM variations, different validity measures have been created. For instance, in (Kung and Lin, 2002, Kung and Lin, 2004), a new CVI, which is a modification of the Xie and Beni (1991) ratio-type validity function, is introduced to determine the optimum number of clusters in FCRM applications. It accounts for the similarity between regression models using the standard inner product of unit normal vectors, instead of the distance between the cluster centers.
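
The similarity between two linear regression models via the inner product of their unit normal vectors can be illustrated with a short sketch. It assumes each model has the form y = b0 + b1*x1 + ... + bp*xp, so that its hyperplane has normal vector (b1, ..., bp, -1). Taking the absolute value of the inner product is our own choice here, and the code shows only this similarity term, not the full Kung and Lin index; the function names are hypothetical.

```python
import math


def unit_normal(coefs):
    """Unit normal of the hyperplane y = b0 + b1*x1 + ... + bp*xp.

    coefs = [b0, b1, ..., bp]; the intercept b0 does not affect the normal.
    """
    normal = list(coefs[1:]) + [-1.0]
    length = math.sqrt(sum(v * v for v in normal))
    return [v / length for v in normal]


def model_similarity(coefs_a, coefs_b):
    """Absolute inner product of the unit normals of two regression models.

    Returns 1.0 for parallel hyperplanes and 0.0 for perpendicular ones.
    """
    na, nb = unit_normal(coefs_a), unit_normal(coefs_b)
    return abs(sum(a * b for a, b in zip(na, nb)))
```

Two regression models with identical slopes but different intercepts are maximally similar under this measure, which is why it captures the orientation of the fitted surfaces and not their location.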

In this paper, two new validity criteria are introduced to determine the optimum number of clusters, denoted by c, for two different versions of the IFC algorithm (Celikyilmaz and Türkşen, in press, Celikyilmaz and Türkşen, submitted for publication). The new ratio-type validity criteria measure the ratio between the compactness and separability of the clusters. Since IFC is a new type of hybrid clustering method, which combines structures from two separate fuzzy clustering algorithms during optimization, i.e. FCM (Bezdek, 1981) and FCRM (Hathaway and Bezdek, 1993), and utilizes fuzzy functions (FFs), the new CVI is designed to validate two different concepts. The compactness couples within-cluster distances with the errors of the c regression/classification functions between the actual and estimated output values/class labels. The separability, on the other hand, determines the structure of the clusters by measuring the ratio between the cluster center distances and the angle between their fuzzy decision surfaces.

The organization of this paper is as follows. In Section 2, IFC algorithms are briefly reviewed. In Section 3, the new CVIs designed for IFC algorithm are introduced. In Section 4, we present simulation results of the application of the new CVI on different dataset domains using artificial and real life datasets and compare the results to three other well-known cluster validity measures, which are closely related to the proposed validity measures.

Improved fuzzy clustering (IFC) algorithm

In the earlier FSM with FFs modeling strategies (Celikyilmaz and Türkşen, 2007, Türkşen, in press, Türkşen and Celikyilmaz, 2006), the standard FCM clustering algorithm is used to find membership values, which are supposed to represent good partitions of the given system domain. In (Celikyilmaz and Türkşen, in press, Celikyilmaz and Türkşen, submitted for publication), two new fuzzy clustering methods are presented, improved fuzzy clustering, IFC (Celikyilmaz and Türkşen, in press) for regression

Validity measures for improved fuzzy clustering (IFC)

Since IFC couples point-wise clustering, e.g. FCM, with regression/classification-type clustering, e.g. fuzzy C-regression (FCRM), we hypothesize that the new validity index should include concepts from two different types of CVIs. Hence, we investigate both types in this section before presenting the new CVI.

Analysis of experiments

This section presents the simulation results from the application of cviFF and cviFF-C to the results of IFC and IFC-C, respectively, using artificial and real datasets with different structures.

Discussions

We draw the following conclusions from the application of the new cluster validity function cviFF, as well as the XB, XB, and Kung–Lin CVI measures, to the outcome of IFC using four different artificial datasets and a real dataset, and additionally from the application of cviFF-C to the outcome of IFC-C using a real classification dataset.

Conclusion

In this paper, two new cluster validity criteria are introduced for the validation of a previously proposed improved fuzzy clustering (IFC) algorithm. Given fuzzy partitions with different input–output relations, the proposed validity index, cviFF, computes two different clustering (dis)similarities, compactness and separability. The best CVI is obtained by the ratio between the maximum compactness and the minimum separability measures. The new index gradually asymptotes to its minimum after

Acknowledgements

We gratefully thank the anonymous reviewers for their many helpful and constructive comments and suggestions. This work has been supported by research grants from the Natural Sciences and Engineering Research Council of Canada (NSERC).

References (24)

  • L.A. Zadeh, Fuzzy sets, Inform. Control (1965)
  • J.C. Bezdek, Cluster validity with fuzzy sets, J. Cybernetics (1976)