Pattern Recognition, Volume 33, Issue 7, July 2000, Pages 1219-1237

The fuzzy c+2-means: solving the ambiguity rejection in clustering

https://doi.org/10.1016/S0031-3203(99)00110-7

Abstract

In this paper we deal with the clustering problem, whose goal is to compute a partition of a family of patterns into disjoint classes. The method that we propose is formulated as a constrained minimization problem, whose solution depends on a fuzzy objective function in which reject options are introduced. Two types of rejection have been included: ambiguity rejection, which concerns patterns lying near the class boundaries, and distance rejection, which deals with patterns that are far away from all the classes. To compute these rejections, we propose an extension of the fuzzy c-means (FcM) algorithm of Bezdek (Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981). This algorithm is called the fuzzy c+2-means (Fc+2M). These measures allow us to manage the uncertainty due both to imprecise and to incomplete definition of the classes. The advantages of our method are: (1) the degrees of membership of a pattern xk to the reject classes are learned within the iterative clustering procedure; (2) no additional characteristics need to be computed to determine the reject and ambiguity degrees; (3) the partial ambiguity rejections introduce a discounting process between the classical FcM membership functions, preventing the memberships from being spread across the classes; (4) the membership functions are more immune to noise and correspond more closely to the notion of compatibility. Preliminary computational experience with the developed algorithm is encouraging, and the results compare favorably with those of other methods such as the FcM, FPcM and F(c+1)M (fuzzy c+1-means: clustering with distance rejection only) algorithms on the same data sets. The differences in performance can be attributed to the fact that ambiguous patterns are given less weight in the computation of the centers.

Introduction

In many information-processing applications, such as image understanding, signal or image processing, or the diagnosis of a static or dynamic process, a pattern recognition approach is often used. The p observed parameters are used to build the pattern vector. The analysis of the system is directly linked to the pattern classes to be discriminated in the p-dimensional representation space.

The pattern recognition process generally includes clustering, pattern classification and decision. In this paper we deal with the clustering problem, which refers to a broad spectrum of methods that subdivide a family of unlabeled objects into subsets, or clusters, which are pairwise disjoint, all non-empty, and whose union reproduces the original data set. This problem can be defined as follows:

  • let X=(xk)k∈[1,n] be the family of objects, where xk=(xk1,xk2,…,xkp)t is a pattern described by p features (i.e. xk ∈ Rp);

  • let Ω=(ωi)i∈[1,c] be a family of classes.

Objects belonging to the same class share common properties that distinguish them from objects belonging to other classes. Clustering techniques in the pattern recognition area thus consist of searching for a function f such that
$$f: X \to \Omega, \qquad x_k \mapsto f(x_k),$$
where f(xk) denotes the class associated with xk. The more strongly an object xk belongs to a class, the closer it is to that class in the representation space (i.e. f is usually a function of distance). In the literature, most clustering algorithms can be classified into two types:

  • Hard or crisp. In this case, the algorithm assigns each feature vector to a single cluster and ignores the possibility that this feature vector may also belong to other clusters. Such algorithms are exclusive: the cluster labels are hard and mutually exclusive;

  • Fuzzy. Fuzzy clustering algorithms consider each cluster as a fuzzy set, and a membership function measures the degree to which each feature vector belongs to a cluster. Each feature vector may therefore be assigned to multiple clusters, with some degree of sharing measured by the membership function.

In the first case, classes may be described by a family of functions F=(fi)i∈[1,c], where xk is classified with a hard cluster label, such that
$$f_i: X \to \{0,1\}, \qquad f_i(x_k) = \begin{cases} 1 & \text{if } x_k \in \omega_i, \\ 0 & \text{otherwise,} \end{cases}$$
verifying $\forall x_k \in X,\ \sum_{i=1}^{c} f_i(x_k) = 1$ (i.e. the mutual exclusivity property).

Clustering is widely performed by an iterative algorithm known as the crisp c-means algorithm [1], [2], which makes it possible to find a locally optimal family of c centers clustering the family X=(xk)k∈[1,n].
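As a reference point for the fuzzy variants discussed next, here is a minimal sketch of this crisp c-means loop; the function name, the random initialization and the stopping rule are illustrative choices, not prescribed by the paper.

```python
import numpy as np

def crisp_c_means(X, c, n_iter=100, seed=0):
    """Crisp c-means: alternate hard nearest-center assignment and
    center recomputation until the labels stop changing."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # hard, mutually exclusive assignment: each x_k goes to one cluster
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for i in range(c):
            if np.any(labels == i):          # guard against empty clusters
                centers[i] = X[labels == i].mean(axis=0)
    return centers, labels
```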

In this paper we deal with the second case. Fuzzy set theory may be used to compute a family of functions, called membership functions, U=(μi)i∈[1,c] such that
$$\mu_i: X \to [0,1], \qquad x_k \mapsto \mu_i(x_k),$$
verifying $\forall x_k \in X,\ \sum_{i=1}^{c} \mu_i(x_k) = 1$. This constraint will be discussed later. Fuzzy clustering algorithms generate a fuzzy partition providing a measure of the membership degree of each pattern to a given cluster. These methods are less prone to local minima than the crisp clustering algorithms, since they make soft decisions in each iteration through the use of membership functions. Many membership functions have been defined for this purpose [3], [4], [5]. The first fuzzy clustering algorithm was developed in 1969 by Ruspini [6]. Following this, a class of fuzzy ISODATA clustering algorithms was developed, which includes the fuzzy c-means (FcM) [7].

The classical FcM problem involves finding a fuzzy partition of the family X. It is therefore sufficient to find a family of membership functions which minimizes the criterion
$$J_m(U,v) = \sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ik}^{m} \, d_{ik}^{2}, \qquad (1)$$
where m>1 is a fuzzifier exponent, $\mu_{ik}=\mu_i(x_k)$ and $d_{ik}=\|x_k-v_i\|_G$, G being a norm.

The FcM algorithm was developed to solve this minimization problem. It consists of choosing a random initial partition U(0) and iterating the two following equations:
$$v_i^{(t+1)} = \frac{\sum_{k=1}^{n} \left(\mu_{ik}^{(t)}\right)^m x_k}{\sum_{k=1}^{n} \left(\mu_{ik}^{(t)}\right)^m}, \qquad \mu_{ik}^{(t+1)} = \left[\sum_{j=1}^{c} \left(\frac{d_{ik}^{(t+1)}}{d_{jk}^{(t+1)}}\right)^{2/(m-1)}\right]^{-1}.$$
Given a finite set of objects X, the fuzzy clustering of X into c clusters is a process which assigns a membership degree in every cluster to each of the objects. This algorithm converges under some conditions [7] to a local minimum of Eq. (1). Over the years, it has been the subject of several extensions and applications.
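A minimal sketch of these two update equations follows; the convergence test and the random initialization of U are illustrative choices.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Fuzzy c-means: alternate the center update v_i and the
    membership update mu_ik given above until U stabilizes."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                       # random partition: columns sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)   # v_i update
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                # avoid division by zero at a center
        p = 2.0 / (m - 1.0)
        # mu_ik = d_ik^{-p} / sum_j d_jk^{-p}, equivalent to the update above
        U_new = 1.0 / (d ** p * np.sum(d ** (-p), axis=0))
        if np.abs(U_new - U).max() < tol:    # stop when memberships stabilize
            return centers, U_new
        U = U_new
    return centers, U
```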

More particularly, we note the algorithm of Gustafson and Kessel [8], which takes the covariance matrix of each class ωi into account. They argue that the use of fuzzy covariances is a natural approach to fuzzy clustering. For example, in the case of the well-known Fisher IRIS data (cf. Section 4.5), experimental results indicate that more accurate clustering may be obtained using fuzzy covariances. The iteration then becomes
$$v_i^{(t+1)} = \frac{\sum_{k=1}^{n} \left(\mu_{ik}^{(t)}\right)^m x_k}{\sum_{k=1}^{n} \left(\mu_{ik}^{(t)}\right)^m}, \qquad \Sigma_i^{(t+1)} = \frac{\sum_{k=1}^{n} \left(\mu_{ik}^{(t)}\right)^m \left(x_k - v_i^{(t+1)}\right)\left(x_k - v_i^{(t+1)}\right)^t}{\sum_{k=1}^{n} \left(\mu_{ik}^{(t)}\right)^m},$$
$$G_i^{(t+1)} = \rho_i \left[\det \Sigma_i^{(t+1)}\right]^{1/p} \left(\Sigma_i^{(t+1)}\right)^{-1}, \qquad \mu_{ik}^{(t+1)} = \left[\sum_{j=1}^{c} \left(\frac{d_{ik}^{(t+1)}}{d_{jk}^{(t+1)}}\right)^{2/(m-1)}\right]^{-1},$$
where $\Sigma_i^{(t+1)}$ is the fuzzy covariance matrix of cluster ωi and p is the feature-space dimension. Typically ρi=1, i=1,2,…,c. The distance function in the objective function (1), dik=||xk−vi||G, is then defined as
$$d_{ik}^{2} = \left(x_k - v_i\right)^t G_i \left(x_k - v_i\right)$$
(a sketch of these updates is given after the list below). However, since the memberships are generated by a probabilistic constraint originally due to Ruspini [6],
$$\forall x_k \in X, \quad \sum_{i=1}^{c} \mu_i(x_k) = 1,$$
i.e. each column of U sums to 1, the FcM algorithm suffers from several drawbacks:

  • the memberships are relative numbers: the membership of a point in a class depends on its memberships in all the other classes, so the cluster-center estimates are poor with respect to a possibilistic approach. This can be a serious problem in situations where one wishes to generate membership functions from training data; μik cannot be interpreted as the typicality of xk for the ith cluster;

  • the FcM algorithm is very sensitive to the presence of noise. The membership of noise points can be significantly high;

  • The FcM algorithm cannot distinguish between “equally highly likely” and “equally highly unlikely” [9].
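As announced before this list, here is a minimal sketch of the Gustafson-Kessel covariance, norm-matrix and distance computations; the regularization term reg and the function shape are illustrative additions, not part of the original algorithm.

```python
import numpy as np

def gk_distances(X, centers, U, m=2.0, rho=1.0, reg=1e-8):
    """One Gustafson-Kessel step: fuzzy covariance Sigma_i per cluster,
    volume-normalized norm matrix G_i, and the induced squared distances."""
    X = np.asarray(X, dtype=float)
    c, p = centers.shape
    Um = U ** m                                            # (c, n) weights
    d2 = np.empty((c, len(X)))
    for i in range(c):
        diff = X - centers[i]                              # (n, p)
        # Sigma_i = sum_k mu_ik^m (x_k - v_i)(x_k - v_i)^t / sum_k mu_ik^m
        S = (Um[i, :, None] * diff).T @ diff / Um[i].sum()
        S += reg * np.eye(p)                               # keep Sigma_i invertible
        # G_i = rho_i * det(Sigma_i)^(1/p) * Sigma_i^{-1}
        G = rho * np.linalg.det(S) ** (1.0 / p) * np.linalg.inv(S)
        # d_ik^2 = (x_k - v_i)^t G_i (x_k - v_i)
        d2[i] = np.einsum('np,pq,nq->n', diff, G, diff)
    return d2
```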

To overcome these problems, Krishnapuram and Keller [9] proposed a new clustering model named possibilistic c-means (PcM), in which the constraint is relaxed. In this case, the value μik should be interpreted as the typicality of xk relative to cluster i. But PcM is very sensitive to initialization, and it sometimes generates coincident clusters. Moreover, the values in U can be very sensitive to the choice of the additional parameters needed by the PcM.
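For reference, the possibilistic membership of PcM takes the following form (as we recall it from [9]; ηi is a per-cluster scale parameter that must be supplied, one of the "additional parameters" mentioned above):
$$\mu_{ik} = \frac{1}{1 + \left( d_{ik}^{2} / \eta_i \right)^{1/(m-1)}},$$
so that μik depends only on the distance of xk to cluster i, not on the other clusters.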

We reformulate the fuzzy clustering problem by including reject options in order to reduce these drawbacks. We propose a model, and a companion algorithm to optimize it, which we call the fuzzy c+2-means (Fc+2M) because it requires two additional clusters. This paper is organized as follows. In Section 2, to prevent the memberships from being spread across the classes and to make it possible to distinguish between "equally likely" and "unknown", we define partial ambiguity rejections, which introduce a discounting process between the classical FcM membership functions. We modify the objective function used in the FcM algorithm and derive the membership and prototype update equations from the conditions for the minimization of our criterion function. In Section 3, to improve the performance of our algorithm in the presence of noise, we define an amorphous noise cluster. This class allows us to reject an individual xk when it is very far from the centers of all the classes. Our membership functions are thus more immune to noise and correspond more closely to the notion of compatibility. The proof of the corresponding theorem is presented.

We have chosen to propose an extension of the fuzzy c-means (FcM) because

  • the crisp k-means algorithm provides an iterative clustering of the search space and does not require any initial knowledge about the structure in the data set;

  • the use of fuzzy sets allows us to manage uncertainty in the measurements, lack of information, and other characteristics which give rise to notions of ambiguity;

  • most fuzzy clustering algorithms are derived from the FcM algorithm, which minimizes the sum of squared distances from the prototypes weighted by the corresponding memberships [8], [10], [11], [12]. These algorithms have been used very successfully in many applications in which the final goal is to make a crisp decision such as pattern classification. Moreover, we may interpret memberships as the probabilities or degrees of sharing.

The advantages of our method are:

  1. Contrary to Dubuisson [13], [14], these rejects are introduced in the clustering or learning stage and not in the decision process.

  2. The membership degree of a pattern xk to the ambiguity reject class is explicit and, above all, this value is learned within the iterative clustering procedure.

  3. No additional characteristics are necessary to compute the reject and ambiguity degrees.

  4. The locations of the cluster centers may change (relative to FcM) because ambiguous and rejected patterns are given less weight.

Section 4 illustrates our approach on various examples in order to show the interest of clustering conditioned by reject measures. We first present two examples to provide insight into our approach. In the third example, we present the results obtained with FcM [15] (algorithm without reject option), FPcM [16] (algorithm with typicality), F(c+1)M [17], [18], [19] (algorithm with distance rejection only), and F(c+2)M on the diamond data set. Another example shows the behavior of F(c+2)M on a 2-D data set with three classes. A more realistic example deals with a twofold comparison on a classical real data set. On the one hand, we compare the results obtained by different unsupervised clustering algorithms with or without reject options (FcM, FPcM, F(c+1)M, F(c+2)M). On the other hand, we compare the behavior of the membership functions according to the two parameters, ρα and ρ, which make it possible to control the power of the distance and ambiguity rejections of our algorithm.

Section snippets

Clustering with ambiguity rejection

To reduce an excessive error rate due to noise and other uncertain factors inherent in any real-world system, clustering with ambiguity rejection is a solution. In most papers, the rules proposed for rejecting a pattern lying near the class boundaries are based on threshold values applied during decision making, not during the clustering or learning stage.

In order to specify this decision making, it is common to characterize a pattern xk with an ambiguity concept. The ambiguity rejection has

Ambiguity and distance rejections simultaneously

We propose a parallel approach for managing both types of rejects, namely distance and ambiguity rejection. They are handled at the same step and are explicitly treated as two additional classes.
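As a point of comparison only (the Fc+2M update equations are derived in Sections 2 and 3 and are not reproduced in this excerpt), here is a minimal sketch of one common way to realize distance rejection: a fictitious noise class kept at a constant distance delta from every pattern and included in the FcM normalization. The function and the parameter delta are illustrative.

```python
import numpy as np

def memberships_with_noise_class(d, delta, m=2.0):
    """FcM-style memberships extended with one fictitious noise class lying
    at a constant distance delta from every pattern; the noise membership
    acts as a distance-rejection degree (not the paper's Fc+2M update)."""
    c, n = d.shape                                   # d[i, k] = distance of x_k to v_i
    d_ext = np.vstack([d, np.full((1, n), delta)])   # c+1 rows: c classes + noise
    d_ext = np.fmax(d_ext, 1e-12)                    # avoid division by zero
    p = 2.0 / (m - 1.0)
    U = 1.0 / (d_ext ** p * np.sum(d_ext ** (-p), axis=0))
    return U[:c], U[c]   # per-class memberships, distance-reject membership
```

A pattern far from all centers gets a high noise membership, so its weight in the center updates drops.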

In fuzzy pattern recognition, an individual xk located too far from a class has a weak membership value with respect to this class. In order to reduce the misclassification risk, a reject option is used to avoid classifying such an individual. In most papers, rules proposed in order to

1-D study

We first present a simple example to provide insight into the ambiguity reject approach. Here we discuss the shape of membership functions in the 1-D case.

Fig. 3 presents two normalized Gaussian classes with a global reject rate of 10% and α=1. The membership functions μΘ and μ12 behave as specified in the theoretical study.

We notice a particular point which is totally ambiguous, characterized by the equality $\mu_{12k}=\mu_{1k}=\mu_{2k}$. This can be explained as follows: from the equation involving $d_{ijk}$, $(d_{1k}+d_{2k})^2/4$ and $\prod_{u=1}^{c} d$
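The location of this totally ambiguous point can be checked numerically. The sketch below uses the standard FcM memberships for two 1-D classes (the Fc+2M expression for μ12 is not reproduced in this excerpt; the centers and m are illustrative values, not those of Fig. 3):

```python
import numpy as np

# Two 1-D class centers (illustrative values, not the setup of Fig. 3)
v1, v2, m = -1.0, 1.0, 2.0
x = np.linspace(-4.0, 4.0, 801)
d1 = np.fmax(np.abs(x - v1), 1e-12)
d2 = np.fmax(np.abs(x - v2), 1e-12)

# Standard FcM memberships for c = 2: mu_i = d_i^{-p} / (d_1^{-p} + d_2^{-p})
p = 2.0 / (m - 1.0)
mu1 = d1 ** (-p) / (d1 ** (-p) + d2 ** (-p))
mu2 = 1.0 - mu1

# The totally ambiguous point is where mu1 = mu2,
# i.e. the midpoint between the two centers
k = np.argmin(np.abs(mu1 - mu2))
print(f"mu1 = mu2 = {mu1[k]:.3f} at x = {x[k]:.3f}")
```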

Conclusion

We started this paper by justifying the interest of finding a fuzzy partition and of introducing two types of reject classes into the clustering process: distance rejection, dealing with patterns that are far away from all the classes, and ambiguity rejection, dealing with patterns lying near the boundaries of classes. The method that we propose is formulated as a constrained minimization problem, whose solution depends on a fuzzy objective function in which reject options are introduced.

To avoid


References (29)

  • J.C. Bezdek et al., Convergence theory for fuzzy c-means: counterexamples and repairs, IEEE Trans. Systems Man Cybernet. (1987).
  • D.E. Gustafson, W.C. Kessel, Fuzzy clustering with a fuzzy covariance matrix, Proceedings of the IEEE CDC, San Diego, CA, ...
  • R. Krishnapuram, A possibilistic approach to clustering, IEEE Trans. Fuzzy Systems (1993).
  • N.B. Karayiannis, M. Ravuri, An integrated approach to fuzzy learning vector quantization and fuzzy c-means clustering, ...

About the Author—MICHEL MÉNARD is currently an assistant professor at the University of La Rochelle, France. He holds a Ph.D. degree from the University of Poitiers, France (1993). His research interests are fuzzy pattern recognition, fuzzy sets, and data fusion, with particular applications to medical imaging.

About the Author—CHRISTOPHE DEMKO is currently an assistant professor at the University of La Rochelle, France. He received an engineering degree and holds a Ph.D. degree from the University of Technology of Compiègne, France (1992). His research interests are fuzzy logic, multi-agent systems and pattern recognition.

About the Author—PIERRE LOONIS, born in 1968, is currently an assistant professor at the University of La Rochelle, where he received his Ph.D. degree in Pattern Recognition (1996). His main scientific interests include fuzzy pattern recognition, aggregation of multiple classification decisions, neural networks, genetic algorithms and real-world applications.
