The fuzzy c+2-means: solving the ambiguity rejection in clustering
Introduction
In many information-processing tasks, such as image understanding, signal or image processing, or the diagnosis of a static or dynamic process, a pattern recognition approach is often used. The p observed parameters are used to build the pattern vector. The system analysis is directly linked to the pattern classes to be discriminated in the p-dimensional representation space.
The pattern recognition process generally includes clustering, pattern classification and decision. In this paper we deal with the clustering problem, which refers to a broad spectrum of methods that subdivide a family of unlabeled objects into subsets, or clusters, which are pairwise disjoint, all non-empty, and whose union yields the original data set. This problem can be defined as follows:
- • let X=(x_k)_{k∈[1,n]} be the family of objects, where x_k is a pattern described by p features (i.e. x_k∈R^p);
- • let Ω=(ω_i)_{i∈[1,c]} be a family of classes.
Objects belonging to the same class share common properties that distinguish them from objects belonging to other classes. Clustering techniques in the pattern recognition area then consist of searching for a function f such that f(x_k)=ω_{i_k}, where ω_{i_k} denotes the class associated with x_k. The more an object x_k belongs to a class, the closer it is to that class in the representation space (i.e. f is usually a function of a distance). In the literature, most clustering algorithms can be classified into two types:
- • Hard or crisp. The algorithm assigns each feature vector to a single cluster and ignores the possibility that this feature vector may also belong to other clusters. Such algorithms are exclusive and the cluster labels are hard and mutually exclusive;
- • Fuzzy. Fuzzy clustering algorithms consider each cluster as a fuzzy set, while a membership function measures the degree to which each feature vector belongs to a cluster. Thus, each feature vector may be assigned to multiple clusters with some degree of sharing measured by the membership function.
In the first case, classes may be described by a family of indicator functions U=(μ_i)_{i∈[1,c]}, μ_i: R^p → {0,1}, where x_k is classified with a hard cluster label such that μ_i(x_k)=1 if x_k∈ω_i and 0 otherwise, verifying Σ_{i=1}^{c} μ_i(x_k)=1 (i.e. the mutual exclusivity property).
Clustering is widely performed by an iterative algorithm which is known as the crisp c-means algorithm [1], [2] and which makes it possible to find a local optimal family of c-centers clustering the family X=(xk)k∈[1,n].
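As an illustration, the crisp c-means iteration just mentioned can be sketched as below. This is a minimal Lloyd-style sketch, not the paper's implementation; the function name, the deterministic initialization on evenly spaced data points, and the Euclidean distance are our assumptions.

```python
import numpy as np

def crisp_c_means(X, c, n_iter=100):
    """Crisp c-means (Lloyd-style): alternate hard assignment and centroid update.
    Deterministic initialization on c evenly spaced data points (our choice)."""
    idx = np.linspace(0, len(X) - 1, c).astype(int)
    centers = X[idx].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # hard assignment: each pattern goes to its single nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # centroid update; keep the old center if a cluster becomes empty
        new_centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(c)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

The hard argmin assignment is exactly the mutual-exclusivity property above: each μ_i(x_k) is 1 for a single cluster and 0 for all others.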
In this paper we deal with the second case. Fuzzy set theory may be used to compute a family of functions, called membership functions, U=(μ_i)_{i∈[1,c]} with μ_i: R^p → [0,1], verifying Σ_{i=1}^{c} μ_i(x_k)=1 for all k. This constraint will be discussed later. Fuzzy clustering algorithms generate a fuzzy partition providing a measure of the membership degree of each pattern to a given cluster. These methods are less prone to local minima than crisp clustering algorithms, since they make soft decisions in each iteration through the use of membership functions. Many membership functions have been defined for this purpose [3], [4], [5]. The first fuzzy clustering algorithm was developed in 1969 by Ruspini [6]. Following this, a class of fuzzy ISODATA clustering algorithms was developed which includes the fuzzy c-means [7] (FcM).
The classical FcM problem involves finding a fuzzy partition of the family X. It is sufficient therefore to find a family of membership functions which minimize the criterion

J_m(U,V) = Σ_{i=1}^{c} Σ_{k=1}^{n} μ_ik^m d_ik^2,   (1)

where m>1 is a fuzzifier exponent, μ_ik=μ_i(x_k) and d_ik=||x_k−v_i||_G, G being a norm-inducing matrix.
The FcM algorithm was developed to solve this minimization problem. It consists of choosing a random initial partition U^(0) and iterating the two following update equations:

μ_ik^(t+1) = 1 / Σ_{j=1}^{c} (d_ik / d_jk)^{2/(m−1)},
v_i^(t+1) = Σ_{k=1}^{n} (μ_ik^(t+1))^m x_k / Σ_{k=1}^{n} (μ_ik^(t+1))^m.

Given a finite set of objects X, the fuzzy clustering of X into c clusters is a process which assigns a membership degree of each object to every cluster. This algorithm converges under some conditions [7] to a local minimum of Eq. (1). Over the years, it has been the object of several extensions and applications.
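The two alternating FcM updates can be sketched in vectorized NumPy as follows; this is a minimal illustration in standard textbook form (variable names and the small eps safeguard against zero distances are our choices):

```python
import numpy as np

def fcm_step(X, V, m=2.0, eps=1e-9):
    """One iteration of the two FcM update equations.
    X: (n, p) patterns, V: (c, p) prototypes. Returns (U, V_new)."""
    # distances d_ik between each prototype v_i and pattern x_k
    d = np.linalg.norm(V[:, None, :] - X[None, :, :], axis=2) + eps
    # membership update: mu_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
    U = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0)), axis=1)
    # prototype update: v_i = sum_k mu_ik^m x_k / sum_k mu_ik^m
    W = U ** m
    V_new = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, V_new
```

By construction each column of U sums to 1, which is precisely the Ruspini constraint that the possibilistic approach discussed below relaxes.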
More particularly, we mention the algorithm of Gustafson and Kessel [8], which takes the covariance matrix of each class ω_i into account. They argue that the use of fuzzy covariances is a natural approach to fuzzy clustering. For example, in the case of the well-known Fisher IRIS data (cf. Section 4.5), experimental results indicate that more accurate clustering may be obtained using fuzzy covariances. The iteration then uses

Σ_i^(t+1) = Σ_{k=1}^{n} (μ_ik^(t+1))^m (x_k − v_i^(t+1))(x_k − v_i^(t+1))^T / Σ_{k=1}^{n} (μ_ik^(t+1))^m,

where Σ_i^(t+1) is the fuzzy covariance matrix of cluster ω_i and p is the feature space dimension. Typically ρ_i=1. The distance function in the objective function is then defined as

d_ik^2 = (x_k − v_i)^T [ρ_i det(Σ_i)]^{1/p} Σ_i^{−1} (x_k − v_i).

However, since the memberships are generated by a probabilistic constraint originally due to Ruspini [6],

Σ_{i=1}^{c} μ_ik = 1 for all k,

i.e. they sum to 1 over each column of U, the FcM algorithm suffers from several drawbacks:
- • the memberships are relative numbers: the membership of a point in a class depends on the memberships of the point in all the other classes. Consequently, the cluster center estimates are poor with respect to a possibilistic approach. This can be a serious problem in situations where one wishes to generate membership functions from training data; μ_ik cannot be interpreted as the typicality of x_k for the ith cluster;
- • the FcM algorithm is very sensitive to the presence of noise: the membership of noise points can be significantly high;
- • the FcM algorithm cannot distinguish between “equally highly likely” and “equally highly unlikely” [9].
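Returning to the Gustafson–Kessel variant discussed above, its norm-inducing distance can be sketched as follows; a minimal illustration assuming a given prototype and fuzzy covariance matrix (function and variable names are ours):

```python
import numpy as np

def gk_distance_sq(X, v, S, rho=1.0):
    """Squared Gustafson-Kessel distance of each row of X to prototype v,
    using the fuzzy covariance matrix S of that cluster."""
    p = X.shape[1]
    # norm-inducing matrix A_i = (rho_i * det(S_i))^(1/p) * S_i^{-1}
    A = (rho * np.linalg.det(S)) ** (1.0 / p) * np.linalg.inv(S)
    diff = X - v
    # quadratic form (x_k - v)^T A (x_k - v) for every pattern at once
    return np.einsum('kp,pq,kq->k', diff, A, diff)
```

With S equal to the identity matrix this reduces to the squared Euclidean distance; an elongated S stretches the unit ball along the cluster's principal directions, which is why fuzzy covariances help on data sets like IRIS.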
To overcome these problems, Krishnapuram and Keller [9] proposed a new clustering model named possibilistic c-means (PcM), where the constraint Σ_{i=1}^{c} μ_ik = 1 is relaxed. In this case, the value μ_ik should be interpreted as the typicality of x_k relative to cluster i. However, PcM is very sensitive to initialization, and it sometimes generates coincident clusters. Moreover, the values in U can be very sensitive to the choice of the additional parameters needed by PcM.
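For contrast with FcM, the possibilistic typicality update can be sketched in the standard Krishnapuram–Keller form; the bandwidth parameters η_i below are exactly the "additional parameters" whose choice PcM is sensitive to:

```python
import numpy as np

def pcm_typicality(d2, eta, m=2.0):
    """Possibilistic typicalities, standard Krishnapuram-Keller form:
    t_ik = 1 / (1 + (d_ik^2 / eta_i)^(1/(m-1))).
    d2: (c, n) squared distances, eta: (c,) per-cluster bandwidths.
    Each value depends only on the distance to its own cluster,
    so columns of the result need not sum to 1."""
    return 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))
```

Because each row is computed independently of the others, nothing couples the clusters together, which is also why PcM can let several prototypes collapse onto the same dense region (coincident clusters).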
We reformulate the fuzzy clustering problem by including reject options to reduce these drawbacks. We propose a model, and a companion algorithm to optimize it, which we call fuzzy c+2-means (Fc+2M) because it requires two additional clusters. This paper is organized as follows. In Section 2, to prevent the memberships from being spread across the classes and to make it possible to distinguish between “equally likely” and “unknown”, we define partial ambiguity rejections which introduce a discounting process between the classical FcM membership functions. We modify the objective function used in the FcM algorithm and derive the membership and prototype update equations from the conditions for minimization of our criterion function. In Section 3, to improve the performance of our algorithm in the presence of noise, we define an amorphous noise cluster. This class allows us to reject an individual x_k when it is very far from the centers of all classes. Our membership functions are thus more immune to noise and correspond more closely to the notion of compatibility. The proof of the corresponding theorem is presented.
We have chosen to propose an extension of the fuzzy c-means (FcM) because:
- • the crisp c-means algorithm provides an iterative clustering of the search space and does not require any initial knowledge about the structure of the data set;
- • the use of fuzzy sets makes it possible to manage uncertainty on measures, lack of information, and other characteristics which bring in notions of ambiguity;
- • most fuzzy clustering algorithms are derived from the FcM algorithm, which minimizes the sum of squared distances from the prototypes weighted by the corresponding memberships [8], [10], [11], [12]. These algorithms have been used very successfully in many applications in which the final goal is to make a crisp decision, such as pattern classification. Moreover, memberships may be interpreted as probabilities or degrees of sharing.
The advantages of our method are:
- 1. Contrary to Dubuisson's approach, these rejects are introduced in the clustering or learning stage and not in the decision process [13], [14].
- 2. The membership degree of a pattern x_k to the ambiguity reject class is explicit and, above all, this value is learned during the iterative clustering.
- 3. No additional characteristic is needed to compute the reject and ambiguity degrees.
- 4. The locations of the cluster centers may be modified (relative to FcM) because the ambiguous and rejected patterns carry less weight.
Section 4 illustrates our approach on various examples in order to show the interest of clustering conditioned by reject measures. We first present two examples to provide insight into our approach. In the third example, we present the results obtained with FcM [15] (algorithm without reject option), FPcM [16] (algorithm with typicality), F(c+1)M [17], [18], [19] (algorithm with distance rejection only), and F(c+2)M on the diamonds data set. Another example shows the behavior of F(c+2)M on a 2-D data set with three classes. A more realistic example deals with a twofold comparison on a classical real data set. On the one hand, we compare the results obtained by different unsupervised clustering algorithms with and without reject options (FcM, FPcM, F(c+1)M, F(c+2)M). On the other hand, we compare the behavior of the membership functions according to the two parameters, ρ_α and ρ, which make it possible to control the strength of the distance and ambiguity rejections of our algorithm.
Section snippets
Clustering with ambiguity rejection
To reduce the excessive error rate due to noise and other uncertain factors inherent in any real-world system, clustering with ambiguity rejection is a solution. In most papers, the rules proposed to reject a pattern lying near class boundaries are based on threshold values applied in the decision process and not in the clustering or learning stage.
In order to specify this decision making, it is common to characterize a pattern xk with an ambiguity concept. The ambiguity rejection has
Ambiguity and distance rejections simultaneously
We propose a parallel approach to the management of both types of rejects, qualified as distance and ambiguity rejection. They are considered at the same step and are explicitly modeled as two additional classes.
In fuzzy pattern recognition, an individual x_k located too far from a class has a weak membership value for this class. In order to reduce the misclassification risk, a reject option is used to avoid classifying such an individual. In most papers, rules proposed in order to
1-D study
We first present a simple example to provide insight into the ambiguity reject approach. Here we discuss the shape of membership functions in the 1-D case.
Fig. 3 presents two normalized Gaussian classes with a global reject rate of 10% and α=1. The membership functions μ_Θ and μ_12 are as specified in the theoretical study.
We notice a particular point which is totally ambiguous, characterized by the equality: μ12k=μ1k=μ2k. This can be explained as follows: from the equation
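The paper's equation for μ_12 is not reproduced in this preview, but the symmetry behind the totally ambiguous point can be illustrated with the plain FcM memberships in the 1-D, two-class case; at the midpoint between the two prototypes the pattern is shared equally (this sketch uses standard FcM only, not the Fc+2M ambiguity class):

```python
import numpy as np

def fcm_memberships_1d(x, v1, v2, m=2.0, eps=1e-12):
    """Standard FcM memberships of a scalar x to two 1-D prototypes v1, v2."""
    d = np.array([abs(x - v1), abs(x - v2)]) + eps
    # mu_i = 1 / sum_j (d_i / d_j)^(2/(m-1))
    return 1.0 / np.sum((d[:, None] / d[None, :]) ** (2.0 / (m - 1.0)), axis=1)

# At the midpoint between the prototypes the memberships are equal,
# which is the crisp-boundary situation the ambiguity class is built to capture.
u_mid = fcm_memberships_1d(0.5, 0.0, 1.0)
```

Away from the midpoint the balance tips quickly toward the nearer prototype, so the ambiguity reject region naturally concentrates around the class boundary.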
Conclusion
We started this paper by justifying the interest of finding a fuzzy partition and of introducing two types of reject classes in the clustering process: distance rejection, dealing with patterns that are far away from all the classes, and ambiguity rejection, dealing with patterns lying near the boundaries between classes. The method that we propose is formulated as a constrained minimization problem, whose solution depends on a fuzzy objective function in which reject options are introduced.
To avoid
About the Author—MICHEL MÉNARD is currently an assistant professor at the University of La Rochelle, France. He holds a Ph.D. degree from the University of Poitiers, France (1993). His research interests are fuzzy pattern recognition, fuzzy sets, data fusion with particular applications to medical imaging.
References (29)
- A new approach to clustering, Inform. Control (1969)
- Generalized fuzzy c-shells clustering and detection of circular and elliptical boundaries, Pattern Recognition (1992)
- et al., A statistical decision rule with incomplete knowledge about classes, Pattern Recognition (1993)
- et al., Fuzzy mathematical morphologies: a comparative study, Pattern Recognition (1995)
- Characterization and detection of noise in clustering, Pattern Recognition Lett. (1991)
- Some methods of classification and analysis of multivariate observations, Proceedings of the 5th Berkeley Symposium on Math. Stat. and Prob. (1967)
- Cluster Analysis for Applications (1973)
- et al., Fuzzy sets and decision making approaches in vowel and speaker recognition, IEEE Trans. Systems Man Cybernet. (1977)
- Fuzzy tools in the management of uncertainty in pattern recognition, image analysis, vision and expert systems, Int. J. Systems Sci. (1991)
- Pattern Recognition with Fuzzy Objective Function Algorithms (1987)
- Convergence theory for fuzzy c-means: counterexamples and repairs, IEEE Trans. Systems Man Cybernet.
- A possibilistic approach to clustering, IEEE Trans. Fuzzy Systems
About the Author—CHRISTOPHE DEMKO is currently an assistant professor at the University of La Rochelle, France. He received an engineering degree and holds a Ph.D. degree from the University of Technology of Compiègne, France (1992). His research interests are fuzzy logic, multi-agent systems and pattern recognition.
About the Author—PIERRE LOONIS, born in 1968, is currently an assistant professor at the University of La Rochelle, where he received his Ph.D. degree in Pattern Recognition (1996). His main scientific interests include fuzzy pattern recognition, aggregation of multiple classification decisions, neural networks, genetic algorithms and real-world applications.