Abstract
We introduce the Partition Negentropy Criterion (PNC) for cluster validation. It is a cluster validity index that rewards the average normality of the clusters, measured by the negentropy, and penalizes their overlap, measured by the partition entropy. The PNC is aimed at finding well-separated clusters whose shape is approximately Gaussian. We use the new index to validate fuzzy partitions in a set of synthetic clustering problems, and compare the results to those obtained with the AIC, BIC and ICL criteria. The partitions are obtained by fitting a Gaussian Mixture Model to the data using the EM algorithm. We show that, when the real clusters are normally distributed, all the criteria correctly assess the number of components, with AIC and BIC tolerating a higher cluster overlap. However, when the real cluster distributions are not Gaussian, and so depart from the distribution assumed by the mixture model, the PNC outperforms the other indices: it correctly evaluates the number of clusters, while the other criteria (especially AIC and BIC) tend to overestimate it.
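To make the overlap penalty concrete, the following is a minimal sketch of the partition entropy of a fuzzy partition. It assumes the usual form, the average Shannon entropy of each point's membership vector; the abstract does not state the exact normalization used in the PNC, and the function name `partition_entropy` and the toy membership matrices are illustrative only. The negentropy term of the criterion is not sketched here.

```python
import math


def partition_entropy(memberships):
    """Average Shannon entropy (in nats) of the rows of a fuzzy membership matrix.

    Assumption: each row holds one point's memberships and sums to 1.
    """
    total = 0.0
    for row in memberships:
        for u in row:
            if u > 0.0:  # treat 0 * log(0) as 0
                total -= u * math.log(u)
    return total / len(memberships)


# Hypothetical toy partitions of two points into two clusters.
crisp = [[1.0, 0.0], [0.0, 1.0]]   # no overlap
fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # maximal overlap for two clusters

print(partition_entropy(crisp))
print(partition_entropy(fuzzy))
```

Under this assumed definition a crisp partition has zero entropy and a fully ambiguous two-cluster partition reaches log 2, so a criterion that penalizes this quantity favors well-separated clusters.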
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lago-Fernández, L.F., Sánchez-Montañés, M., Corbacho, F. (2009). Fuzzy Cluster Validation Using the Partition Negentropy Criterion. In: Alippi, C., Polycarpou, M., Panayiotou, C., Ellinas, G. (eds) Artificial Neural Networks – ICANN 2009. ICANN 2009. Lecture Notes in Computer Science, vol 5769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04277-5_24
DOI: https://doi.org/10.1007/978-3-642-04277-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04276-8
Online ISBN: 978-3-642-04277-5
eBook Packages: Computer Science, Computer Science (R0)