Finding Irregularly Shaped Clusters Based on Entropy

Kuri-Morales, Angel; Aldana-Bobadilla, Edwin

doi:10.1007/978-3-642-14400-4_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6171)

Included in the following conference series:

Industrial Conference on Data Mining

Abstract

Traditional data clustering algorithms rely on similarity criteria that depend on a distance metric. This dependence constrains the shape of the clusters they can find: the clusters are generally hyperspherical in the metric’s space, because each element of a cluster lies within a radial distance of a given center. In this paper we propose a clustering algorithm that does not depend on simple distance metrics and can therefore find clusters of arbitrary shape in n-dimensional space. Our proposal draws on concepts from Shannon’s information theory and evolutionary computation. Each cluster is a subset of the data in which entropy is minimized. This is a highly non-linear and usually non-convex optimization problem, which rules out traditional optimization techniques. To solve it we apply a rugged genetic algorithm (the so-called Vasconcelos’ GA). To test the efficiency of our proposal we artificially created several data sets with known properties in three-dimensional space. The results show that our algorithm is able to find highly irregular clusters that traditional algorithms cannot. Some previous work relies on similar approaches (such as ENCLUS and CLIQUE). The differences between those approaches and ours are also discussed.
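
The abstract does not give the objective function, so the following is only a minimal sketch of what "a subset of the data where entropy is minimized" could mean in practice. It assumes a simple discretization: each point is mapped to a cell of a uniform grid, and the Shannon entropy of a cluster is computed over the cells its points occupy. The names `cluster_entropy`, `partition_cost`, and `bin_width`, and the size-weighted sum used as the cost, are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def cluster_entropy(points, bin_width=1.0):
    """Shannon entropy (in bits) of the grid cells occupied by `points`.

    Assumption: the space is discretized into cubes of side `bin_width`;
    this is an illustrative choice, not the paper's stated method.
    """
    if not points:
        return 0.0
    # Map every point to the integer coordinates of its grid cell.
    cells = Counter(
        tuple(math.floor(x / bin_width) for x in p) for p in points
    )
    n = len(points)
    return -sum((c / n) * math.log2(c / n) for c in cells.values())


def partition_cost(points, labels, bin_width=1.0):
    """Size-weighted sum of per-cluster entropies for one labeling.

    A search procedure (for instance a genetic algorithm over `labels`)
    could try to minimize this value; that pairing is hypothetical here.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n = len(points)
    return sum(
        len(c) / n * cluster_entropy(c, bin_width) for c in clusters.values()
    )


# Two tight groups of 3-D points, far apart from each other.
points = [
    (0.1, 0.2, 0.3), (0.4, 0.5, 0.6), (0.7, 0.8, 0.9), (0.2, 0.2, 0.2),
    (5.1, 5.2, 5.3), (5.4, 5.5, 5.6), (5.7, 5.8, 5.9), (5.2, 5.2, 5.2),
]
good_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # each group is its own cluster
bad_labels = [0, 1, 0, 1, 0, 1, 0, 1]   # each cluster mixes both groups

print(partition_cost(points, good_labels))
print(partition_cost(points, bad_labels))
```

In this toy case the labeling that keeps each group together scores lower than the one that mixes them. Note that this naive cost has trivial minima, since putting every point in its own cluster also yields zero entropy, so any practical formulation needs further constraints, such as a fixed number of clusters.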

Author information

Authors and Affiliations

Department of Computation, Autonomous Technological Institute of Mexico, Rio Hondo No. 1, Mexico City, Mexico
Angel Kuri-Morales
Institute of Research in Applied Mathematics and Systems, Autonomous University of Mexico, University City, Mexico City, Mexico
Edwin Aldana-Bobadilla

Editor information

Editors and Affiliations

Institut für Bildverarbeitung und angewandte Informatik, Körnerstr. 10, 04107, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Kuri-Morales, A., Aldana-Bobadilla, E. (2010). Finding Irregularly Shaped Clusters Based on Entropy. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2010. Lecture Notes in Computer Science, vol 6171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14400-4_5

DOI: https://doi.org/10.1007/978-3-642-14400-4_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14399-1
Online ISBN: 978-3-642-14400-4
eBook Packages: Computer Science, Computer Science (R0)
