Abstract
Several information-theoretic measures have been used in machine learning. Building on the definition of the Kullback-Leibler entropy, this paper presents a new measure for clustering objects: the attribute redundancy measure. First, clustering is introduced, with its interpretation from the machine learning point of view and a classification of clustering techniques. Then, the use of information-theoretic measures in machine learning is described, for both supervised and unsupervised learning, including the application of mutual information. Next, the new measure is presented, highlighting its ability to capture relations between attributes and outlining its closeness to other concepts of information theory. Finally, using a genetic algorithm as the search procedure to find the best clustering, the attribute redundancy measure is compared with mutual information.
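As a reading aid for the two standard quantities named in the abstract, the following is a minimal sketch of the Kullback-Leibler divergence and of mutual information expressed as the divergence between a joint distribution and the product of its marginals. It is not the paper's attribute redundancy measure, which the abstract does not define. The function names `kl_divergence` and `mutual_information` and the toy data are assumptions chosen for illustration.

```python
from collections import Counter
from math import log2


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    p and q are dicts mapping outcome -> probability; q must assign a
    positive probability to every outcome that p does.
    """
    return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits, estimated from paired samples.

    Computed as D(joint || product of marginals) over the observed pairs.
    """
    n = len(xs)
    joint = {xy: c / n for xy, c in Counter(zip(xs, ys)).items()}
    px = {x: c / n for x, c in Counter(xs).items()}
    py = {y: c / n for y, c in Counter(ys).items()}
    product = {(x, y): px[x] * py[y] for (x, y) in joint}
    return kl_divergence(joint, product)


# Hypothetical toy data: one nominal attribute and a candidate clustering.
attribute = ["red", "red", "blue", "blue"]
clusters = [0, 0, 1, 1]
print(mutual_information(attribute, clusters))
```

In this sketch, a clustering that is fully determined by the attribute scores higher than one that is statistically independent of it, which is the sense in which mutual information can serve as a clustering criterion.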
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gonçalves, T., Moura-Pires, F. (1998). An attribute redundancy measure for clustering. In: Mercer, R.E., Neufeld, E. (eds) Advances in Artificial Intelligence. Canadian AI 1998. Lecture Notes in Computer Science, vol 1418. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-64575-6_57
DOI: https://doi.org/10.1007/3-540-64575-6_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64575-7
Online ISBN: 978-3-540-69349-9
eBook Packages: Springer Book Archive