An attribute redundancy measure for clustering

Gonçalves, Teresa; Moura-Pires, Fernando

doi:10.1007/3-540-64575-6_57

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1418)

Included in the following conference series:

Conference of the Canadian Society for Computational Studies of Intelligence

Abstract

Several measures based on information theory have been used in machine learning. Using the definition of the Kullback-Leibler entropy, this paper presents a new measure for clustering objects: the attribute redundancy measure. First, clustering is introduced, together with its interpretation from the machine learning point of view and a classification of clustering techniques. Then, the use of information-theoretic measures in machine learning is described, in both supervised and unsupervised learning, including the application of the mutual information. Next, the new measure is presented, highlighting its ability to capture relations between attributes and outlining its closeness to other concepts of information theory. Finally, using a genetic algorithm as the search procedure to find the best clustering, the attribute redundancy measure is compared with the mutual information.
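The abstract ties the proposed measure to the Kullback-Leibler entropy and compares it with the mutual information. The paper's own attribute redundancy formula is not given in this abstract, so it is not reproduced here. As a reference point only, the minimal sketch below uses the standard identity I(X; Y) = D(p(x, y) || p(x) p(y)) to estimate the mutual information between two discrete attributes from paired observations. The function names and the choice of base-2 logarithms (results in bits) are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Mapping, Sequence


def kl_divergence(p: Mapping, q: Mapping) -> float:
    """Kullback-Leibler divergence D(p || q) in bits for discrete distributions.

    Terms with p(x) == 0 contribute nothing; q must be nonzero wherever p is.
    """
    total = 0.0
    for x, px in p.items():
        if px > 0:
            total += px * math.log2(px / q[x])
    return total


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Estimate I(X; Y) = D(p(x, y) || p(x) p(y)) from paired samples (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    # Empirical joint and marginal distributions from co-occurrence counts.
    joint = {pair: c / n for pair, c in Counter(zip(xs, ys)).items()}
    px = {x: c / n for x, c in Counter(xs).items()}
    py = {y: c / n for y, c in Counter(ys).items()}
    # Product of marginals, evaluated only on observed pairs:
    # unobserved pairs have p(x, y) = 0 and add nothing to the sum.
    product = {(x, y): px[x] * py[y] for (x, y) in joint}
    return kl_divergence(joint, product)
```

For instance, `mutual_information(['a', 'a', 'b', 'b'], [0, 0, 1, 1])` gives 1.0 bit, because each attribute fully determines the other, while `mutual_information(['a', 'a', 'b', 'b'], [0, 1, 0, 1])` gives 0.0 because the two columns are independent. Writing the mutual information as a KL divergence makes explicit how it measures the departure of two attributes from independence, which is the kind of relation between attributes the abstract refers to.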

Author information

Authors and Affiliations

Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Quinta da Torre, 2825 Monte da Caparica, Portugal
Teresa Gonçalves & Fernando Moura-Pires

Editor information

Robert E. Mercer, Eric Neufeld

About this paper

Cite this paper

Gonçalves, T., Moura-Pires, F. (1998). An attribute redundancy measure for clustering. In: Mercer, R.E., Neufeld, E. (eds) Advances in Artificial Intelligence. Canadian AI 1998. Lecture Notes in Computer Science, vol 1418. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-64575-6_57

DOI: https://doi.org/10.1007/3-540-64575-6_57
Published: 29 July 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64575-7
Online ISBN: 978-3-540-69349-9
eBook Packages: Springer Book Archive
