An attribute redundancy measure for clustering

  • Posters
  • Conference paper
Advances in Artificial Intelligence (Canadian AI 1998)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1418)

Abstract

Several information-theoretic measures have been used in machine learning. Building on the definition of the Kullback-Leibler entropy, this paper presents a new measure for clustering objects: the attribute redundancy measure. First, clustering is introduced, together with its interpretation from the machine learning point of view and a classification of clustering techniques. Then, the use of information-theoretic measures in machine learning is described for both supervised and unsupervised learning, including the application of the mutual information. Next, the new measure is presented, highlighting its ability to capture relations between attributes and outlining its closeness to other concepts of information theory. Finally, using a genetic algorithm as the search procedure to find the best clustering, the attribute redundancy measure is compared with the mutual information.
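As a point of orientation for the mutual information baseline mentioned above, the following is a minimal sketch of how the mutual information between a cluster assignment and a discrete attribute can be computed as a Kullback-Leibler divergence between the joint distribution and the product of its marginals. It is not the attribute redundancy measure itself, whose definition is not given in this abstract; the function names and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter

def kl_divergence(p, q):
    """D(p || q) in bits; p and q are dicts, q must cover every key of p."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)

def mutual_information(xs, ys):
    """I(X; Y) in bits, computed as D(joint || product of marginals)."""
    n = len(xs)
    joint = {k: c / n for k, c in Counter(zip(xs, ys)).items()}
    px = {k: c / n for k, c in Counter(xs).items()}
    py = {k: c / n for k, c in Counter(ys).items()}
    # Only pairs with non-zero joint probability contribute to the divergence,
    # so the product of marginals is evaluated on the support of the joint.
    product = {(x, y): px[x] * py[y] for (x, y) in joint}
    return kl_divergence(joint, product)

# Hypothetical toy data: four objects, a two-cluster assignment, two attributes.
clusters = [0, 0, 1, 1]
colour = ["red", "red", "blue", "blue"]      # fully determined by the cluster
size = ["small", "large", "small", "large"]  # independent of the cluster

print(mutual_information(clusters, colour))  # 1.0
print(mutual_information(clusters, size))    # 0.0
```

In this toy case the attribute that is fully determined by the clustering carries one bit of mutual information with it, while the independent attribute carries none.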

Editor information

Robert E. Mercer, Eric Neufeld

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gonçalves, T., Moura-Pires, F. (1998). An attribute redundancy measure for clustering. In: Mercer, R.E., Neufeld, E. (eds) Advances in Artificial Intelligence. Canadian AI 1998. Lecture Notes in Computer Science, vol 1418. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-64575-6_57

  • DOI: https://doi.org/10.1007/3-540-64575-6_57

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64575-7

  • Online ISBN: 978-3-540-69349-9

  • eBook Packages: Springer Book Archive
