Evaluation of Interestingness Measures for Ranking Discovered Knowledge

Hilderman, Robert J.; Hamilton, Howard J.

doi:10.1007/3-540-45357-1_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate thirteen diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. The thirteen diversity measures have previously been utilized in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that only four of the thirteen diversity measures satisfy all of the principles. We then analyze the distribution of the index values generated by each of the thirteen diversity measures. Empirical results, obtained using synthetic data, show that the distribution of index values generated tends to be highly skewed about the mean, median, and middle index values. The objective of this work is to gain some insight into the behaviour that can be expected from each of the measures in practice.
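
As a rough illustration of ranking summaries with a diversity measure, the sketch below scores each summary by the distribution of its tuple counts. The abstract does not name the thirteen measures, so Shannon entropy and Simpson's concentration index are used here only as well-known examples from information theory and ecology. The summary names, the counts, the function names, and the choice to rank in descending order are all hypothetical assumptions, not the authors' method.

```python
import math

def shannon_index(counts):
    """Shannon entropy (base 2) of a summary's tuple-count distribution."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)

def simpson_concentration(counts):
    """Simpson's concentration: sum of squared proportions (1.0 = one class)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

def rank_summaries(summaries, measure, descending=True):
    """Return summary names ordered by their index value.

    The ranking direction is an assumption; whether a high or a low
    index value marks a summary as more interesting depends on the measure.
    """
    return sorted(summaries, key=lambda name: measure(summaries[name]),
                  reverse=descending)

# Hypothetical summaries: each maps to the tuple counts of its rows.
summaries = {
    "by_province": [50, 30, 20],
    "by_city": [25, 25, 25, 25],
    "by_country": [100],
}

if __name__ == "__main__":
    for name in rank_summaries(summaries, shannon_index):
        print(name, round(shannon_index(summaries[name]), 4))
```

A uniform distribution maximizes the Shannon index and minimizes Simpson's concentration, so the two measures order these three summaries in opposite directions when both are sorted the same way.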

Author information

Authors and Affiliations

Saskatchewan Population Health and Evaluation Research Unit, Canada
Robert J. Hilderman
Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada, S4S 0A2
Howard J. Hamilton

Editor information

Editors and Affiliations

Dept. of Computer Science and Information Systems, The University of Hong Kong, Pokfulam, Hong Kong China
David Cheung
CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, ACT 2601, Australia
Graham J. Williams
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Kowloon, Hong Kong China
Qing Li

Hilderman, R.J., Hamilton, H.J. (2001). Evaluation of Interestingness Measures for Ranking Discovered Knowledge. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_28

DOI: https://doi.org/10.1007/3-540-45357-1_28
Published: 11 April 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive
