Abstract
When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate thirteen diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. The thirteen diversity measures have previously been utilized in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that only four of the thirteen diversity measures satisfy all of the principles. We then analyze the distribution of the index values generated by each of the thirteen diversity measures. Empirical results, obtained using synthetic data, show that the distribution of index values generated tends to be highly skewed about the mean, median, and middle index values. The objective of this work is to gain some insight into the behaviour that can be expected from each of the measures in practice.
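As a rough illustration of what ranking summaries by a diversity measure can look like, the sketch below scores each summary by the Shannon entropy of its tuple-count distribution and sorts the summaries by that score. The choice of Shannon entropy, the function names, the example counts, and the descending sort order are all assumptions made for illustration; the abstract does not name the thirteen measures or say which direction of the index counts as more interesting.

```python
import math


def shannon_diversity(counts):
    """Shannon entropy (in bits) of the proportions implied by tuple counts.

    `counts` is a hypothetical representation of a summary: one
    non-negative count per generalized tuple.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    # Zero counts contribute nothing to the entropy, so skip them.
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def rank_summaries(summaries):
    """Return (name, score) pairs sorted by descending diversity score.

    Sorting in descending order is an illustrative assumption, not a
    claim about how the paper orders summaries.
    """
    scored = [(name, shannon_diversity(counts))
              for name, counts in summaries.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical summaries, each given as counts per generalized tuple.
example = {
    "uniform": [25, 25, 25, 25],
    "skewed": [97, 1, 1, 1],
    "single": [100],
}
ranking = rank_summaries(example)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Any other index that maps a count vector to a single number could be passed through the same ranking step in place of `shannon_diversity`.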
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hilderman, R.J., Hamilton, H.J. (2001). Evaluation of Interestingness Measures for Ranking Discovered Knowledge. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_28
DOI: https://doi.org/10.1007/3-540-45357-1_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive