Heuristics for Ranking the Interestingness of Discovered Knowledge

Hilderman, Robert J.; Hamilton, Howard J.

doi:10.1007/3-540-48912-6_28

Robert J. Hilderman³ &
Howard J. Hamilton³

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 1574))

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

1019 Accesses
6 Citations

Abstract

We describe heuristics, based upon information theory and statistics, for ranking the interestingness of summaries generated from databases. The tuples in a summary are unique, and therefore, can be considered to be a population described by some probability distribution. The four interestingness measures presented here are based upon common measures of diversity of a population: variance, the Simpson index, and the Shannon index. Using each of the proposed measures, we assign a single real value to a summary that describes its interestingness. Our experimental results show that the ranks assigned by the four interestingness measures are highly correlated.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

C.L. Carter and H.J. Hamilton. Efficient attribute-oriented algorithms for knowledge discovery from large databases. IEEE Transactions on Knowledge and Data Engineering, 10(2):193–208, March/April 1998.
Article Google Scholar
C.L. Carter, H.J. Hamilton, and N. Cercone. Share-based measures for itemsets. In J. Komorowski and J. Zytkow, editors, Proceedings of the First European Conference on the Principles of Data Mining and Knowledge Discovery (PKDD’97), pages 14–24, Trondheim, Norway, June 1997.
Google Scholar
U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Adavances in Knowledge Discovery and Data Mining, pages 1–34. AAAI/MIT Press, 1996.
Google Scholar
W.J. Frawley, G. Piatetsky-Shapiro, and C.J. Matheus. Knowledge discovery in databases: An overview. In Knowledge Discovery in Databases, pages 1–27. AAAI/MIT Press, 1991.
Google Scholar
H.J. Hamilton, R.J. Hilderman, L. Li, and D.J. Randall. Generalization lattices. In J. Zytkow and M. Quafafou, editors, Proceedings of the Second European Conference on the Principles of Data Mining and Knowledge Discovery (PKDD’98), pages 328–336, Nantes, France, September 1998.
Google Scholar
R.J. Hilderman, C.L. Carter, H.J. Hamilton, and N. Cercone. Mining market basket data using share measures and characterized itemsets. In X. Wu, R. Kotagiri, and K. Korb, editors, Proceedings of the Second Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’98), pages 159–173, Melbourne, Australia, April 1998.
Google Scholar
R.J. Hilderman, H.J. Hamilton, and B. Barber. Ranking the interestingness of summaries from data mining systems. In Proceedings of the 12th Annual Florida Artificial Intelligence Research Symposium (FLAIRS’99). To appear.
Google Scholar
R.J. Hilderman, H.J. Hamilton, R.J. Kowalchuk, and N. Cercone. Parallel knowledge discovery using domain generalization graphs. In J. Komorowski and J. Zytkow, editors, Proceedings of the First European Conference on the Principles of Data Mining and Knowledge Discovery (PKDD’97), pages 25–35, Trondheim, Norway, June 1997.
Google Scholar
H. Liu, H. Lu, and J. Yao. Identifying relevant databases for multidatabase mining. In X. Wu, R. Kotagiri, and K. Korb, editors, Proceedings of the Second Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’98), pages 210–221, Melbourne, Australia, April 1998.
Google Scholar
D.J. Randall, H.J. Hamilton, and R.J. Hilderman. Temporal generalization with domain generalization graphs. International Journal of Pattern Recognition and Artificial Intelligence. To appear.
Google Scholar
W.A. Rosenkrantz. Introduction to Probability and Statistics for Scientists and Engineers. McGraw-Hill, 1997.
Google Scholar
C.E. Shannon and W. Weaver. The mathematical theory of communication. University of Illinois Press, 1949.
Google Scholar
E.H. Simpson. Measurement of diversity. Nature, 163:688, 1949.
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada, S4S 0A2
Robert J. Hilderman & Howard J. Hamilton

Authors

Robert J. Hilderman
View author publications
You can also search for this author in PubMed Google Scholar
Howard J. Hamilton
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Science and Systems Engineering, Yamaguchi University, Tokiwa-Dai, 2557, Ube, 755, Japan
Ning Zhong
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Lizhu Zhou

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hilderman, R.J., Hamilton, H.J. (1999). Heuristics for Ranking the Interestingness of Discovered Knowledge. In: Zhong, N., Zhou, L. (eds) Methodologies for Knowledge Discovery and Data Mining. PAKDD 1999. Lecture Notes in Computer Science(), vol 1574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48912-6_28

Download citation

DOI: https://doi.org/10.1007/3-540-48912-6_28
Published: 24 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65866-5
Online ISBN: 978-3-540-48912-2
eBook Packages: Springer Book Archive

Publish with us

Policies and ethics