Levelwise Search and Borders of Theories in Knowledge Discovery

Mannila, Heikki; Toivonen, Hannu

doi:10.1023/A:1009796218281

Levelwise Search and Borders of Theories in Knowledge Discovery

Published: September 1997

Volume 1, pages 241–258, (1997)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Heikki Mannila^1,1 &
Hannu Toivonen^1,1

775 Accesses
635 Citations
Explore all metrics

Abstract

One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S \( \subseteq \) L determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '93).Washington, D.C., pp. 207–216.
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A.I. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, (Eds.), Menlo Park, CA: AAAI Press. pp. 307–328.
Google Scholar
Bell, S. 1995. Discovery and maintenance of functional dependencies by independencies. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 27–32.
Berge, C. 1973. Hypergraphs. Combinatorics of Finite Sets (third edition). Amsterdam: North-Holland.
Google Scholar
Brin, S., Motwani, R., and Silverstein, S. 1997. Beyond market baskets: Generalizing association rules to correlations. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '97). Tucson, AZ., pp. 265–276.
Chang, C.C. and Keisler, H.J. 1973. Model Theory (third edition, 1990). Amsterdam: North-Holland.
Google Scholar
De Raedt, L. and Bruynooghe, M. 1993. A theory of clausal discovery. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93). Chambéry, France, pp. 1058–1053.
De Raedt, L. and Džeroski, S. 1994. First-order jk-clausal theories are PAC-learnable. Artificial Intelligence, 70:375–392.
Google Scholar
Eiter, T. and Gottlob, G. 1995. Identifying the minimal transversals of a hypergraph and related problems. SIAM Journal on Computing, 24(6):1278–1304.
Google Scholar
Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (Eds.), 1996. Advances in Knowledge Discovery and Data Mining. Menlo Park, CA: AAAI Press.
Google Scholar
Fredman, M.L. and Khachiyan, L. 1996. On the complexity of dualization of monotone disjunctive normal forms. Journal of Algorithms, 21(3):618–628.
Google Scholar
Fukuda, T., Morimoto, Y., Morishita, S., and Tokuyama, T. 1996. Mining optimized association rules for numeric attributes. In Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '96). Montréal, Canada, pp. 182–191.
Gunopulos, D., Khardon, R., Mannila, H., and Toivonen, H. 1997. Data mining, hypergraph transversals, and machine learning. In Proceedings of the SixteenthACMSIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '97). Tucson, AZ., pp. 209–216.
Gurvich, V. and Khachiyan, L. 1995. On generating the irredundant conjunctive and disjunctive normal forms of monotone boolean functions. Technical Report LCSR-TR-251, Rutgers University.
Han, J. and Fu, Y. 1995. Discovery of multiple-level association rules from large databases. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 420–431.
Holsheimer, M., Kersten, M., Mannila, H., and Toivonen, H. 1995. A perspective on databases and data mining. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 150–155.
Houtsma, M. and Swami, A. 1993. Set-oriented mining of association rules. Research Report RJ 9567, IBM Almaden Research Center, San Jose, CA.
Google Scholar
Imielinski, T. and Mannila, H. 1996. A database perspective on knowledge discovery. Communications of the ACM, 39(11):58–64.
Google Scholar
Kantola, M., Mannila, H., Räihä, K.-J., and Siirtola, H. 1992. Discovering functional and inclusion dependencies in relational databases. International Journal of Intelligent Systems, 7(7):591–607.
Google Scholar
Kietz, J.-U. and Wrobel, S. 1992. Controlling the complexity of learning in logic through syntactic and task-oriented models. In S. Muggleton ( Ed.), Inductive Logic Programming. London: Academic Press, pp. 335–359.
Google Scholar
Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., and Verkamo, A.I. 1994. Finding interesting rules from large sets of discovered association rules. In Proceedings of the Third International Conference on Information and Knowledge Management (CIKM '94). Gaithersburg, MD, pp. 401–407.
Kloesgen, W. 1995. Efficient discovery of interesting statements in databases. Journal of Intelligent Information Systems, 4(1):53–69.
Google Scholar
Knobbe, A.J. and Adriaans, P.W. 1996. Discovering foreign key relations in relational databases. In Cybernetics and Systems, Volume II, The Thirteenth European Meeting on Cybernetics and Systems Research. Vienna, Austria, pp. 961–966.
Google Scholar
Langley, P. 1996. Elements of Machine Learning. San Mateo, CA: Morgan Kaufmann.
Mannila, H. and Räihä, K.-J. 1986. Design by example: An application of Armstrong relations. Journal of Computer and System Sciences, 33(2):126–141.
Google Scholar
Mannila, H. and Räihä, K.-J. 1992a. Design of Relational Databases. Wokingham, UK: Addison-Wesley.
Google Scholar
Mannila, H. and Räihä, K.-J. 1992b. On the complexity of dependency inference. Discrete Applied Mathematics, 40:237–243.
Google Scholar
Mannila, H. and Räihä, K.-J. 1994. Algorithms for inferring functional dependencies. Data and Knowledge Engineering, 12(1):83–99.
Google Scholar
Mannila, H., Toivonen, H., and Verkamo, A.I. 1995. Discovering frequent episodes in sequences. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 210–215.
Mannila, H., Toivonen, H., and Verkamo, A.I. 1997. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3):259–289.
Google Scholar
Mishra, N. and Pitt, L. 1995. On bounded-degree hypergraph transversal. Manuscript.
Mitchell, T.M. 1982. Generalization as search. Artificial Intelligence, 18:203–226.
Google Scholar
Park, J.S., Chen, M.-S., and Yu, P.S. 1995. An effective hash-based algorithm for mining association rules. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '95). San Jose, CA, pp. 175–186.
Pfahringer, B. and Kramer, S. 1995. Compression-based evaluation of partial determinations. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD '95). Montréal, Canada, pp. 234–239.
Piatetsky-Shapiro, G. 1991. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W.J. Frawley (Eds.), Menlo Park, CA: AAAI Press, pp. 229–248.
Google Scholar
Piatetsky-Shapiro, G. and Frawley, W.J. (Eds.), 1991. Knowledge Discovery in Databases. Menlo Park, CA: AAAI Press.
Google Scholar
Savasere, A., Omiecinski, E., and Navathe, S. 1995. An efficient algorithm for mining association rules in large databases. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 432–444.
Srikant, R. and Agrawal, R. 1995. Mining generalized association rules. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95). Zurich, Swizerland, pp. 407–419.
Srikant, R. and Agrawal, R. 1996. Mining quantitative association rules in large relational tables. In Proceedings of ACM SIGMOD Conference on Management of Data (SIGMOD '96). Montréal, Canada, pp. 1–12.
Toivonen, H. 1996. Sampling large databases for association rules. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB '96). Mumbay, India, pp. 134–145.

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of Helsinki, P.O. Box 26, FIN–00014, Finland
Heikki Mannila, Heikki Mannila, Hannu Toivonen & Hannu Toivonen

Authors

Heikki Mannila
View author publications
You can also search for this author in PubMed Google Scholar
Hannu Toivonen
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

About this article

Cite this article

Mannila, H., Toivonen, H. Levelwise Search and Borders of Theories in Knowledge Discovery. Data Mining and Knowledge Discovery 1, 241–258 (1997). https://doi.org/10.1023/A:1009796218281

Download citation

Issue Date: September 1997
DOI: https://doi.org/10.1023/A:1009796218281

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Levelwise Search and Borders of Theories in Knowledge Discovery

Abstract

Access this article

Similar content being viewed by others

Pruning Hypothesis Spaces Using Learned Domain Theories

Tractable Combinations of Theories via Sampling

Detecting Inconsistencies in Large First-Order Knowledge Bases

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Navigation

Levelwise Search and Borders of Theories in Knowledge Discovery

Abstract

Access this article

Similar content being viewed by others

Pruning Hypothesis Spaces Using Learned Domain Theories

Tractable Combinations of Theories via Sampling

Detecting Inconsistencies in Large First-Order Knowledge Bases

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation