Discovery of frequent DATALOG patterns

Dehaspe, Luc; Toivonen, Hannu

doi:10.1023/A:1009863704807

Published: March 1999

Data Mining and Knowledge Discovery, volume 3, pages 7–36 (1999)

Luc Dehaspe¹ &
Hannu Toivonen²

Abstract

Discovery of frequent patterns has been studied in a variety of data mining settings. In its simplest form, known from association rule mining, the task is to discover all frequent itemsets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered with special-purpose algorithms. We present WARMR, a general-purpose inductive logic programming algorithm that addresses frequent query discovery: a very general DATALOG formulation of the frequent pattern discovery problem.
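To make the simplest setting concrete, the following is a minimal sketch of levelwise frequent itemset discovery in the style known from association rule mining. It is not WARMR itself; the function name `frequent_itemsets`, the `min_count` threshold, and the toy `baskets` data are assumptions introduced only for illustration.

```python
from itertools import combinations


def frequent_itemsets(examples, min_count):
    """Return {itemset: count} for every itemset found in at least min_count examples."""
    examples = [frozenset(e) for e in examples]
    # Level 1: the candidate itemsets are the single items.
    candidates = {frozenset([item]) for e in examples for item in e}
    frequent = {}
    size = 1
    while candidates:
        # Count in how many examples each candidate occurs.
        counts = {c: sum(1 for e in examples if c <= e) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Next level: unions of frequent sets that are one item larger,
        # kept only if all their subsets of the current size are frequent.
        size += 1
        unions = {a | b for a in level for b in level if len(a | b) == size}
        candidates = {
            u for u in unions
            if all(frozenset(s) in level for s in combinations(u, size - 1))
        }
    return frequent


# Hypothetical toy data: each example is a set of items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
result = frequent_itemsets(baskets, min_count=2)
```

With this toy data and `min_count=2`, every single item and every pair is frequent, while the three-item set occurs in only one example and is therefore not returned.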

The motivation for this novel approach is twofold. First, exploratory data mining is well supported: WARMR offers the flexibility required to experiment with standard and, in particular, novel settings not supported by special-purpose algorithms. Also, application prototypes based on WARMR can be used as benchmarks in the comparison and evaluation of new special-purpose algorithms. Second, the unified representation gives insight into the blurred picture of the frequent pattern discovery domain. Within the DATALOG formulation a number of dimensions appear that relink diverged settings.

We demonstrate the frequent query approach and its use on two applications, one in alarm analysis, and one in a chemical toxicology domain.
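As an illustration of what a frequent query could look like, the sketch below counts for how many examples a conjunctive, DATALOG-style query succeeds against a small set of facts. It is a simplified reading of the frequent query setting, not the WARMR algorithm: the predicates `atom` and `bond`, the compound identifiers, the helper names, and the convention that variables are capitalised strings are all assumptions made for this example.

```python
def is_var(term):
    # Convention of this sketch: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def solve(atoms, facts, binding):
    """Yield every variable binding that satisfies all atoms against the facts."""
    if not atoms:
        yield binding
        return
    (pred, *args), rest = atoms[0], atoms[1:]
    for fact_pred, *fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        new = dict(binding)
        for a, f in zip(args, fact_args):
            if is_var(a):
                # Bind the variable, or check it against its earlier binding.
                if new.setdefault(a, f) != f:
                    break
            elif a != f:
                break
        else:
            yield from solve(rest, facts, new)


def frequency(query, key_var, keys, facts):
    """Count the keys (examples) for which the query has at least one solution."""
    return sum(
        1 for k in keys
        if next(solve(query, facts, {key_var: k}), None) is not None
    )


# Hypothetical facts about three compounds c1, c2, c3.
facts = [
    ("atom", "c1", "a1", "carbon"),
    ("atom", "c1", "a2", "oxygen"),
    ("bond", "c1", "a1", "a2"),
    ("atom", "c2", "a3", "carbon"),
    ("atom", "c2", "a4", "carbon"),
    ("bond", "c2", "a3", "a4"),
    ("atom", "c3", "a5", "oxygen"),
]
keys = ["c1", "c2", "c3"]

# "Compound C has a carbon atom A bonded to an oxygen atom B."
query = [
    ("atom", "C", "A", "carbon"),
    ("bond", "C", "A", "B"),
    ("atom", "C", "B", "oxygen"),
]
count = frequency(query, "C", keys, facts)
```

Counting per key rather than per solution means a compound with many matching substructures still contributes only once, which mirrors how an itemset is counted once per example.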

Author information

Authors and Affiliations

¹ Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001, Heverlee, Belgium (Luc Dehaspe)
² Rolf Nevanlinna Institute & Department of Computer Science, University of Helsinki, P.O. Box 4, FIN-00014, Finland (Hannu Toivonen)

About this article

Cite this article

Dehaspe, L., Toivonen, H. Discovery of frequent DATALOG patterns. Data Mining and Knowledge Discovery 3, 7–36 (1999). https://doi.org/10.1023/A:1009863704807
