Discovery of frequent DATALOG patterns

Data Mining and Knowledge Discovery

Abstract

Discovery of frequent patterns has been studied in a variety of data mining settings. In its simplest form, known from association rule mining, the task is to discover all frequent itemsets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered with special purpose algorithms. We present WARMR, a general purpose inductive logic programming algorithm that addresses frequent query discovery: a very general DATALOG formulation of the frequent pattern discovery problem.
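To make the simplest form of the task concrete, the following is a minimal sketch of levelwise frequent itemset discovery over a list of examples. It is an illustration of the itemset setting described above only, not the WARMR algorithm or its DATALOG formulation. The function name `frequent_itemsets`, the parameter `min_count`, and the sample data are assumptions introduced here.

```python
from itertools import combinations


def frequent_itemsets(examples, min_count):
    """Return {itemset: count} for every itemset found in at least min_count examples.

    Hypothetical helper for illustration; names and signature are assumptions.
    """
    examples = [frozenset(e) for e in examples]
    # Level 1 candidates: every single item that occurs in some example.
    candidates = {frozenset([item]) for e in examples for item in e}
    frequent = {}
    while candidates:
        # Count the examples that contain each candidate itemset.
        counts = {c: sum(1 for e in examples if c <= e) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, keeping a candidate
        # only if all of its k-item subsets are themselves frequent.
        k = len(next(iter(level))) + 1 if level else 0
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


# Made-up sample data: four examples, each a set of items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = frequent_itemsets(baskets, min_count=2)
```

With these four examples and a threshold of two, the pair of milk and butter occurs in only one example, so it is not frequent and no three-item candidate is generated from it.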

The motivation for this novel approach is twofold. First, exploratory data mining is well supported: WARMR offers the flexibility required to experiment with standard and, in particular, novel settings not supported by special purpose algorithms. Also, application prototypes based on WARMR can be used as benchmarks in the comparison and evaluation of new special purpose algorithms. Second, the unified representation gives insight into the blurred picture of the frequent pattern discovery domain. Within the DATALOG formulation, a number of dimensions appear that relink settings that have diverged.

We demonstrate the frequent query approach and its use on two applications, one in alarm analysis, and one in a chemical toxicology domain.

Cite this article

Dehaspe, L., Toivonen, H. Discovery of frequent DATALOG patterns. Data Mining and Knowledge Discovery 3, 7–36 (1999). https://doi.org/10.1023/A:1009863704807
