Constraint Based Mining of First Order Sequences in SeqLog

Dan Lee, Sau; De Raedt, Luc

doi:10.1007/978-3-540-44497-8_8

Sau Dan Lee & Luc De Raedt

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2682)

Abstract

A logical language, SeqLog, for mining and querying sequential data and databases is presented. In SeqLog, data takes the form of a sequence of logical atoms, background knowledge can be specified using Datalog-style clauses, and sequential queries or patterns correspond to subsequences of logical atoms. SeqLog is then used as the representation language for the inductive database mining system MineSeqLog. Inductive queries in MineSeqLog take the form of a conjunction of a monotonic and an anti-monotonic constraint on sequential patterns. Given such an inductive query, MineSeqLog computes the borders of the solution space, using variants of the well-known level-wise algorithm together with ideas from version spaces. Finally, we report on a number of experiments in the domain of user modelling that validate the approach.
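
The kind of inductive query described in the abstract can be illustrated with a minimal Python sketch. It is not the MineSeqLog implementation. It assumes ground atoms only (no variables and no Datalog background knowledge), lets a pattern match with gaps between atoms, and uses a minimum frequency on one dataset as the anti-monotonic constraint and a maximum frequency on a second dataset as the monotonic constraint. All function names, thresholds, and the toy shell-command sequences are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the atoms of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `atom in remaining` advances the iterator, so the order of atoms is respected.
    return all(atom in remaining for atom in pattern)


def frequency(pattern, database):
    """Number of sequences in `database` that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def levelwise(database, min_freq):
    """Level-wise search for all non-empty patterns with frequency >= min_freq.

    Assumes min_freq >= 1; otherwise the search would never terminate.
    """
    alphabet = sorted({atom for seq in database for atom in seq})
    level = [(a,) for a in alphabet if frequency((a,), database) >= min_freq]
    found = []
    while level:
        found.extend(level)
        # Extend each surviving pattern by one atom, then prune infrequent candidates.
        candidates = {p + (a,) for p in level for a in alphabet}
        level = sorted(c for c in candidates if frequency(c, database) >= min_freq)
    return found


def borders(solutions):
    """Return (G, S): the most general and most specific patterns in `solutions`."""
    solutions = set(solutions)

    def more_general(p, q):
        # p is strictly more general than q if p is a proper subsequence of q.
        return p != q and is_subsequence(p, q)

    g = sorted(p for p in solutions if not any(more_general(q, p) for q in solutions))
    s = sorted(p for p in solutions if not any(more_general(p, q) for q in solutions))
    return g, s


# Hypothetical toy data: command sequences of two groups of users.
positive = [
    ("ls", "vi", "latex"),
    ("cd", "ls", "vi", "latex"),
    ("ls", "vi", "latex", "xdvi"),
]
negative = [
    ("ls", "cd"),
    ("cd", "ls", "vi"),
]

# Anti-monotonic constraint: frequency in `positive` is at least 2.
frequent = levelwise(positive, min_freq=2)
# Monotonic constraint: frequency in `negative` is at most 1.
solutions = [p for p in frequent if frequency(p, negative) <= 1]

G, S = borders(solutions)
print(G)  # [('latex',), ('vi',)]
print(S)  # [('ls', 'vi', 'latex')]
```

Every pattern in the solution set lies between some element of G and some element of S, so the two borders describe the whole set compactly. The sketch materialises the solution set before extracting the borders, which is only practical for small examples.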

Author information

Authors and Affiliations

Institut für Informatik, Albert-Ludwigs-Universität, Freiburg, Germany
Sau Dan Lee & Luc De Raedt

Editor information

Editors and Affiliations

Dipartimento di Informatica, Università di Torino, Italy
Rosa Meo
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
Pier Luca Lanzi
Nokia Research Center, Nokia Group, P.O.Box 407, FIN-00045, Finland
Mika Klemettinen

About this chapter

Cite this chapter

Dan Lee, S., De Raedt, L. (2004). Constraint Based Mining of First Order Sequences in SeqLog. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_8

DOI: https://doi.org/10.1007/978-3-540-44497-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22479-2
Online ISBN: 978-3-540-44497-8
eBook Packages: Springer Book Archive
