Abstract
We present SeqLog, a logical language for mining and querying sequential data and databases. In SeqLog, data takes the form of a sequence of logical atoms, background knowledge can be specified using Datalog-style clauses, and sequential queries or patterns correspond to subsequences of logical atoms. SeqLog serves as the representation language for the inductive database mining system MineSeqLog. Inductive queries in MineSeqLog take the form of a conjunction of a monotonic and an anti-monotonic constraint on sequential patterns. Given such an inductive query, MineSeqLog computes the borders of the solution space, using variants of the well-known level-wise algorithm together with ideas from version spaces. Finally, we report on a number of experiments in the domain of user modelling that validate the approach.
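To make the idea of an inductive query concrete, the following is a minimal sketch and not the MineSeqLog implementation. It assumes a heavily simplified setting: atoms are plain strings with no variables, there is no background knowledge, and a pattern is more general than another if it occurs in it as a (not necessarily contiguous) subsequence. The anti-monotonic constraint is assumed to be a minimum frequency on one dataset and the monotonic constraint a maximum frequency on a second dataset. All function names (`occurs_in`, `levelwise`, `borders`) and the sample data are hypothetical, and the borders are extracted by brute force from the enumerated solutions rather than computed directly.

```python
def occurs_in(pattern, sequence):
    """True if pattern is a (not necessarily contiguous) subsequence of sequence."""
    it = iter(sequence)
    # 'atom in it' advances the iterator, so atoms must appear in order.
    return all(atom in it for atom in pattern)


def frequency(pattern, database):
    """Number of sequences in the database in which the pattern occurs."""
    return sum(occurs_in(pattern, seq) for seq in database)


def levelwise(database, min_freq):
    """Level-wise enumeration of all patterns with frequency >= min_freq."""
    alphabet = sorted({atom for seq in database for atom in seq})
    level = [(a,) for a in alphabet if frequency((a,), database) >= min_freq]
    frequent = []
    while level:
        frequent.extend(level)
        current = set(level)
        candidates = set()
        for p in level:
            for a in alphabet:
                c = p + (a,)
                # Prune: every pattern obtained by deleting one atom from c
                # must itself be frequent (anti-monotonicity of min frequency).
                if all(c[:i] + c[i + 1:] in current for i in range(len(c))):
                    candidates.add(c)
        level = sorted(c for c in candidates
                       if frequency(c, database) >= min_freq)
    return frequent


def borders(pos, neg, min_freq, max_freq):
    """Return (S, G): the most specific and most general solutions."""
    # Solutions satisfy both constraints: frequent in pos, infrequent in neg.
    sols = [p for p in levelwise(pos, min_freq)
            if frequency(p, neg) <= max_freq]
    # S: solutions with no proper specialization among the solutions.
    s = [p for p in sols if not any(p != q and occurs_in(p, q) for q in sols)]
    # G: solutions with no proper generalization among the solutions.
    g = [p for p in sols if not any(p != q and occurs_in(q, p) for q in sols)]
    return s, g


# Hypothetical command logs, invented purely for illustration.
pos = [("ls", "cd", "vi"), ("ls", "vi"), ("cd", "ls", "vi")]
neg = [("ls", "cd"), ("cd", "vi")]
s_border, g_border = borders(pos, neg, min_freq=2, max_freq=1)
print("S:", s_border)
print("G:", g_border)
```

Under these assumptions, every solution lies between some element of G and some element of S in the generality order, which is why the two borders suffice to describe the whole solution space.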
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Dan Lee, S., De Raedt, L. (2004). Constraint Based Mining of First Order Sequences in SeqLog. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds.) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_8
DOI: https://doi.org/10.1007/978-3-540-44497-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22479-2
Online ISBN: 978-3-540-44497-8
eBook Packages: Springer Book Archive