Abstract
Constraint Programming (CP) has proven to be an effective platform for constraint-based sequence mining. Previous work has focused on standard frequent sequence mining, as well as frequent sequence mining with a maximum "gap" between two matching events in a sequence. The main challenge in the latter is that this constraint cannot be imposed independently of the omnipresent frequency constraint. Indeed, the gap constraint changes whether a subsequence is included in a sequence, and hence its frequency. In this work, we go beyond that and investigate the integration of timed events and of constraints on the minimum/maximum gap as well as the minimum/maximum span. The span constraint bounds the time allowed between the first and last matching event of a pattern. We show how the three are interrelated, and what changes to the frequency constraint are required. Key to our approach is the concept of an extension window defined by the gap and span constraints; we develop techniques that avoid scanning the sequences needlessly and that use a backtracking-aware data structure. Experiments demonstrate that the proposed approach outperforms both specialized and CP-based approaches in almost all cases, and that the advantage increases as the minimum frequency threshold decreases. This paper is an extension of the original manuscript presented at CPAIOR'17 [5].
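To make the interaction between gap/span constraints and frequency concrete, here is a minimal Python sketch. It is not the CP propagator described in the paper: the function names `embeds` and `frequency`, the toy database, and the convention that a gap is the time difference between two consecutive matched events are all assumptions made for illustration. It uses plain backtracking over timed sequences and counts how many sequences still contain the pattern once the bounds are applied.

```python
from math import inf


def embeds(sequence, pattern, min_gap=0, max_gap=inf, min_span=0, max_span=inf):
    """Return True if `pattern` occurs in `sequence` within the gap/span bounds.

    sequence: list of (timestamp, symbol) pairs sorted by timestamp.
    pattern:  list of symbols.
    Gap  = time between two consecutive matched events (assumed convention).
    Span = time between the first and the last matched event.
    """
    if not pattern:
        return True

    def extend(k, prev_pos, first_time):
        prev_time = sequence[prev_pos][0]
        if k == len(pattern):
            return min_span <= prev_time - first_time <= max_span
        for pos in range(prev_pos + 1, len(sequence)):
            time, symbol = sequence[pos]
            if symbol != pattern[k]:
                continue
            if not (min_gap <= time - prev_time <= max_gap):
                continue
            if time - first_time > max_span:
                break  # timestamps are sorted: later events only widen the span
            if extend(k + 1, pos, first_time):
                return True
        return False

    # Every occurrence of the first symbol is a candidate start, because the
    # earliest match is not necessarily the one that satisfies the bounds.
    return any(
        extend(1, pos, time)
        for pos, (time, symbol) in enumerate(sequence)
        if symbol == pattern[0]
    )


def frequency(database, pattern, **bounds):
    """Number of sequences in `database` that contain `pattern` under `bounds`."""
    return sum(embeds(seq, pattern, **bounds) for seq in database)


# Hypothetical toy database of timed sequences.
database = [
    [(1, "A"), (2, "B"), (5, "C")],
    [(1, "A"), (4, "B"), (6, "A"), (7, "B")],
    [(2, "B"), (3, "A"), (9, "B")],
]

print(frequency(database, ["A", "B"]))                        # 3
print(frequency(database, ["A", "B"], max_gap=2))             # 2
print(frequency(database, ["A", "B"], min_gap=2, max_gap=3))  # 1
```

In the second sequence, the first `A` (time 1) cannot be extended under `max_gap=2`, but the later `A` (time 6) can. This is why a gap constraint cannot simply be checked against the first embedding found, and why the frequency of a pattern depends on the bounds.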









Notes
Except some CP-Solvers such as Gecode, Oz/Mozart and Figaro.
It is neither post-processing nor hard-coded.
References
Aggarwal, C.C., & Han, J. (2014). Frequent pattern mining. Springer.
Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. In Proceedings of the eleventh international conference on data engineering (pp. 3–14).
Antunes, C., & Oliveira, A.L. (2003). Generalization of pattern-growth methods for sequential pattern mining with gap constraints. In Perner, P., & Rosenfeld, A. (Eds.), Machine learning and data mining in pattern recognition: 3rd international conference, MLDM 2003, Leipzig, Germany, July 5–7, 2003, Proceedings (pp. 239–251). Berlin: Springer.
Aoga, J.O.R., Guns, T., & Schaus, P. (2016). An efficient algorithm for mining frequent sequence with constraint programming. In Frasconi, P., Landwehr, N., Manco, G., & Vreeken, J. (Eds.), Machine learning and knowledge discovery in databases: European conference, ECML PKDD 2016, Riva del Garda, Italy, September 19–23, 2016, Proceedings, Part II (pp. 315–330). Cham: Springer International Publishing.
Aoga, J.O.R., Guns, T., & Schaus, P. (2017). Mining time-constrained sequential patterns with constraint programming. In Salvagnin, D., & Lombardi, M. (Eds.), Integration of AI and OR techniques in constraint programming - 14th international conference, CPAIOR 2017, Padova, Italy, June 5–8, 2017, Proceedings, Lecture Notes in Computer Science. Springer.
Ayres, J., Flannick, J., Gehrke, J., & Yiu, T. (2002). Sequential pattern mining using a bitmap representation. In Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, July 23–26, 2002, Edmonton, Alberta, Canada (pp. 429–435).
Batal, I., Fradkin, D., Harrison, J., Moerchen, F., & Hauskrecht, M. (2012). Mining recent temporal patterns for event detection in multivariate time series data. In Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 280–288).
Beldiceanu, N., & Contejean, E. (1994). Introducing global constraints in CHIP. Mathematical and Computer Modelling, 20(12), 97–123.
Coquery, E., Jabbour, S., Saïs, L., & Salhi, Y. (2012). A SAT-based approach for discovering frequent, closed and maximal patterns in a sequence. In De Raedt, L., Bessière, C., Dubois, D., Doherty, P., Frasconi, P., Heintz, F., & Lucas, P.J.F. (Eds.), ECAI 2012 - 20th European Conference on Artificial Intelligence, Montpellier, France, August 27–31, 2012, Frontiers in Artificial Intelligence and Applications (Vol. 242, pp. 258–263). IOS Press.
Desai, N.A.K., & Ganatra, A. (2015). Efficient constraint-based sequential pattern mining (SPM) algorithm to understand customers buying behaviour from time stamp-based sequence dataset. Cogent Engineering, 2(1), 1072292.
Fournier-Viger, P., Wu, C.W., & Tseng, V.S. (2013). Mining maximal sequential patterns without candidate maintenance. In Advanced data mining and applications (pp. 169–180). Springer.
Guns, T., Nijssen, S., & De Raedt, L. (2013). k-pattern set mining under constraints. IEEE Transactions on Knowledge and Data Engineering, 25(2), 402–418.
Han, J., Pei, J., Yin, Y., & Mao, R. (2004). Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data mining and knowledge discovery, 8(1), 53–87.
He, J., Flener, P., Pearson, J., & Zhang, W.M. (2013). Solving string constraints: The case for constraint programming. In International conference on principles and practice of constraint programming (pp. 381–397). Springer.
Henriques, R., Antunes, C., & Madeira, S.C. (2014). Methods for the efficient discovery of large item-indexable sequential patterns. In Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., & Ras, Z.W. (Eds.), New frontiers in mining complex patterns: Second international workshop, NFMCP 2013, held in conjunction with ECML-PKDD 2013, Prague, Czech Republic, September 27, 2013, Revised Selected Papers (pp. 100–116). Cham: Springer International Publishing.
Henriques, R., & Madeira, S.C. (2014). BicSPAM: flexible biclustering using sequential patterns. BMC Bioinformatics, 15(1), 130.
Kadioglu, S., & Sellmann, M. (2010). Grammar constraints. Constraints, 15(1), 117–144.
Kemmar, A., Lebbah, Y., Loudni, S., Boizumault, P., & Charnois, T. (2017). Prefix-projection global constraint and top-k approach for sequential pattern mining. Constraints, 22(2), 265–306.
Kemmar, A., Loudni, S., Lebbah, Y., Boizumault, P., & Charnois, T. (2015). Prefix-projection global constraint for sequential pattern mining. In Pesant, G. (Ed.), Principles and practice of constraint programming: 21st international conference, CP 2015, Cork, Ireland, August 31 – September 4, 2015, Proceedings (pp. 226–243). Cham: Springer International Publishing.
Kemmar, A., Loudni, S., Lebbah, Y., Boizumault, P., & Charnois, T. (2016). A global constraint for mining sequential patterns with GAP constraint. In Quimper, C. (Ed.), Integration of AI and OR techniques in constraint programming - 13th international conference, CPAIOR 2016, Banff, AB, Canada, May 29 – June 1, 2016, Proceedings, Lecture Notes in Computer Science (Vol. 9676, pp. 198–215). Springer.
Li, C., & Wang, J. (2008). Efficiently mining closed subsequences with gap constraints. In Proceedings of the SIAM international conference on data mining, SDM 2008, April 24–26, 2008, Atlanta, Georgia, USA (pp. 313–322).
Lu, S., & Li, C. (2004). Aprioriadjust: an efficient algorithm for discovering the maximum sequential patterns. In Proc. Intern. Workshop knowl. Grid and grid intell.
Mannila, H., Toivonen, H., & Verkamo, A.I. (1997). Discovery of frequent episodes in event sequences. Data mining and knowledge discovery, 1(3), 259–289.
Metivier, J., Boizumault, P., Crémilleux, B., Khiari, M., & Loudni, S. (2011). A constraint-based language for declarative pattern discovery. In Data mining workshops (ICDMW), 2011 IEEE 11th international conference on (pp. 1112–1119).
Négrevergne, B., & Guns, T. (2015). Constraint-based sequence mining using constraint programming. In Michel, L. (Ed.), Integration of AI and OR techniques in constraint programming - 12th international conference, CPAIOR 2015, Barcelona, Spain, May 18–22, 2015, Proceedings, Lecture Notes in Computer Science (Vol. 9075, pp. 288–305). Springer.
OscaR Team (2012). OscaR: Scala in OR. Available from https://bitbucket.org/oscarlib/oscar.
Parthasarathy, S., Zaki, M.J., Ogihara, M., & Dwarkadas, S. (1999). Incremental and interactive sequence mining. In Proceedings of the 8th international conference on information and knowledge management (pp. 251–258).
Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., & Hsu, M.C. (2001). PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the 17th international conference on data engineering (pp. 215–224).
Pei, J., Han, J., & Wang, W. (2007). Constraint-based sequential pattern mining: the pattern-growth methods. Journal of Intelligent Information Systems, 28(2), 133–160.
Pesant, G. (2004). A regular language membership constraint for finite sequences of variables. In International conference on principles and practice of constraint programming (pp. 482–495): Springer.
Pinto, H., Han, J., Pei, J., Wang, K., Chen, Q., & Dayal, U. (2001). Multi-dimensional sequential pattern mining. In Proceedings of the tenth international conference on information and knowledge management (pp. 81–88).
Quimper, C.G., & Walsh, T. (2006). Global grammar constraints. In International conference on principles and practice of constraint programming (pp. 751–755): Springer.
Régin, J.C. (1996). Generalized arc consistency for global cardinality constraint. In Proceedings of the thirteenth national conference on artificial intelligence, Volume 1 (pp. 209–215). AAAI Press.
Rossi, F., Van Beek, P., & Walsh, T. (2006). Handbook of constraint programming. Elsevier.
Srikant, R., & Agrawal, R. (1996). Mining sequential patterns: Generalizations and performance improvements. Springer.
Tatti, N., & Cule, B. (2011). Mining closed episodes with simultaneous events. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’11 (pp. 1172–1180). New York: ACM.
Wang, J., Han, J., & Li, C. (2007). Frequent closed sequence mining without candidate maintenance. IEEE Transactions on Knowledge and Data Engineering, 19(8), 1042–1056.
Yan, X., Han, J., & Afshar, R. (2003). CloSpan: Mining closed sequential patterns in large datasets. In Proceedings of the 2003 SIAM international conference on data mining (pp. 166–177). SIAM.
Zaki, M.J. (1998). Efficient enumeration of frequent sequences. In Proceedings of the seventh international conference on information and knowledge management (pp. 68–75): ACM.
Zaki, M.J. (2000). Sequence mining in categorical domains: incorporating constraints. In Proceedings of the ninth international conference on information and knowledge management (pp. 422–429): ACM.
Zhao, Q., & Bhowmick, S.S. (2003). Sequential pattern mining: a survey. Technical Report, CAIS, Nanyang Technological University, Singapore, pp. 1–26.
Acknowledgements
The research is supported by the FRIA-FNRS (Fonds pour la Formation à la Recherche dans l’Industrie et dans l’Agriculture, Belgium) and FWO (Research Foundation – Flanders).
Additional information
This article belongs to the Topical Collection: Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming
Guest Editors: Michele Lombardi and Domenico Salvagnin
Cite this article
Aoga, J.O.R., Guns, T. & Schaus, P. Mining Time-constrained Sequential Patterns with Constraint Programming. Constraints 22, 548–570 (2017). https://doi.org/10.1007/s10601-017-9272-3
DOI: https://doi.org/10.1007/s10601-017-9272-3