Abstract
Sequential pattern mining discovers all frequent sequences in a sequence database. However, it may return a huge number of patterns, while users are interested in only a particular subset of them. In this paper, we consider the problem of mining sequential patterns with itemset constraints. To solve this problem, we propose a new algorithm named MSPIC-DBV, a pattern-growth algorithm that uses prefixes and dynamic bit vectors. The algorithm prunes the search space both at the beginning of and during the mining process, and it reduces the number of candidates that need to be checked. Experimental results show that the proposed algorithm outperforms previous methods.
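As a rough illustration of the two ingredients named in the abstract, the sketch below shows a dynamic bit vector over sequence IDs and an itemset-constraint check. It is not the MSPIC-DBV algorithm itself. The class `DynamicBitVector`, the function `satisfies_itemset_constraint`, and the reading of the constraint (a pattern qualifies if one of its itemsets contains at least one constraint itemset) are assumptions made for illustration. The bit vector follows the usual idea of trimming leading and trailing zero bytes and remembering where the non-zero part starts. Intersecting two vectors only counts the sequences that contain both patterns, which upper-bounds the support of any sequence built from them; the positional information needed for exact sequence support is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicBitVector:
    """Bit vector over sequence IDs with leading/trailing zero bytes trimmed.

    Hypothetical helper for illustration, not the paper's data structure.
    """
    start: int   # index of the first non-zero byte in the untrimmed vector
    data: bytes  # the non-zero core of the vector

    @classmethod
    def from_ids(cls, ids):
        ids = set(ids)
        if not ids:
            return cls(0, b"")
        full = bytearray(max(ids) // 8 + 1)
        for i in ids:
            full[i // 8] |= 1 << (i % 8)
        first = next(k for k, byte in enumerate(full) if byte)
        # The last byte is non-zero because max(ids) sets a bit in it.
        return cls(first, bytes(full[first:]))

    def __and__(self, other):
        # Only the overlapping byte range can hold common bits.
        lo = max(self.start, other.start)
        hi = min(self.start + len(self.data), other.start + len(other.data))
        if lo >= hi:
            return DynamicBitVector(0, b"")
        core = bytes(
            self.data[k - self.start] & other.data[k - other.start]
            for k in range(lo, hi)
        )
        # Trim again, since the AND may have produced new zero bytes.
        left_trimmed = core.lstrip(b"\x00")
        new_start = lo + len(core) - len(left_trimmed)
        trimmed = left_trimmed.rstrip(b"\x00")
        if not trimmed:
            return DynamicBitVector(0, b"")
        return DynamicBitVector(new_start, trimmed)

    def support(self):
        """Number of sequence IDs whose bit is set."""
        return sum(bin(byte).count("1") for byte in self.data)


def satisfies_itemset_constraint(pattern, constraints):
    """Assumed semantics: some itemset of the pattern contains a constraint itemset.

    pattern     -- list of frozensets (the itemsets of a sequential pattern)
    constraints -- list of frozensets (the constraint itemsets)
    """
    return any(c <= itemset for itemset in pattern for c in constraints)


# Usage: sequences 17 and 18 contain both patterns.
a = DynamicBitVector.from_ids([3, 17, 18])
b = DynamicBitVector.from_ids([17, 18, 40])
print((a & b).support())  # 2
```

Trimming keeps the stored vectors short when a pattern occurs in only a narrow range of sequence IDs, and restricting the intersection to the overlapping byte range avoids touching bytes that are known to be zero.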










Acknowledgements
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2015.07.
Cite this article
Van, T., Vo, B. & Le, B. Mining sequential patterns with itemset constraints. Knowl Inf Syst 57, 311–330 (2018). https://doi.org/10.1007/s10115-018-1161-6
DOI: https://doi.org/10.1007/s10115-018-1161-6