Abstract
Repetitive sequential pattern mining (SPM) with gap constraints is a data analysis task that consists of identifying patterns (subsequences) appearing many times in a discrete sequence of symbols or events. By using gap constraints, the user can filter many meaningless patterns, and focus on those that are the most interesting for his needs. However, it is difficult to set appropriate gap constraints without prior knowledge. Hence, users generally find suitable constraints by trial and error, which is time-consuming. Besides, current algorithms are inefficient as they repeatedly check whether the gap constraints are satisfied. To address these problems, this paper presents a complete algorithm called SNP-Miner that has two key phases: candidate pattern generation and support (number of occurrences or occurrence frequency) calculation. To reduce the number of candidate patterns, SNP-Miner employs a pattern join strategy. Moreover, to efficiently calculate the support, SNP-Miner uses an incomplete Nettree structure stored in an array, and scans the structure once to avoid redundant calculations and reduce the time complexity. Experimental results show that SNP-Miner not only outperforms competitive algorithms, but can also discover more valuable patterns without user-predefined gap constraints. Algorithms and data can be downloaded from https://github.com/wuc567/Pattern-Mining/tree/master/SNP-Miner.
References
Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu C-W, Tseng VS (2014) SPMF: A java open-source pattern mining library. J Mach Learn Res 15(1):3389–3393
Kim J, Yun U, Yoon E, Lin JC-W, Fournier-Viger P (2020) One scan based high average-utility pattern mining in static and dynamic databases. Futur Gener Comput Syst 111:143–158
Fournier-Viger P, Lin JC-W, Kiran RU, Koh YS, Thomas R (2017) A survey of sequential pattern mining. Data Sci Pattern Recogn 1(1):54–77
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation. Data Min Knowl Discov 8(1):53–87
Wu M, Wu X (2019) On big wisdom. Knowl Inf Syst 58(1):1–8
Xie F, Wu X, Zhu X (2017) Efficient sequential pattern mining with wildcards for keyphrase extraction. Knowl Based Syst 115:27–39
Yun U, Lee G, Yoon E (2019) Advanced approach of sliding window based erasable pattern mining with list structure of industrial fields. Inf Sci 494:37–59
Lin JC-W, Pirouz M, Djenouri Y, Cheng C-F, Ahmed U (2020) Incrementally updating the high average-utility patterns with pre-large concept. Appl Intell 50(11):3788–3807
Lin JC-W, Shao Y, Djenouri Y, Yun U (2021) ASRNN: A recurrent neural network with an attention model for sequence labeling. Knowl Based Syst 212(5):106548
Srivastava G, Lin J C -W, Pirouz M, Li Y, Yu U (2020) A pre-large weighted-fusion system of sensed high-utility patterns. IEEE Sensors Journal. https://doi.org/10.1109/JSEN.2020.2991045
Srikant R, Agrawal R (1995) Mining sequential patterns. Proc 11th Int Conf Data Eng 1995:3–14
Truong T, Duong H, Le B, Fournier-Viger P, Yun U (2019) Efficient high average-utility itemset mining using novel vertical weak upper-bounds. Knowl Based Syst 183(1):104847
Wu Y, Wang Y, Liu J, Yu M, Liu J, Li Y (2019) Mining distinguishing subsequence patterns with nonoverlapping condition. Clust Comput 22:5905–5917
Wu Y, Zhu C, Li Y, Guo L, Wu X (2020) NetNCSP: Nonoverlapping closed sequential pattern mining. Knowl Based Syst 196(105812)
Ji X, Bailey J, Dong G (2005) Mining minimal distinguishing subsequence patterns with gap constraints. Proc 5th IEEE Int Conf Data Min (ICDM) 2005:194–201
Wu Y, Fu S, Jiang H, Wu X (2015) Strict approximate pattern matching with general gaps. Appl Intell 42(3):566–580
Dong X, Gong Y, Cao L (2020) e-RNSP: An efficient method for mining repetition negative sequential patterns. IEEE Trans Cybern 50(5):2084–2096
Dong X, Qiu P, Lü J, Cao L (2019) Mining top-k useful negative sequential patterns via learning. IEEE Trans Neural Netw Learn Syst 30(9):2764–2778
Wu Y, Shen C, Jiang H, Wu X (2017) Strict pattern matching under non-overlapping condition. Sci China Inf Sci 60(1):012101
Gan W, Lin JC-W, Fournier-Viger P, Chao H-C, Yu PS (2019) A survey of parallel sequential pattern mining. ACM Trans Knowl Discov Data 13(3):25:1–25, 34
Nam H, Yun U, Yoon E, Lin J C -W (2020) Efficient approach of recent high utility stream pattern mining with indexed list structure and pruning strategy considering arrival times of transactions. Inf Sci 529:1–27
Lv Z, Qiao L (2020) Analysis of healthcare big data. Futur Gener Comput Syst 109:103–110
Gan W, Lin JC-W, Fournier-Viger P, Chao H-C, Tseng VS, Yu PS (2021) A survey of utility-oriented pattern mining. IEEE Trans Knowl Data Eng 33(4):1306–1327
Zhang M, Kao B, Cheung DW, Yip KY (2007) Mining periodic patterns with gap requirement from sequences. ACM Trans Knowl Discov Data 1(2):7
Ding B, Lo D, Han J, Khoo S (2009) Efficient mining of closed repetitive gapped subsequences from a sequence database. IEEE 25th Int Conf Data Eng 2009:1024–1035
Wu Y, Tong Y, Zhu X, Wu X (2018) NOSEP: Nonoverlapping Sequence pattern mining with gap constraints. IEEE Trans Cybern 48(10):2809–2822
Shi Q, Shan J, Yan W, Wu Y, Wu X (2020) NetNPG: Nonoverlapping pattern matching with general gap constraints. Appl Intell 50(6):1832–1845
Wu Y, Liu X, Yan W, Guo L, Wu X (2021) Efficient solving algorithm for strict pattern matching under nonoverlapping condition. Journal of Software. https://doi.org/10.13328/j.cnki.jos.006054
Min F, Zhang Z, Zhai W, Shen R (2020) Frequent pattern discovery with tri-partition alphabets. Inf Sci 507:715–732
Huang J-W, Jaysawal B, Chen K-Y, Wu Y-B (2019) Mining frequent and top-K high utility time interval-based events with duration patterns. Knowl Inf Syst 61(3):1331–1359
Renz-Wieland A, Bertsch M, Gemull R (2019) Scalable frequent sequence mining with flexible subsequence constraints. IEEE 35th Int Conf Data Eng 2019:1490–1501
Truong T, Duong H, Le B, Fournier-Viger P, Yun U, Fujita H (2021) Efficient algorithms for mining frequent high utility sequences with constraints. Inf Sci 568:239–264
Okolica J, Peterson G, Mills R, Grimaila M (2020) Sequence pattern mining with variables. IEEE Trans Knowl Data Eng 32(1):177–187
Fournier-Viger P, Li Z, Lin JC-W, Kiran RU, Fujita H (2019) Efficient algorithms to identify periodic patterns in multiple sequences. Inf Sci 489:205–226
Wu X, Zhu X, He Y, Zhao P, Arslan AN (2013) PMBC: Pattern Mining from biological sequences with wildcard constraints. Comput Biol Med 43(5):481–492
Wu X, Zhu X, Wu GQ, Ding W (2014) Data mining with big data. IEEE Trans Knowl Data Eng 26(1):97–107
Fournier-Viger P, Li J, Lin JC-W, Truong T, Kiran RU (2020) Mining cost-effective patterns in event logs. Knowl Based Syst 191(105241)
Yu K, Liu L, Li J, Ding W, Le T (2020) Multi-source causal feature selection. IEEE Trans Pattern Anal Mach Intell 42(9):2240–2256
Li C, Yang Q, Wang J, Li M (2012) Efficient mining of gap-constrained subsequences and its various applications. ACM Trans Knowl Discov Data (TKDD) 6(1):2:1–2:39
Xu T, Li T, Dong X (2018) Efficient high utility negative sequential patterns mining in smart campus. IEEE Access 6:23839–23847
Zhang L, Luo P, Tang L, Chen E, Liu Q, Wang M, Xiong H (2015) Occupancy-based frequent pattern mining. ACM Trans Knowl Discov Data 10(2):14:1–14:33
Gan W, Lin JC-W, Zhang J, Yu PS (2020) Utility mining across multi-sequences with individualized thresholds. ACM/IMS Trans Data Sci 1(2):18:1–18:29
Srivastava G, Lin JC-W, Jolfaei A, Li Y, Djenouri Y (2020) Uncertain-driven analytics of sequence data in IoCV environments. IEEE Transactions on Intelligent Transportation Systems. https://doi.org/10.1109/TITS.2020.3012387
Wu Y, Luo L, Li Y, Guo L, Fournier-Viger P, Zhu X, Wu X (2021) NTP-Miner: Nonoverlapping three-way sequential pattern mining. ACM Trans Knowl Discov Data 16(3):51
Cheng S, Wu Y, Li Y, Yao F, Min F (2021) TWD-SFNN: Three-way decisions with a single hidden layer feedforward neural network. Information Sciences. https://doi.org/10.1016/j.ins.2021.07.091
Wu Y, Geng M, Li Y, Guo L, Li Z, Fournier-Viger P, Zhu X, Wu X (2021) HANP-Miner: High average utility nonoverlapping sequential pattern mining. Knowledge-Based Systems. https://doi.org/10.1016/j.knosys.2021.107361
Srivastava G, Lin JC-W, Zhang X, Li Y (2020) Large-scale high-utility sequential pattern analytics in Internet of things. IEEE Internet of Things Journal. https://doi.org/10.1109/JIOT.2020.3026826
Kim H, Yun U, Baek Y, Kim J, Vo B, Yoon E, Fujita H (2021) Efficient list based mining of high average utility patterns with maximum average pruning strategies. Inf Sci 543(8):85–105
Yun U, Kim D, Yoon E, Fujita H (2018) Damped window based high average utility pattern mining over data streams. Knowl-Based Syst 144(15):188–205
Wu Y, Wang Y, Li Y, Zhu X, Wu X (2021) Top-k self-adaptive contrast sequential pattern mining. IEEE Transactions on Cybernetics. https://doi.org/10.1109/TCYB.2021.3082114
Chen X, Rao Y, Xie H, Wang FL, Zhao Y, Yin J (2019) Sentiment classification using negative and intensive sentiment supplement information. Data Sci Eng 4:109–118
Gan W, Lin JC-W, Fournier-Viger P, Chao H-C, Yu PS (2020) HUOPM: High-Utility occupancy pattern mining. IEEE Trans Cybern 50(3):1195–1208
Gan W, Lin JC-W, Zhang J, Chao H-C, Fujita H, Yu PS (2020) ProUM: Projection-based utility mining on sequence data. Inf Sci 513:222–240
Wu Y, Fan J, Li Y, Guo L, Wu X (2020) NetDAP: (δ, γ)-approximate pattern matching with length constraints. Appl Intell 50(11):4094–4116
Wang H, Duan L, Zuo J, Wang W, Li Z, Tang C (2016) Efficient mining of distinguishing sequential patterns without a predefined gap constraint. Chin J Comput 39(10):19791991
Wu Y, Lei R, Li Y, Guo L, Wu X (2021) HAOP-Miner: Self-adaptive high-average utility one-off sequential pattern mining. Expert Systems with Applications. https://doi.org/10.1016/j.eswa.2021.115449
Dinh D-T, Le B, Fournier-Viger P, Huynh V-N (2018) An efficient algorithm for mining periodic high-utility sequential patterns. Appl Intell 48(12):4694–4714
Lin JC-W, Li T, Pirouz M, Zhang J, Fournier-Viger P (2020) High average-utility sequential pattern mining based on uncertain databases. Knowl Inf Syst 62(3):1199–1228
Wang J, Han J, Li C (2007) Frequent closed sequence mining without candidate maintenance. IEEE Trans Knowl Data Eng 19(8):1042–1056
Yun U, Nam H, Kim J, Kim H, Pedrycz W (2020) Efficient transaction deleting approach of pre-large based high utility pattern mining in dynamic databases. Futur Gener Comput Syst 103:58–78
Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28(2):133–160
Min F, Wu Y, Wu X (2010) The Apriori property of sequence pattern mining with wildcard gaps. IEEE Int Conf Bioinform Biomed Workshop 2010:138–143
Guo D, Hu X, Xie F, Wu X (2013) Pattern matching with wildcards and gap-length constraints based on a centrality-degree graph. Appl Intell 39(1):57–74
Wu X, Wang X, Li Y, Guo L, Li Z, Zhang J, Wu X (2021) OWSP-Miner: Self-adaptive one-off weak-gap strong pattern mining. ACM Transactions on Management Information Systems. https://doi.org/10.1145/3476247
Hoang T, Mörchen F, Fradkin D, Calders T (2014) Mining compressing sequential patterns. Stat Anal Data Min 7(1):34–52
Liu H, Liu Z, Huang H, Wu X (2018) Sequential pattern matching with general gap and one-off condition. J Softw 29:363–382
Zaki MJ (2001) SPADE: An efficient algorithm for mining frequent sequences. Mach Learn 42:31–60
Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M (2004) Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440
Wittkop T, Baumbach J, Lobo F, Rahmann S (2007) Large scale clustering of protein sequences with FORCE-a layout based heuristic for weighted cluster editing. BMC Bioinform 8(1):396
Heimerl F, Lohmann S, Lange S, Ertl T (2014) Word cloud explorer: Text analytics based on word clouds. 2014 47th Hawaii Int Conf Syst Sci 2014:1833–1842
Acknowledgements
This work was supported by National Social Science Fund of China under grant number 18BGL191.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Wang, Y., Wu, Y., Li, Y. et al. Self-adaptive nonoverlapping sequential pattern mining. Appl Intell 52, 6646–6661 (2022). https://doi.org/10.1007/s10489-021-02763-y
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-021-02763-y