Self-adaptive nonoverlapping sequential pattern mining

Applied Intelligence

Abstract

Repetitive sequential pattern mining (SPM) with gap constraints is a data analysis task that consists of identifying patterns (subsequences) that appear many times in a discrete sequence of symbols or events. Gap constraints let the user filter out many meaningless patterns and focus on those most relevant to their needs. However, it is difficult to set appropriate gap constraints without prior knowledge, so users generally find suitable constraints by trial and error, which is time-consuming. In addition, current algorithms are inefficient because they repeatedly check whether the gap constraints are satisfied. To address these problems, this paper presents a complete algorithm called SNP-Miner that has two key phases: candidate pattern generation and support calculation, where the support of a pattern is its number of occurrences (occurrence frequency). To reduce the number of candidate patterns, SNP-Miner employs a pattern join strategy. To calculate the support efficiently, SNP-Miner uses an incomplete Nettree structure stored in an array and scans the structure once, which avoids redundant calculations and reduces the time complexity. Experimental results show that SNP-Miner not only outperforms competitive algorithms, but can also discover more valuable patterns without user-predefined gap constraints. Algorithms and data can be downloaded from https://github.com/wuc567/Pattern-Mining/tree/master/SNP-Miner.
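
The two phases named in the abstract can be illustrated with a minimal Python sketch. It is not the SNP-Miner implementation: the support count below uses a simple leftmost-first greedy scan instead of the incomplete Nettree, and all function names (nonoverlapping_support, join_candidates, mine) are hypothetical. It also rests on three assumptions that the abstract does not state: the sequence is a plain string; two occurrences count as nonoverlapping when they never use the same sequence position for the same pattern position (the definition used in related work such as NOSEP); and the support is anti-monotone, so that joining frequent patterns does not lose frequent candidates.

```python
from bisect import bisect_right


def nonoverlapping_support(sequence, pattern):
    """Count nonoverlapping occurrences of pattern in sequence, with no gap limits.

    Assumption: a sequence position may be reused by two occurrences only
    when it matches different pattern positions in each of them.
    """
    if not pattern:
        return 0
    # positions[j]: sorted indices where the sequence matches pattern[j].
    positions = [[i for i, c in enumerate(sequence) if c == p] for p in pattern]
    # next_free[j]: first entry of positions[j] not yet used for pattern position j.
    next_free = [0] * len(pattern)
    support = 0
    while True:
        prev = -1  # sequence index matched by the previous pattern position
        for j, pos in enumerate(positions):
            # Leftmost index that is unused for position j and lies after prev.
            k = max(next_free[j], bisect_right(pos, prev))
            if k == len(pos):
                return support  # no further occurrence can be completed
            prev = pos[k]
            next_free[j] = k + 1
        support += 1


def join_candidates(frequent):
    """Join p and q of equal length when p minus its first symbol equals q minus its last."""
    by_prefix = {}
    for q in frequent:
        by_prefix.setdefault(q[:-1], []).append(q)
    return [p + q[-1] for p in frequent for q in by_prefix.get(p[1:], [])]


def mine(sequence, minsup):
    """Level-wise mining: generate candidates by joining, keep those with support >= minsup."""
    if minsup < 1:
        raise ValueError("minsup must be at least 1")
    result = {}
    level = sorted(set(sequence))  # length-1 candidates (sequence assumed to be a str)
    while level:
        frequent = []
        for candidate in level:
            support = nonoverlapping_support(sequence, candidate)
            if support >= minsup:
                result[candidate] = support
                frequent.append(candidate)
        level = join_candidates(frequent)
    return result


print(mine("ATATAT", 3))  # prints {'A': 3, 'T': 3, 'AT': 3}
```

In this toy run the pattern "ATA" has support 2 in "ATATAT": the occurrences at positions (0, 1, 2) and (2, 3, 4) share position 2, but it matches the third pattern symbol in one and the first in the other, which the assumed nonoverlapping condition allows. No gap bounds are passed anywhere, which is the sense in which the mining is self-adaptive.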

Acknowledgements

This work was supported by the National Social Science Fund of China under grant number 18BGL191.

Author information

Corresponding author

Correspondence to Youxi Wu.

About this article

Cite this article

Wang, Y., Wu, Y., Li, Y. et al. Self-adaptive nonoverlapping sequential pattern mining. Appl Intell 52, 6646–6661 (2022). https://doi.org/10.1007/s10489-021-02763-y

  • DOI: https://doi.org/10.1007/s10489-021-02763-y
