A tree-based framework to mine top-K closed sequential patterns

Applied Intelligence

Abstract

Top-K closed sequential pattern (CSP) mining addresses the challenge of reducing the number of mined patterns and the dependency on the support threshold parameter. This study tackles top-K CSP mining from three angles: top-K generic CSPs, group CSPs, and redundancy-aware CSPs. We propose the novel SP-Tree-based KCloTreeMiner to mine these variations and introduce the PaMHep data structure for efficient candidate pattern maintenance. Two pruning strategies, namely pattern absorption and SP-Tree-based temporary node projection, are also presented to reduce the search space. This study offers a thorough theoretical analysis and establishes bounds for the top-K framework, covering everything from solution design to completeness and optimization. Evaluations on six real-life datasets show up to a 23% average runtime improvement for KCloTreeMiner over the benchmark algorithm TKCS. We also propose two greedy algorithms, \(Max_{WC}\) and \(Max_{WOC}\), for pattern summarization and introduce Subset Distance for measuring distances between sequential patterns, which improves K-medoid clustering results in terms of average silhouette width for the reported clusters.
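To make the problem statement concrete, the following is a minimal brute-force sketch of what a top-K closed sequential pattern query returns on a toy database. It is not KCloTreeMiner and uses neither the SP-Tree nor PaMHep; it only illustrates the standard definitions (a pattern is closed if no proper super-sequence has the same support). The function names, the toy database, the restriction to sequences of single items, and the lexicographic tie-breaking are all assumptions made for illustration.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def enumerate_patterns(database):
    """All distinct non-empty subsequences occurring in the database.

    Exponential in the sequence length: suitable for toy inputs only.
    """
    patterns = set()
    for seq in database:
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                patterns.add(tuple(seq[i] for i in idx))
    return patterns


def top_k_closed(database, k):
    """Return the k closed patterns with the highest support.

    Ties are broken lexicographically (an illustrative simplification).
    """
    supports = {p: support(p, database) for p in enumerate_patterns(database)}
    closed = [
        p
        for p, s in supports.items()
        # p is closed if no longer pattern containing it has equal support
        if not any(
            len(q) > len(p) and supports[q] == s and is_subsequence(p, q)
            for q in supports
        )
    ]
    closed.sort(key=lambda p: (-supports[p], p))
    return [(p, supports[p]) for p in closed[:k]]


# Hypothetical toy database of four item sequences.
toy_db = [("a", "b", "c"), ("a", "b", "c"), ("a", "c"), ("b", "c")]
result = top_k_closed(toy_db, 3)
print(result)
```

In this toy database the pattern `("a",)` has support 3 but is not closed, because `("a", "c")` also has support 3. Note that the user supplies only K, not a minimum support threshold, which is the dependency that the top-K formulation is meant to reduce.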

Data Availability

All data used in this study are publicly available and accessible.

Notes

  1. Implementation will be available at https://github.com/rizveeredwan/top-k-closed-tree-miner

Acknowledgements

This work is partially supported by (a) Natural Sciences and Engineering Research Council of Canada (NSERC) and (b) University of Manitoba. We would like to thank all the reviewers for their valuable time and suggestions to help improve the current article.

Author information

Contributions

Redwan Ahmed Rizvee: Conceptualization, Methodology, Software, Validation, Formal Analysis, Investigation, Data Curation, Writing - Original Draft. Chowdhury Farhan Ahmed: Supervision, Investigation, Conceptualization, Resources, Writing - Review and Editing. Carson K. Leung: Supervision, Investigation, Conceptualization, Writing - Review and Editing, Funding Acquisition.

Corresponding author

Correspondence to Chowdhury Farhan Ahmed.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ahmed Rizvee, R., Farhan Ahmed, C. & Leung, C.K. A tree-based framework to mine top-K closed sequential patterns. Appl Intell 55, 221 (2025). https://doi.org/10.1007/s10489-024-06137-y

  • DOI: https://doi.org/10.1007/s10489-024-06137-y
