Abstract
Top-K closed sequential pattern (CSP) mining addresses the challenge of reducing the number of mined patterns and the dependency on the support threshold parameter. This study tackles top-K CSP mining from three angles: top-K generic CSPs, group CSPs, and redundancy-aware CSPs. We propose the novel SP-Tree-based KCloTreeMiner to mine these variations and introduce the PaMHep data structure for efficient candidate pattern maintenance. Two pruning strategies, namely pattern absorption and SP-Tree-based temporary node projection, are also presented to reduce the search space. This study offers a thorough theoretical analysis and establishes bounds for the top-K framework, covering everything from solution design to completeness and optimization. Evaluations on six real-life datasets show up to a 23% average runtime improvement for KCloTreeMiner over the benchmark algorithm TKCS. We also propose two greedy algorithms, \(Max_{WC}\) and \(Max_{WOC}\), for pattern summarization and introduce Subset Distance for measuring distances between sequential patterns, which improves K-medoid clustering results in terms of average silhouette width for the reported clusters.
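As a rough illustration of why a top-K formulation reduces the dependency on a user-supplied support threshold, the sketch below keeps the K highest-support patterns in a min-heap and exposes the weakest kept support as a rising internal threshold. It is not the PaMHep structure or KCloTreeMiner described above. The names `TopKBuffer`, `is_subsequence`, and `support` are hypothetical, each sequence element is assumed to be a single item, closedness is not checked, and a pattern that only ties with the weakest kept pattern is dropped.

```python
import heapq


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items are only searched for after earlier ones.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Count the sequences in `database` that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


class TopKBuffer:
    """Keep the K highest-support patterns seen so far (illustrative helper)."""

    def __init__(self, k):
        self.k = k
        self._heap = []  # min-heap of (support, pattern); root is the weakest entry

    @property
    def min_support(self):
        # Until K patterns are held, any pattern with support >= 1 qualifies.
        return self._heap[0][0] if len(self._heap) == self.k else 1

    def offer(self, pattern, pattern_support):
        """Try to insert a pattern; return True if it was kept."""
        entry = (pattern_support, pattern)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
            return True
        if pattern_support > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the weakest entry
            return True
        return False

    def result(self):
        """Return kept (support, pattern) pairs, highest support first."""
        return sorted(self._heap, key=lambda e: (-e[0], e[1]))


# Toy database and candidate list, made up for this example.
database = [("a", "b", "c"), ("a", "c"), ("a", "b"), ("b", "c")]
buffer = TopKBuffer(k=2)
for candidate in [("a",), ("a", "b"), ("c", "a")]:
    buffer.offer(candidate, support(candidate, database))
print(buffer.result(), buffer.min_support)
```

In a complete miner, `min_support` would also serve as a pruning bound: extending a sequential pattern can never increase its support, so any pattern whose support falls below the current bound need not be extended.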
Data Availability
All the used data are publicly available and accessible.
Notes
The implementation will be available at https://github.com/rizveeredwan/top-k-closed-tree-miner
Acknowledgements
This work is partially supported by (a) the Natural Sciences and Engineering Research Council of Canada (NSERC) and (b) the University of Manitoba. We would like to thank all the reviewers for their valuable time and suggestions, which helped improve this article.
Author information
Contributions
Redwan Ahmed Rizvee: Conceptualization, Methodology, Software, Validation, Formal Analysis, Investigation, Data Curation, Writing - Original Draft. Chowdhury Farhan Ahmed: Supervision, Investigation, Conceptualization, Resources, Writing - Review and Editing. Carson K. Leung: Supervision, Investigation, Conceptualization, Writing - Review and Editing, Funding Acquisition.
Ethics declarations
Competing Interests
The authors declare that they have no competing interests.
About this article
Cite this article
Ahmed Rizvee, R., Farhan Ahmed, C. & Leung, C.K. A tree-based framework to mine top-K closed sequential patterns. Appl Intell 55, 221 (2025). https://doi.org/10.1007/s10489-024-06137-y
DOI: https://doi.org/10.1007/s10489-024-06137-y