Abstract
Mining useful sequential patterns has become a prominent topic in data mining because many real-life applications are sequence oriented. Researchers have developed many algorithms that extract frequent subsequences from sequential databases. Meanwhile, rapid technological development has increased the number of applications that must deal with uncertain data. Ordinary uncertain pattern mining algorithms rely on the expected support or the probabilistic frequentness of a pattern and ignore the importance of individual items, although in real life different items can have different importance. Some approaches do consider the weight (importance) of items but fail to capture the interestingness of the mined patterns. The objective of this work is to address weighted sequential pattern mining in uncertain databases under Possible World Semantics (PWS), so as to better capture the inherent relations among items and events with different weights, and to develop a novel method, uWSpan, for this purpose. The proposed approach incorporates pruning techniques for faster mining and introduces itemset extension for the first time in PWS. We analyze its performance both theoretically and empirically and find uWSpan to be efficient, scalable and effective; it outperforms existing approaches in most cases when compared on established datasets. We also analyze the applicability of the proposed method. Finally, the paper concludes with a summary of the outcomes and directions for future research.
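To make the notion of expected support under Possible World Semantics concrete, the following minimal sketch enumerates every possible world of a tiny uncertain sequence database. It is not the uWSpan algorithm. The independence of item existence probabilities, the single-item events, the function names, and in particular the weighting rule (average item weight multiplied by expected support) are assumptions made only for illustration and are not taken from the paper.

```python
from itertools import product


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def occurrence_probability(pattern, uncertain_sequence):
    """Probability that `pattern` occurs in one uncertain sequence.

    `uncertain_sequence` is a list of (item, probability) pairs. Each item is
    ASSUMED to exist independently of the others. All possible worlds are
    enumerated, so the cost is exponential: suitable for toy inputs only.
    """
    total = 0.0
    for present in product([True, False], repeat=len(uncertain_sequence)):
        world_prob = 1.0
        world = []
        for keep, (item, prob) in zip(present, uncertain_sequence):
            world_prob *= prob if keep else 1.0 - prob
            if keep:
                world.append(item)
        if is_subsequence(pattern, world):
            total += world_prob
    return total


def expected_support(pattern, database):
    """Sum of per-sequence occurrence probabilities over the database."""
    return sum(occurrence_probability(pattern, seq) for seq in database)


def weighted_expected_support(pattern, database, weights):
    """ASSUMED illustrative rule: average item weight times expected support."""
    avg_weight = sum(weights[item] for item in pattern) / len(pattern)
    return avg_weight * expected_support(pattern, database)


# Hypothetical toy database: two uncertain sequences of single-item events.
database = [
    [("a", 0.5), ("b", 0.5)],
    [("a", 1.0), ("b", 0.4), ("b", 0.5)],
]
weights = {"a": 1.0, "b": 0.5}  # hypothetical item weights

print(expected_support(("a", "b"), database))
print(weighted_expected_support(("a", "b"), database, weights))
```

For the pattern (a, b), the first sequence contributes 0.5 × 0.5 = 0.25 and the second contributes 1 − (0.6 × 0.5) = 0.7, so the expected support is about 0.95. A practical miner would avoid enumerating worlds and would rely on pruning, as the abstract indicates.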
Notes
The names WPS and MWPS may be confusing. Inferring a direct relation between them, for example that an MWPS is the maximum of a list of WPSs, will lead to incorrect computation.
Acknowledgements
This work is partially supported by (a) ICT innovation fund of the Department of ICT, Ministry of Posts, Telecommunications and Information Technology, Government Republic of Bangladesh; (b) Natural Sciences and Engineering Research Council of Canada (NSERC); and (c) University of Manitoba.
Cite this article
Islam, M.S., Kar, P.C., Samiullah, M. et al. Discovering probabilistically weighted sequential patterns in uncertain databases. Appl Intell 53, 6525–6553 (2023). https://doi.org/10.1007/s10489-022-03699-7
DOI: https://doi.org/10.1007/s10489-022-03699-7