Discovering probabilistically weighted sequential patterns in uncertain databases

Applied Intelligence

Abstract

Mining useful sequential patterns has become a recent trend in data mining because real-life applications are mostly sequence oriented. Researchers have developed many algorithms that extract useful information by finding frequent subsequences in sequential databases. Rapid technological development has also increased the number of applications that deal with uncertainty. Ordinary uncertain pattern mining algorithms work with the expected support or the probabilistic frequentness of a pattern and ignore the importance of individual items. In real life, however, different items can have different importance. Some approaches consider the weight (importance) of items but fail to capture the interestingness of the mined patterns. The objective of this work is to address weighted sequential pattern mining over uncertain data under Possible World Semantics (PWS), so as to better capture the inherent relations among items and events with different weights, and to develop a novel method, uWSpan. The proposed approach includes several pruning techniques for faster mining and introduces itemset extension in PWS for the first time. We analyzed the performance of the proposed approach both theoretically and empirically and found uWSpan to be efficient, scalable and effective. It outperforms existing approaches in most cases when compared on standard datasets. We also analyzed the applicability of the proposed method. Finally, the paper concludes with a summary of the research outcomes and future research directions.
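
To make the notions of expected support and item weight concrete, the following is a minimal Python sketch. It is not uWSpan and does not reproduce the paper's definitions. It assumes a simplified model in which every event holds a single item with an independent existence probability, and it assumes that the weight of a pattern is the average weight of its items. The function names and the toy database are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed model: an uncertain sequence is an ordered list of events, each a
# single item with an existence probability; events are independent.
UncertainSequence = List[Tuple[str, float]]


def occurrence_probability(pattern: List[str], seq: UncertainSequence) -> float:
    """Probability, over all possible worlds, that `pattern` is a subsequence of `seq`."""
    # prob[j] = probability that the first j pattern items are matched by the
    # events processed so far. The empty prefix is always matched.
    prob = [1.0] + [0.0] * len(pattern)
    for item, p in seq:
        # Iterate right to left so prob[j - 1] still describes the state
        # before the current event.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == item:
                # Event present: the first j - 1 items must already match.
                # Event absent: the first j items must already match.
                prob[j] = p * prob[j - 1] + (1.0 - p) * prob[j]
    return prob[len(pattern)]


def expected_support(pattern: List[str], db: List[UncertainSequence]) -> float:
    """Sum of the occurrence probabilities over all sequences in the database."""
    return sum(occurrence_probability(pattern, seq) for seq in db)


def weighted_expected_support(
    pattern: List[str], db: List[UncertainSequence], weights: Dict[str, float]
) -> float:
    """Expected support scaled by the average item weight (an assumed weighting scheme)."""
    avg_weight = sum(weights[item] for item in pattern) / len(pattern)
    return avg_weight * expected_support(pattern, db)


# Hypothetical toy database with two uncertain sequences.
toy_db = [
    [("a", 0.5), ("b", 0.5)],
    [("a", 1.0), ("b", 0.5), ("b", 0.5)],
]
toy_weights = {"a": 0.5, "b": 1.0}

print(expected_support(["a", "b"], toy_db))
print(weighted_expected_support(["a", "b"], toy_db, toy_weights))
```

In this toy database, a pattern made of high-weight items can rank above a pattern with higher expected support but lower weights, which is the kind of distinction that unweighted expected support cannot express.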

Notes

  1. The names WPS and MWPS can be confusing. Inferring a direct relation between them, for example that an MWPS is the maximum over a list of WPSs, will lead to incorrect computation.

Acknowledgements

This work is partially supported by (a) the ICT Innovation Fund of the Department of ICT, Ministry of Posts, Telecommunications and Information Technology, Government of the People's Republic of Bangladesh; (b) the Natural Sciences and Engineering Research Council of Canada (NSERC); and (c) the University of Manitoba.

Author information

Corresponding author

Correspondence to Md Samiullah.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Islam, M.S., Kar, P.C., Samiullah, M. et al. Discovering probabilistically weighted sequential patterns in uncertain databases. Appl Intell 53, 6525–6553 (2023). https://doi.org/10.1007/s10489-022-03699-7

  • DOI: https://doi.org/10.1007/s10489-022-03699-7
