Mining top-k frequent patterns from uncertain databases

Applied Intelligence

Abstract

Mining uncertain frequent patterns (UFPs) from uncertain databases is a relatively recent problem, and various approaches to it have been proposed over the last decade. However, traditional approaches often discover too many UFPs, so systems must spend considerable time and resources ranking them to find the most promising patterns. This paper therefore introduces the task of mining top-k UFPs from uncertain databases and proposes an efficient method for it, named TUFP (mining Top-k UFPs). Effective threshold-raising strategies help the proposed algorithm reduce the number of generated candidates, which improves both runtime and memory usage. Finally, experiments compare TUFP with two state-of-the-art approaches (CUFP-mine and LUNA) in terms of the number of generated candidates, mining time, memory usage and scalability. The performance studies show that TUFP is efficient in terms of mining time, memory usage and scalability for mining top-k UFPs.
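The general idea of top-k mining with threshold raising can be illustrated with a small sketch. This is not the TUFP algorithm from the paper, whose data structures and strategies are not described in the abstract. It assumes the common expected-support model (the expected support of an itemset is the sum, over transactions, of the product of its items' existential probabilities), and the names expected_support, top_k_ufps and example_db are hypothetical.

```python
import heapq
from math import prod

def expected_support(itemset, db):
    """Sum over transactions of the product of the items' probabilities.
    Assumption: the expected-support model; absent items have probability 0."""
    return sum(prod(t.get(i, 0.0) for i in itemset) for t in db)

def top_k_ufps(db, k):
    """Return the k itemsets with the highest expected support (illustrative only)."""
    if k <= 0:
        return []
    heap = []        # min-heap of (expected support, itemset); root is the weakest kept pattern
    threshold = 0.0  # raised to the k-th best expected support once k patterns are held

    def offer(esup, itemset):
        nonlocal threshold
        if len(heap) < k:
            heapq.heappush(heap, (esup, itemset))
        elif esup > heap[0][0]:
            heapq.heapreplace(heap, (esup, itemset))
        if len(heap) == k:
            threshold = heap[0][0]  # threshold raising

    def extend(prefix, candidates):
        for idx, item in enumerate(candidates):
            itemset = prefix + (item,)
            esup = expected_support(itemset, db)
            # Probabilities are at most 1, so no superset can have a higher
            # expected support; a candidate below the threshold is pruned
            # together with all of its extensions.
            if esup <= 0.0 or esup < threshold:
                continue
            offer(esup, itemset)
            extend(itemset, candidates[idx + 1:])

    items = {i for t in db for i in t}
    # Visiting high-support items first tends to raise the threshold sooner.
    ordered = sorted(items, key=lambda i: (-expected_support((i,), db), i))
    extend((), ordered)
    return sorted(heap, key=lambda e: (-e[0], e[1]))

# Hypothetical uncertain database: each transaction maps item -> existential probability.
example_db = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "b": 0.7},
    {"a": 1.0, "c": 0.4},
    {"b": 0.5, "c": 0.9},
]

if __name__ == "__main__":
    for esup, itemset in top_k_ufps(example_db, 4):
        print(itemset, round(esup, 2))
```

Because the threshold starts at zero and only rises as better patterns are found, the user supplies k rather than a minimum support value, and the search space shrinks as mining proceeds.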

Corresponding author

Correspondence to Bay Vo.

Cite this article

Le, T., Vo, B., Huynh, VN. et al. Mining top-k frequent patterns from uncertain databases. Appl Intell 50, 1487–1497 (2020). https://doi.org/10.1007/s10489-019-01622-1

  • DOI: https://doi.org/10.1007/s10489-019-01622-1
