Abstract
Mining uncertain frequent patterns (UFPs) from uncertain databases is a recently introduced problem, and various approaches to it have been proposed over the last decade. However, traditional approaches often discover too many UFPs, so systems must spend considerable time and resources ranking them to find the most promising patterns. This paper therefore introduces the task of mining top-k UFPs from uncertain databases and proposes an efficient method for it, named TUFP (mining Top-k UFPs). Effective threshold-raising strategies help the algorithm reduce the number of generated candidates, which improves both runtime and memory usage. Finally, experiments compare TUFP with two state-of-the-art approaches (CUFP-mine and LUNA) on the number of generated candidates, mining time, memory usage and scalability. The performance studies show that TUFP is efficient in terms of mining time, memory usage and scalability for mining top-k UFPs.
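To make the task concrete, the following is a minimal sketch of top-k mining with a rising threshold. It is not the TUFP algorithm itself, whose details the abstract does not give. It assumes the commonly used expected-support model, in which each transaction maps items to existence probabilities and the expected support of an itemset is the sum, over transactions, of the product of its items' probabilities. The function names `expected_support` and `top_k_ufps`, the level-wise candidate generation, and the toy database are all illustrative assumptions.

```python
import heapq
from math import prod


def expected_support(itemset, db):
    """Expected support: sum over transactions of the product of item probabilities.

    Assumes each transaction is a dict {item: probability}; a missing item counts as 0.
    """
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in db)


def top_k_ufps(db, k):
    """Return the k itemsets with the highest expected support (assumes k >= 1).

    Hypothetical level-wise search, not the paper's method: the threshold starts
    at 0 and is raised to the k-th best support seen so far. Because probabilities
    are at most 1, a superset never has higher expected support than its subsets,
    so candidates below the threshold can be dropped without extending them.
    """
    items = sorted({item for t in db for item in t})
    heap = []          # min-heap of (support, itemset); the root is the k-th best
    threshold = 0.0
    level = [(item,) for item in items]
    while level:
        kept = {}
        for cand in level:
            s = expected_support(cand, db)
            if s <= 0.0 or s < threshold:
                continue  # pruned: neither it nor any superset can enter the top k
            kept[cand] = s
            if len(heap) < k:
                heapq.heappush(heap, (s, cand))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, cand))
            if len(heap) == k:
                threshold = heap[0][0]  # threshold raising
        # Re-check against the raised threshold, then join itemsets sharing a prefix.
        survivors = sorted(c for c, s in kept.items() if s >= threshold)
        level = [a + (b[-1],)
                 for i, a in enumerate(survivors)
                 for b in survivors[i + 1:]
                 if a[:-1] == b[:-1]]
    return sorted(((c, s) for s, c in heap), key=lambda p: -p[1])


# Toy uncertain database (made-up probabilities, for illustration only).
db = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "b": 0.7},
    {"a": 0.8, "c": 0.4},
    {"b": 0.5, "c": 0.9},
]
for itemset, support in top_k_ufps(db, 4):
    print(itemset, round(support, 2))
```

In this sketch the user supplies only k rather than a minimum support value, which is the practical appeal of the top-k formulation: the threshold is discovered during mining instead of being guessed in advance.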
Cite this article
Le, T., Vo, B., Huynh, VN. et al. Mining top-k frequent patterns from uncertain databases. Appl Intell 50, 1487–1497 (2020). https://doi.org/10.1007/s10489-019-01622-1
DOI: https://doi.org/10.1007/s10489-019-01622-1