Mining top-k frequent patterns from uncertain databases

Le, Tuong; Vo, Bay; Huynh, Van-Nam; Nguyen, Ngoc Thanh; Baik, Sung Wook

doi:10.1007/s10489-019-01622-1

Published: 23 January 2020

Volume 50, pages 1487–1497, (2020)

Applied Intelligence

Tuong Le¹,
Bay Vo ORCID: orcid.org/0000-0002-2723-1138²,
Van-Nam Huynh³,
Ngoc Thanh Nguyen⁴ &
Sung Wook Baik⁵

Abstract

Mining uncertain frequent patterns (UFPs) from uncertain databases was introduced relatively recently, and various approaches to this problem have been proposed over the last decade. However, traditional approaches often discover too many UFPs, so systems must spend considerable time and resources ranking them to find the most promising patterns. This paper therefore introduces the task of mining top-k UFPs from uncertain databases and proposes an efficient method for it, named TUFP (mining Top-k UFPs). Effective threshold-raising strategies help the proposed algorithm reduce the number of generated candidates, which improves both runtime and memory usage. Finally, several experiments compare TUFP with two state-of-the-art approaches (CUFP-mine and LUNA) on the number of generated candidates, mining time, memory usage and scalability. The performance studies show that TUFP is efficient in terms of mining time, memory usage and scalability for mining top-k UFPs.
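
To make the idea of threshold raising concrete, the following is a minimal Python sketch of top-k mining over an uncertain database. It is not the TUFP algorithm, whose details are not given in the abstract. It assumes the common expected-support model, in which each transaction maps items to existence probabilities and the expected support of an itemset is the sum, over transactions, of the product of its item probabilities. The names `expected_support`, `top_k_patterns` and the toy database `db` are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import combinations
from math import prod


def expected_support(itemset, db):
    """Sum over transactions of the product of the items' probabilities."""
    return sum(prod(t.get(i, 0.0) for i in itemset) for t in db)


def top_k_patterns(db, k):
    """Level-wise search that keeps the k best patterns in a min-heap.

    Assumption: this is a generic Apriori-style sketch, not TUFP itself.
    """
    heap = []        # min-heap of (expected support, pattern)
    threshold = 0.0  # raised to the k-th best support once k patterns are known

    def offer(pattern, sup):
        nonlocal threshold
        if len(heap) < k:
            heapq.heappush(heap, (sup, pattern))
        elif sup > heap[0][0]:
            heapq.heapreplace(heap, (sup, pattern))
        if len(heap) == k:
            threshold = heap[0][0]  # threshold raising

    items = sorted({i for t in db for i in t})
    level = []
    for i in items:
        pattern = (i,)
        sup = expected_support(pattern, db)
        offer(pattern, sup)
        level.append((pattern, sup))

    while level:
        # Expected support is anti-monotone, so patterns that fell below the
        # raised threshold cannot be extended into a top-k pattern.
        survivors = [p for p, s in level if s >= threshold and s > 0]
        next_level = []
        for a, b in combinations(survivors, 2):
            if a[:-1] == b[:-1]:  # join two patterns sharing a prefix
                cand = a + (b[-1],)
                sup = expected_support(cand, db)
                if sup >= threshold and sup > 0:
                    offer(cand, sup)
                    next_level.append((cand, sup))
        level = next_level

    return sorted(heap, key=lambda e: (-e[0], e[1]))


# Toy uncertain database (hypothetical values).
db = [
    {"a": 0.9, "b": 0.8, "c": 0.1},
    {"a": 0.7, "b": 0.6},
    {"b": 0.5, "c": 0.9},
]
result = top_k_patterns(db, 3)
```

Raising the threshold as soon as k patterns are known lets the search discard candidates early, which is the general effect the abstract attributes to its threshold-raising strategies. Ties in expected support are broken arbitrarily in this sketch.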

Author information

Authors and Affiliations

Institute of Research and Development, Duy Tan University, Da Nang, 550000, Vietnam
Tuong Le
Faculty of Information Technology, Ho Chi Minh City University of Technology (HUTECH), Ho Chi Minh City, Vietnam
Bay Vo
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Van-Nam Huynh
Faculty of Computer Science and Management, Wroclaw University of Science and Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
Digital Contents Research Institute, Sejong University, Seoul, Republic of Korea
Sung Wook Baik

Corresponding author

Correspondence to Bay Vo.

About this article

Cite this article

Le, T., Vo, B., Huynh, VN. et al. Mining top-k frequent patterns from uncertain databases. Appl Intell 50, 1487–1497 (2020). https://doi.org/10.1007/s10489-019-01622-1

Published: 23 January 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s10489-019-01622-1
