A non-group parallel frequent pattern mining algorithm based on conditional patterns

Kuang, Zhe-jun; Zhou, Hang; Zhou, Dong-dai; Zhou, Jin-peng; Yang, Kun

doi:10.1631/FITEE.1800467

Published: 18 October 2019

Volume 20, pages 1234–1245, (2019)

Frontiers of Information Technology & Electronic Engineering

Abstract

Frequent itemset mining is the main method of association rule mining. Because computing space and performance are limited, mining associations among frequent items in large datasets requires extensive time and effort, and the cost grows as the datasets become larger. In a big data environment, the MapReduce programming model is typically used to partition the mining task and process the parts in parallel, which can improve the execution efficiency of the algorithm. However, to ensure that association rules are not destroyed during task partitioning and parallel processing, the inner-relationship data must also be stored. Because inner-relationship data are redundant, storing them significantly increases space usage compared with the original dataset. In this study, we find that frequent pattern (FP) mining depends mainly on the conditional pattern bases. In the parallel frequent pattern (PFP) algorithm, the grouping model divides frequent items into several groups according to their frequencies. We propose a non-group PFP (NG-PFP) mining algorithm that removes the grouping model and reduces the data redundancy between sub-tasks. We also describe how NG-PFP performs task partitioning and parallel processing, and we analyze and discuss its performance in a Hadoop cluster environment. Experimental results indicate that the non-group model clearly improves both computational efficiency and the space utilization rate.
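To make the term "conditional pattern base" concrete, the following minimal Python sketch computes one for every frequent item in a toy transaction list. It illustrates the standard FP-growth notion only and is not the authors' NG-PFP implementation, which this page does not describe in detail. The function name `conditional_pattern_bases`, the sample transactions, and the tie-breaking rule (alphabetical order among items with equal support) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def conditional_pattern_bases(transactions, min_support):
    """Return {item: Counter({prefix_path: count})} for each frequent item.

    The prefix path of an item in a transaction is the tuple of frequent
    items that precede it once the transaction is sorted by descending
    support (ties broken alphabetically; an assumption of this sketch).
    """
    # Support of each item: number of transactions that contain it.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, count in support.items() if count >= min_support}

    def order(item):
        return (-support[item], item)

    bases = defaultdict(Counter)
    for t in transactions:
        # Drop infrequent items and put the rest in the global order.
        ordered = sorted(set(t) & frequent, key=order)
        for pos, item in enumerate(ordered):
            prefix = tuple(ordered[:pos])
            if prefix:  # items at the head of a transaction have no prefix
                bases[item][prefix] += 1
    return dict(bases)


# Hypothetical toy data, not taken from the paper.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c", "d"],
]
bases = conditional_pattern_bases(transactions, min_support=2)
for item in sorted(bases):
    print(item, dict(bases[item]))
```

Each item's base can be mined without consulting any other item's base, which is why conditional pattern bases are a natural unit of work for parallel sub-tasks.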

Author information

Authors and Affiliations

College of Computer Science and Technology, Changchun University, Changchun, 130022, China
Zhe-jun Kuang
School of Economics, Changchun University, Changchun, 130022, China
Hang Zhou
School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
Dong-dai Zhou
Division of Engineering Science, University of Toronto, Ontario, M5S2E8, Canada
Jin-peng Zhou
School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO43SQ, UK
Kun Yang

Corresponding author

Correspondence to Dong-dai Zhou.

Ethics declarations

Zhe-jun KUANG, Hang ZHOU, Dong-dai ZHOU, Jin-peng ZHOU, and Kun YANG declare that they have no conflict of interest.

Additional information

Project supported by the Fundamental Research Funds for the Central Universities, China (No. 2412015KJ005), the Twelfth Five-Year Plan Project of the Education Department of Jilin Province, China (No. 557), and the Thirteenth Five-Year Plan for Scientific Research of the Education Department of Jilin Province, China (No. JJKH20191197KJ).

About this article

Cite this article

Kuang, Zj., Zhou, H., Zhou, Dd. et al. A non-group parallel frequent pattern mining algorithm based on conditional patterns. Frontiers Inf Technol Electronic Eng 20, 1234–1245 (2019). https://doi.org/10.1631/FITEE.1800467

Received: 05 August 2018
Accepted: 18 December 2018
Published: 18 October 2019
Issue Date: September 2019
DOI: https://doi.org/10.1631/FITEE.1800467

CLC number

TP301
