Abstract
A constrained frequent pattern is a frequent pattern generated under constraints specified by the user; it offers stronger pertinence, higher practicability, and higher mining efficiency. As datasets grow, however, the construction of the constrained frequent pattern tree exhibits defects that make the tree difficult to apply to massive datasets. This paper proposes PACFP, a parallel algorithm for mining constrained frequent patterns based on the MapReduce programming model. First, the key steps of the algorithm (mapping the transactions in a dataset to frequent-item support counts, constructing the constrained frequent pattern tree, generating constrained frequent patterns, and aggregating frequent patterns) are implemented by three pairs of Map and Reduce functions. Second, data records are migrated by applying a data grouping strategy based on frequent-item support, which balances the load while constrained frequent patterns are generated. Finally, experiments on celestial spectrum datasets validate the availability, scalability, and expandability of the algorithm.
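The abstract does not give the details of PACFP, so the following is only a minimal single-process sketch of two of the ideas it names: a Map/Reduce pair that counts item support, and a support-based grouping of frequent items. The function names (`map_support`, `reduce_support`, `group_by_support`), the toy transactions, the `min_support` threshold, and the greedy "lightest group first" rule are all assumptions made for illustration; they are not taken from the paper, and using support as a proxy for workload is likewise an assumption.

```python
from collections import Counter
from itertools import chain


def map_support(transaction):
    """Map step (illustrative): emit (item, 1) once per distinct item."""
    return [(item, 1) for item in set(transaction)]


def reduce_support(pairs, min_support):
    """Reduce step (illustrative): sum counts and keep frequent items."""
    counts = Counter()
    for item, n in pairs:
        counts[item] += n
    return {item: c for item, c in counts.items() if c >= min_support}


def group_by_support(frequent, num_groups):
    """Hypothetical grouping rule: visit items from highest to lowest
    support and put each one into the group with the smallest total
    support so far (support is assumed to approximate workload)."""
    groups = [[] for _ in range(num_groups)]
    loads = [0] * num_groups
    ordered = sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))
    for item, support in ordered:
        lightest = loads.index(min(loads))
        groups[lightest].append(item)
        loads[lightest] += support
    return groups, loads


# Toy transaction database (made up for this example).
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c", "d"],
    ["b", "c"],
    ["a", "d"],
]

pairs = chain.from_iterable(map_support(t) for t in transactions)
frequent = reduce_support(pairs, min_support=2)
groups, loads = group_by_support(frequent, num_groups=2)

print(frequent)
print(groups, loads)
```

In a real MapReduce deployment the map and reduce functions would run on separate workers and each group of items would be routed to its own reducer; here everything runs in one process only to make the data flow visible.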
Acknowledgments
This work is partially supported by the National Natural Science Foundation of P. R. China (61272263) and the Graduate Scientific and Technological Innovation Projects in TYUST (20134027). Xiao Qin’s work is supported by the U.S. National Science Foundation under Grants CCF-0845257 (CAREER), CNS-0917137 (CSR), and CCF-0742187 (CPA).
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Yan, X., Zhang, J., Xun, Y. et al. A parallel algorithm for mining constrained frequent patterns using MapReduce. Soft Comput 21, 2237–2249 (2017). https://doi.org/10.1007/s00500-015-1930-z
DOI: https://doi.org/10.1007/s00500-015-1930-z