A parallel algorithm for mining constrained frequent patterns using MapReduce

Yan, Xiaowu; Zhang, Jifu; Xun, Yaling; Qin, Xiao

doi:10.1007/s00500-015-1930-z

Methodologies and Application
Published: 17 November 2015

Volume 21, pages 2237–2249, (2017)
Soft Computing

Abstract

A constrained frequent pattern is a frequent pattern generated under constraints specified by the user. Such patterns are more pertinent and more practical, and they can be mined more efficiently. As datasets grow, however, the construction of the constrained frequent pattern tree runs into limitations that make the tree difficult to apply to massive datasets. This paper proposes PACFP, a parallel algorithm for mining constrained frequent patterns based on the MapReduce programming model. First, the key steps of the algorithm (mapping the transactions in a dataset to frequent item support counts, constructing the constrained frequent pattern tree, generating the constrained frequent patterns, and aggregating the frequent patterns) are implemented by three pairs of Map and Reduce functions. Second, data records are migrated by a data grouping strategy based on frequent item support, which effectively balances the load while the constrained frequent patterns are generated. Finally, experimental results on celestial spectrum datasets validate the availability, scalability, and expandability of the algorithm.
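The abstract names two steps that a small example can make concrete: mapping transactions to item support counts, and grouping frequent items by support so that the load is balanced. The Python sketch below simulates both in a single process. It is only an illustration under stated assumptions, not the PACFP implementation: the function names, the toy transactions, the `min_support` threshold, and the greedy "assign to the lightest group" rule are all hypothetical, since the preview does not specify how the paper's grouping strategy works.

```python
from collections import defaultdict
from itertools import chain


def map_support(transaction):
    """Map step: emit (item, 1) for each distinct item in one transaction."""
    return [(item, 1) for item in set(transaction)]


def reduce_support(pairs, min_support):
    """Reduce step: sum the counts per item, keep items meeting min_support."""
    counts = defaultdict(int)
    for item, n in pairs:
        counts[item] += n
    return {item: n for item, n in counts.items() if n >= min_support}


def group_by_support(frequent, num_groups):
    """Assumed balancing rule (not taken from the paper): visit items in
    descending support order and give each to the currently lightest group."""
    groups = [[] for _ in range(num_groups)]
    loads = [0] * num_groups
    ordered = sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))
    for item, n in ordered:
        lightest = loads.index(min(loads))
        groups[lightest].append(item)
        loads[lightest] += n
    return groups, loads


# Toy transactions, invented for illustration only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c", "d"],
    ["b", "c"],
    ["a", "d"],
]

pairs = chain.from_iterable(map_support(t) for t in transactions)
frequent = reduce_support(pairs, min_support=2)
groups, loads = group_by_support(frequent, num_groups=2)

print(frequent)  # support count per frequent item
print(groups)    # items assigned to each group
print(loads)     # total support carried by each group
```

In a real MapReduce job the map and reduce functions would run on separate workers, and each group of items would determine which transactions are shipped to which reducer for tree construction. The sketch only shows why grouping by support, rather than by item count, tends to even out the work per group.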

Acknowledgments

This work is partially supported by the National Natural Science Foundation of P. R. China (61272263) and the Graduate Scientific and Technological Innovation Projects in TYUST (20134027). Xiao Qin’s work is supported by the U.S. National Science Foundation under Grants CCF-0845257(CAREER), CNS-0917137(CSR), and CCF-0742187(CPA).

Author information

Authors and Affiliations

School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, 030024, China
Xiaowu Yan, Jifu Zhang & Yaling Xun
Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849-5347, USA
Xiao Qin

Corresponding author

Correspondence to Jifu Zhang.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Yan, X., Zhang, J., Xun, Y. et al. A parallel algorithm for mining constrained frequent patterns using MapReduce. Soft Comput 21, 2237–2249 (2017). https://doi.org/10.1007/s00500-015-1930-z

Published: 17 November 2015
Issue Date: May 2017
DOI: https://doi.org/10.1007/s00500-015-1930-z
