
A parallel algorithm for mining constrained frequent patterns using MapReduce

  • Methodologies and Application
Soft Computing

Abstract

A constrained frequent pattern is a frequent pattern generated under constraints specified by the user; such patterns offer stronger pertinence, higher practicability, and better mining efficiency. As datasets grow, the construction of the constrained frequent pattern tree exhibits shortcomings that make the tree difficult to apply to massive datasets. This paper proposes PACFP, a parallel algorithm for mining constrained frequent patterns based on the MapReduce programming model. First, the key steps of the algorithm (mapping the transactions in a dataset to frequent item support counts, constructing the constrained frequent pattern tree, generating the constrained frequent patterns, and aggregating the frequent patterns) are implemented by three pairs of Map and Reduce functions. Second, data records are migrated using a data grouping strategy based on frequent item support, which effectively addresses load balancing while the constrained frequent patterns are generated. Finally, experimental results on celestial spectrum datasets validate the availability, scalability, and expandability of the algorithm.
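The first step and the grouping idea described in the abstract can be illustrated with a minimal single-process sketch in Python. It is not the PACFP implementation: the function names (`map_support`, `reduce_support`, `group_by_support`), the toy transactions, and the greedy "heaviest item into the lightest group" rule are assumptions made for illustration only, and the user constraints and the constrained frequent pattern tree are not modelled.

```python
from collections import Counter
from itertools import chain


def map_support(transaction):
    """Map: emit (item, 1) once for each distinct item in a transaction."""
    return [(item, 1) for item in set(transaction)]


def reduce_support(pairs, min_support):
    """Reduce: sum the counts per item and keep items meeting min_support."""
    counts = Counter()
    for item, n in pairs:
        counts[item] += n
    return {item: c for item, c in counts.items() if c >= min_support}


def group_by_support(frequent, num_groups):
    """Assign frequent items to groups by support count.

    Assumed heuristic (not taken from the paper): visit items from the
    highest support to the lowest and place each one in the group whose
    accumulated support is currently smallest.
    """
    loads = [0] * num_groups
    assignment = {}
    ordered = sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))
    for item, count in ordered:
        g = loads.index(min(loads))  # lightest group so far
        assignment[item] = g
        loads[g] += count
    return assignment, loads


# Hypothetical toy dataset: each inner list is one transaction.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c", "d"],
    ["b", "c"],
    ["a", "b", "c"],
]

pairs = chain.from_iterable(map_support(t) for t in transactions)
frequent = reduce_support(pairs, min_support=2)
assignment, loads = group_by_support(frequent, num_groups=2)
```

In a real MapReduce job the map and reduce functions would run on separate workers over partitions of the dataset, and the resulting group assignment would decide which worker receives which records in the later pattern-generation stage.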



Acknowledgments

This work is partially supported by the National Natural Science Foundation of P. R. China (61272263) and the Graduate Scientific and Technological Innovation Projects in TYUST (20134027). Xiao Qin’s work is supported by the U.S. National Science Foundation under Grants CCF-0845257(CAREER), CNS-0917137(CSR), and CCF-0742187(CPA).

Author information

Corresponding author

Correspondence to Jifu Zhang.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Yan, X., Zhang, J., Xun, Y. et al. A parallel algorithm for mining constrained frequent patterns using MapReduce. Soft Comput 21, 2237–2249 (2017). https://doi.org/10.1007/s00500-015-1930-z

  • DOI: https://doi.org/10.1007/s00500-015-1930-z
