An optimized FP-growth algorithm for discovery of association rules

The Journal of Supercomputing

Abstract

Association rule mining (ARM) is a data mining technique for discovering interesting associations among items in datasets. Frequent pattern-growth (FP-growth) is an effective ARM algorithm that compresses transaction information into a tree structure. However, its performance tends to degrade on large databases because its mining procedure repeatedly creates conditional subtrees. This study presents a modified FP-growth (MFP-growth) algorithm that improves the efficiency of FP-growth by obviating the need for this recurrent creation of conditional subtrees. The proposed algorithm uses a header table configuration to reduce the complexity of the whole frequent pattern tree. Four series of experiments are conducted on different benchmark datasets to compare the operating efficiency of the proposed MFP-growth algorithm with state-of-the-art machine learning algorithms in terms of runtime, memory consumption, and the effectiveness of the generated rules. The experimental results confirm the superiority of the MFP-growth algorithm and point to its potential for implementation in various contexts.
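
For readers unfamiliar with the data structures named in the abstract, the following is a minimal sketch of a classic FP-tree with a header table. It illustrates standard FP-growth building blocks only; it is not the MFP-growth algorithm proposed in the article, whose details are not given here. The names `FPNode`, `build_fp_tree`, and `prefix_paths` are hypothetical and chosen for illustration, and the header table is assumed to be a plain dictionary mapping each frequent item to the list of tree nodes that carry it.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and tree links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree and a header table (illustrative names)."""
    # Pass 1: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None, None)
    # Header table (assumed layout): item -> list of nodes holding it.
    header = {item: [] for item in frequent}

    # Pass 2: insert each transaction with its frequent items ordered
    # by descending support, so common prefixes share tree nodes.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def prefix_paths(item, header):
    """Collect (path, count) pairs leading to each node of `item`.

    In standard FP-growth these paths form the conditional pattern
    base from which a conditional subtree is built for `item`.
    """
    paths = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        paths.append((path[::-1], node.count))
    return paths
```

In standard FP-growth, the output of `prefix_paths` is used to build a new conditional subtree for every frequent item, recursively. That repeated construction is the cost the abstract says MFP-growth is designed to avoid.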

Author information

Corresponding author

Correspondence to Sally El-ghamrawy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shawkat, M., Badawi, M., El-ghamrawy, S. et al. An optimized FP-growth algorithm for discovery of association rules. J Supercomput 78, 5479–5506 (2022). https://doi.org/10.1007/s11227-021-04066-y

  • DOI: https://doi.org/10.1007/s11227-021-04066-y
