Abstract
Association rule mining (ARM) is a data mining technique for discovering interesting associations among items in a dataset. Frequent pattern growth (FP-growth) is an effective ARM algorithm that compresses the database into a tree structure. However, because of its mining procedure, its performance tends to degrade when processing large databases. This study presents a modified FP-growth (MFP-growth) algorithm that improves the efficiency of FP-growth by eliminating the repeated construction of conditional subtrees. The proposed algorithm uses a header table configuration to reduce the complexity of the whole frequent pattern tree. Four series of experiments on different benchmark datasets compare the operating efficiency of the proposed MFP-growth algorithm with state-of-the-art machine learning algorithms in terms of runtime, memory consumption, and the effectiveness of the generated rules. The experimental results confirm the superiority of the MFP-growth algorithm and highlight its potential for implementation in various contexts.
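To make the baseline concrete, here is a minimal sketch of the standard FP-growth procedure that MFP-growth modifies. It is not the authors' MFP-growth algorithm. It compresses transactions into an FP-tree with a header table of node-links, then mines the tree by recursively building conditional FP-trees, which is the repeated step MFP-growth is designed to avoid. It then derives rules by confidence. The function names (`build_fp_tree`, `fp_growth`, `association_rules`), the `(items, count)` transaction format, and the absolute support count threshold are illustrative assumptions.

```python
# Hypothetical sketch of baseline FP-growth (not the MFP-growth variant).
from collections import defaultdict
from itertools import combinations


class FPNode:
    """An FP-tree node: one item, its count, a parent link, and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (root, header, frequent): header maps each frequent item to the
    tree nodes that hold it (its node-link chain); frequent maps items to support.
    """
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Keep only frequent items, ordered by descending support (ties by name),
        # so transactions with common prefixes share a path in the tree.
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return root, header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Baseline FP-growth: return {frozenset(itemset): support}."""
    _, header, frequent = build_fp_tree(transactions, min_support)
    patterns = {}
    # Visit header-table items from least to most frequent.
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        new_suffix = (item,) + suffix
        patterns[frozenset(new_suffix)] = frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        cond_base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond_base.append((path, node.count))
        # Recursively build and mine a conditional FP-tree for this suffix.
        patterns.update(fp_growth(cond_base, min_support, new_suffix))
    return patterns


def association_rules(patterns, min_confidence):
    """Derive rules X -> Y with confidence = support(X u Y) / support(X)."""
    rules = []
    for itemset, supp in patterns.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so it is in patterns.
                confidence = supp / patterns[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

As an example, take the five toy baskets {bread, milk}, {bread, diaper, beer, eggs}, {milk, diaper, beer, cola}, {bread, milk, diaper, beer}, and {bread, milk, diaper, cola} with a minimum support count of 3. Here `fp_growth([(t, 1) for t in baskets], 3)` returns the four frequent single items plus the four frequent pairs. With a confidence threshold of 0.8, `association_rules` keeps only {beer} -> {diaper} (confidence 1.0). The recursive call inside `fp_growth` is where conditional subtrees are rebuilt for every suffix. That repeated construction is the cost the abstract attributes to FP-growth on large databases.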
About this article
Cite this article
Shawkat, M., Badawi, M., El-ghamrawy, S. et al. An optimized FP-growth algorithm for discovery of association rules. J Supercomput 78, 5479–5506 (2022). https://doi.org/10.1007/s11227-021-04066-y
DOI: https://doi.org/10.1007/s11227-021-04066-y