An optimized FP-growth algorithm for discovery of association rules

Shawkat, Mai; Badawi, Mahmoud; El-ghamrawy, Sally; Arnous, Reham; El-desoky, Ali

doi:10.1007/s11227-021-04066-y

An optimized FP-growth algorithm for discovery of association rules

Published: 24 September 2021

Volume 78, pages 5479–5506, (2022)
Cite this article

The Journal of Supercomputing Aims and scope Submit manuscript

Mai Shawkat¹,
Mahmoud Badawi^2,3,
Sally El-ghamrawy ORCID: orcid.org/0000-0002-5430-390X⁴,
Reham Arnous⁵ &
…
Ali El-desoky³

1133 Accesses
14 Citations
Explore all metrics

Abstract

Association rule mining (ARM) is a data mining technique to discover interesting associations between datasets. The frequent pattern-growth (FP-growth) is an effective ARM algorithm for compressing information in the tree structure. However, it tends to suffer from the performance gap when processing large databases because of its mining procedure. This study presents a modified FP-growth (MFP-growth) algorithm to enhance the efficiency of the FP-growth by obviating the need for recurrent creation of conditional subtrees. The proposed algorithm uses a header table configuration to reduce the complexity of the whole frequent pattern tree. Four experimental series are conducted using different benchmark datasets to analyze the operating efficiency of the proposed MFP-growth algorithm compared with state-of-the-art machine learning algorithms in terms of runtime, memory consumption, and the effectiveness of generated rules. The experimental results confirm the superiority of the MFP-growth algorithm, which focuses on its potential implementations in various contexts.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Article 12 April 2024

A comprehensive survey of data mining

Article 06 February 2020

Big data preprocessing: methods and prospects

Article Open access 01 November 2016

References

Fisch D, Kalkowski E, Sick B (2014) Knowledge fusion for probabilistic generative classifiers with data mining applications. IEEE Trans Knowl Data Eng 26(3):652–666
Article Google Scholar
Ceglar A, Roddick JF (2006) Association mining. ACM Comput Surv 38:5
Article Google Scholar
Han X, Liu X, Chen J, Lai G, Gao H, Li J (2019) Efficiently mining frequent itemsets on massive data. IEEE Access 7:31409–31421
Article Google Scholar
Coenen F, Leng P, Ahmed S (2004) Data structure for association rule mining: T-trees and P-trees. IEEE Trans Knowl Data Eng 16(6):774–778
Article Google Scholar
Han J, Fu Y (1999) Mining multiple-level association rules in large databases. IEEE Transact Knowl Data Eng 11(5):798–805
Article Google Scholar
Son LH, Chiclana F, Kumar R, Mittal M, Khari M, Chatterjee JM, Baik SW (2018) ARM–AMO: An efficient association rule mining algorithm based on animal migration optimization. Knowl Based Syst 154:68–80
Article Google Scholar
Li T-Y, Li X-M (2011) Preprocessing expert system for mining association rules in telecommunication networks. Expert Syst Appl 38:1709–1715. https://doi.org/10.1016/j.eswa.2010.07.096
Article Google Scholar
Yildirim P, Birant D, Alpyildis T (2017) Discovering the relationships between yarn and fabric properties using association rule mining. Turk J Elect Eng Comput Sci 25:4788–4804. https://doi.org/10.3906/elk-1611-16
Article Google Scholar
Zhang T (2018) Automatic evaluation model of physical education based on association rules algorithm. Wirel Pers Commun. https://doi.org/10.1007/s11277-018-5304-6
Article Google Scholar
Khedr AM, Osamy W, Salim A, Abbas S (2020) A novel association rule-based data mining approach for Internet of Things based wireless sensor networks. IEEE Access 8:151574–151588. https://doi.org/10.1109/ACCESS.2020.3017488
Article Google Scholar
Viger F, Lin JCW, Vo B, Chi TT, Zhang J, Le HB (2017) A survey of itemset mining. WIREs Data Mining Knowl Discovery. https://doi.org/10.1002/widm.1207
Article Google Scholar
Sinthuja M, Puviarasan N, Arun P (2019) Comparative analysis of association rule mining algorithms in mining frequent patterns. Int J Adv Comput Res 8:1839–1846
Google Scholar
Agrawal R, Mannila H, Srikanth R, Toivonen H, Verkamo AI (1996) Fast discovery of association rules. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (Eds.) Advances in knowledge discovery and data mining, pp. 307–328
Wu H, Lu Z, Pan L, Xu R, Jiang W (2009) An improved apriori based algorithm for association rules mining. In: Sixth International Conference on Fuzzy Systems and Knowledge Discovery, IEEE, vol. 2, pp. 51–55, 2009, https://doi.org/10.1109/FSKD.2009.193
Yabing J (2013) Research of an improved apriori algorithm in data mining association rules. Int J Comput Commun Eng 2(1):25
Article Google Scholar
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc. 20th int. conf. very large databases, VLDB, vol. 1215, pp. 487–499
Gan W, Lin CW, Chao HC, Zhan J (2017) Data mining in distributed environment: a survey. Wiley Interdiscip Rev Data Mining Knowl Discov 7(6):e1216
Google Scholar
Abdel-Hamid NB, ElGhamrawy S, El Desouky A, Arafat H (2018) A dynamic spark-based classification framework for imbalanced big data. J Grid Comput 16(4):607–626
Article Google Scholar
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: ACM SIGMOD International Conference on Management of Data, pp. 1–12
Zhong R, Wang H (2011) Research of commonly used association rules mining algorithm in data mining. In: Proc. IEEE Inter. Conf. Internet Comput. Inf. Services, Hong Kong, pp. 219–222, Sep. 2011
Su T, Xu H, Zhou X (2019) Particle swarm optimization based association rule mining in Big Data environment. IEEE Access. https://doi.org/10.1109/ACCESS.2019.2951195
Article Google Scholar
Zaki MJ (1997) Fast mining of sequential patterns in very large databases. University of Rochester Computer Science Department, New York
Google Scholar
Pei J, Han J, Lu H, Nishio S, Tang S, Yang D (2001) H-mine: hyper-structure mining of frequent patterns in large databases. In Data Mining. In: Proc.s IEEE Inter. Conf., IEEE, pp. 441–448
Borgelt C (2005) An implementation of the FP-growth algorithm. In: Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations, ACM
Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using FP-trees. IEEE Trans Knowl Data Eng 17(10):1347–1362. https://doi.org/10.1109/TKDE.2005.166
Article Google Scholar
Ke-Chung L, Liao IE, Sheng C (2011) An improved frequent pattern growth method for mining association rules. Expert Syst Appl 38(5):5154
Article Google Scholar
Tanbeer S, Farhan A, Jeong B, Lee Y (2008) Efficient single-pass frequent pattern mining using a prefix-tree. Inf Sci 179:559–583
Article MathSciNet Google Scholar
Liu L, Li E (2007) Optimization of frequent itemset mining on multiple-core processor. In: International Conference on Very Large Databases, University of Vienna, Austria, pp.1275–1285
Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: Conference on Symposium on Operating Systems Design and Implementation
Li H, Wang Y, Zhang D, Zhang M, Chang EY (2009) PFP: parallel FP-growth for query recommendation. In: ACM Conference on Recommender Systems, pp. 107–114
El-Elshafeiy E, El-desouky A (2017) A Big Data framework for mining sensor data using hadoop. Stud Inf Control 26(3):365–376
Google Scholar
Zhou S, He J, Yang H, Chen D, Zhang R (2020) Big Data-driven abnormal behavior detection in healthcare based on association rules. IEEE Access 8:129002–129011. https://doi.org/10.1109/ACCESS.2020.3009006
Article Google Scholar
Apache. Apache spark repository, 2016.
Qiu H, Gu R, Yuan C, Huang, Y (2014) YAFIM: a parallel frequent itemset mining algorithm with spark. In: Parallel and Distributed Processing Symposium Workshops, pp. 1664–1671
Zhang F, Liu M, Gui F, Shen W, Shami A, Ma Y (2015) A distributed frequent itemset mining algorithm using spark for big data analytics. Clust Comput 18(4):1493–1501
Article Google Scholar
Niu X, Qian M, Wu C, Hou A (2019) On a parallel spark workflow for frequent itemset mining based on array prefix-tree,” IEEE/ACM Workflows in Support of Large-Scale Science (WORKS), Denver, CO, USA, pp. 50-59, 2019
Ma BLWH, Liu B (1998) Integrating classification and association rule mining,” in Proc. 4th KDD, pp. 80–86
Rajab KD (2019) New associative classification method based on rule pruning for classification of datasets. IEEE Access 7:157783
Article Google Scholar
Sornalakshmi M, Balamurali S, Venkatesulu M et al (2020) Hybrid method for mining rules based on enhanced Apriori algorithm with sequential minimal optimization in healthcare industry. Neural Comput Applic. https://doi.org/10.1007/s00521-020-04862-2
Article Google Scholar
Thurachon W, Kreesuradej W (2021) Incremental association rule mining with a fast incremental updating frequent pattern growth algorithm. IEEE Access 9:55726–55741. https://doi.org/10.1109/ACCESS.2021.3071777
Article Google Scholar
Cheng H, Han J (2009) Pattern-growth methods. In: Liu L, Özsu MT (eds) Encyclopedia of database systems. Springer, Boston
Google Scholar
Weka Data Mining Tool, (1999), http:// www.cs.waikato.ac.nz/ml/weka
UCI.Ucimachinelearningrepository, (2013)
Goethals B, Zaki M (2004) Advances in frequent itemset mining implementations: Report on FIMI'03,” SIGKDD Explorations, pp. 109–117
Borah A, Nath B (2021) Comparative evaluation of pattern mining techniques: an empirical study. Complex Intell. Syst. 7:589–619
Article Google Scholar
ElGhamrawy SM (2016) A knowledge management framework for imbalanced data using frequent pattern mining based on bloom filter. 2016 11th International Conference on Computer Engineering & Systems (ICCES), IEEE, 2016
Hassib EM, El-Desouky A, El-Kenawy S, El-Ghamrawy S (2019) An imbalanced big data mining framework for improving optimization algorithms performance. IEEE Access 7:170774–170795
Article Google Scholar

Download references

Author information

Authors and Affiliations

Communications and Information Engineering Department, Faculty of Engineering, Mansoura University, Mansoura, Egypt
Mai Shawkat
Department of Computer Science and Informatics, Taibah University, Medina, Saudi Arabia
Mahmoud Badawi
Department of Computer Engineering and Systems, Faculty of Engineering, Mansoura University, Mansoura, Egypt
Mahmoud Badawi & Ali El-desoky
Computer Engineering Department, MISR Higher Institute for Engineering and Technology, Scientific Research Group in Egypt (SRGE), Mansoura, Egypt
Sally El-ghamrawy
Computer and Systems Engineering Department, Delta Higher Institute for Engineering and Technology (DHIET), Mansoura, Egypt
Reham Arnous

Authors

Mai Shawkat
View author publications
You can also search for this author in PubMed Google Scholar
Mahmoud Badawi
View author publications
You can also search for this author in PubMed Google Scholar
Sally El-ghamrawy
View author publications
You can also search for this author in PubMed Google Scholar
Reham Arnous
View author publications
You can also search for this author in PubMed Google Scholar
Ali El-desoky
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sally El-ghamrawy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 21 KB)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Shawkat, M., Badawi, M., El-ghamrawy, S. et al. An optimized FP-growth algorithm for discovery of association rules. J Supercomput 78, 5479–5506 (2022). https://doi.org/10.1007/s11227-021-04066-y

Download citation

Accepted: 01 September 2021
Published: 24 September 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s11227-021-04066-y

Keyword

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An optimized FP-growth algorithm for discovery of association rules

Abstract

Access this article

Similar content being viewed by others

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

A comprehensive survey of data mining

Big data preprocessing: methods and prospects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Supplementary Information

Supplementary file1 (DOCX 21 KB)

Rights and permissions

About this article

Cite this article

Keyword

Navigation

An optimized FP-growth algorithm for discovery of association rules

Abstract

Access this article

Similar content being viewed by others

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

A comprehensive survey of data mining

Big data preprocessing: methods and prospects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Supplementary Information

Supplementary file1 (DOCX 21 KB)

Rights and permissions

About this article

Cite this article

Share this article

Keyword

Search

Navigation