DOI: 10.1145/3523286.3524685
research-article

Optimization of Big Data Mining Algorithm Based on Spark Framework: Preparation of Camera-Ready Contributions to SCITEPRESS Proceedings

Published: 31 May 2022

Abstract

Frequent itemset mining is the core of association rule mining. However, as data volumes keep growing, the traditional Apriori algorithm can no longer meet practical needs, and its efficiency is low. This paper proposes a parallel Eclat algorithm based on the Spark framework, modifying the serial algorithm to address its shortcomings in processing big data. A vertical data layout avoids repeated traversal of large amounts of data, while in-memory computation greatly reduces I/O load and computing time. Combined with a pruning strategy, this reduces the computation spent on irrelevant itemsets and improves the algorithm's parallel computing capability. Experimental results show that the Spark-based Eclat algorithm is far more efficient than the serial Eclat algorithm and offers high efficiency and good scalability when processing massive data.
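The abstract names three ingredients: a vertical layout (item mapped to the set of transaction ids containing it), support counting by intersecting those tidsets instead of rescanning the data, and pruning of infrequent itemsets. The sketch below illustrates them in plain Python under stated assumptions: transactions fit in memory as lists of item strings, `min_sup` is an absolute support count, and the names `to_vertical`, `mine_class` and `parallel_eclat` are hypothetical. It does not use Spark and is not the authors' implementation. Splitting the work by first item into independent equivalence classes is one common way parallel Eclat variants distribute tasks; here a thread pool only simulates that decomposition.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def to_vertical(transactions):
    """Horizontal -> vertical layout: item -> frozenset of transaction ids."""
    tids = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tids[item].add(tid)
    return {item: frozenset(ts) for item, ts in tids.items()}


def mine_class(prefix, item, tids, rest, min_sup, out):
    """Record prefix + item (already known frequent) and mine its extensions depth-first."""
    itemset = prefix + (item,)
    out[itemset] = len(tids)  # support = size of the tidset
    children = []
    for other, other_tids in rest:
        # Support of itemset + other comes from a set intersection, not a data rescan.
        joined = tids & other_tids
        # Prune: an infrequent itemset cannot have frequent supersets.
        if len(joined) >= min_sup:
            children.append((other, joined))
    for j, (child, child_tids) in enumerate(children):
        mine_class(itemset, child, child_tids, children[j + 1:], min_sup, out)


def parallel_eclat(transactions, min_sup, workers=4):
    """Hypothetical driver: one independent task per frequent first item."""
    vertical = to_vertical(transactions)
    # Prune infrequent single items before any intersection is computed.
    frequent = sorted(
        ((item, t) for item, t in vertical.items() if len(t) >= min_sup),
        key=lambda pair: pair[0],
    )

    def run(i):
        # Class i holds every frequent itemset whose smallest item is frequent[i],
        # so classes are disjoint and can be mined without coordination.
        local = {}
        item, tids = frequent[i]
        mine_class((), item, tids, frequent[i + 1:], min_sup, local)
        return local

    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for local in pool.map(run, range(len(frequent))):
            result.update(local)
    return result


if __name__ == "__main__":
    toy = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    for itemset, support in sorted(parallel_eclat(toy, min_sup=3).items()):
        print(itemset, support)
```

On the five toy transactions with `min_sup=3`, the sketch reports four frequent single items (beer, bread, diaper, milk) and four frequent pairs; the triple (bread, diaper, milk) is pruned because its tidset has only two entries. Sorting the items fixes a canonical order, which is what makes each itemset belong to exactly one class.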

Published In

ACM Other conferences
BIC '22: Proceedings of the 2022 2nd International Conference on Bioinformatics and Intelligent Computing
January 2022
551 pages
ISBN:9781450395755
DOI:10.1145/3523286
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Eclat
  2. Spark framework
  3. association rule mining
  4. big data

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

BIC 2022

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 22
  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 08 Mar 2025
