Improvement parallelization in Apriori Algorithm

Author:
Qiu Huiqi

University of Shanghai for Science and Technology, Shanghai, China


CIPAE 2020: Proceedings of the 2020 International Conference on Computers, Information Processing and Advanced Education, October 2020, Pages 235–238, https://doi.org/10.1145/3419635.3419712

Published: 16 October 2020


ABSTRACT

When mining massive data for internal relationships, the classical Apriori association algorithm cannot meet the demands of the scenario, in terms of either algorithmic efficiency or the I/O performance of data processing. In the classical algorithm, all transactions reside in the file system or a database, and every iteration must scan the transaction database. Mining frequent itemsets with Apriori also generates a large number of candidate itemsets, which causes excessive I/O overhead and consumes excessive system resources. This paper proposes a distributed Apriori algorithm based on MapReduce. The Hadoop Distributed File System (HDFS) is used to automatically provide distributed (sharded) storage of big data. Combined with the MapReduce computing framework, map and reduce are used for parallel processing in generating candidate itemsets and computing frequent itemsets, respectively, so that the distributed system is fully utilized to improve processing capacity. At the same time, the monotonicity of frequent itemsets is used to optimize the transaction database. Experimental analysis shows that adding compute nodes can significantly improve the performance of the mining algorithm, and that the improved Apriori algorithm runs considerably more efficiently when performing association rule analysis on data sets with a large number of item indexes.
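
The general idea in the abstract can be illustrated with a small single-process sketch. This is not the paper's implementation: it uses no Hadoop or HDFS, and every function name here (map_count, reduce_counts, generate_candidates, apriori) is hypothetical. Each inner list stands in for one data shard, the "map" step counts candidates per shard, and the "reduce" step sums the counts and applies the support threshold. The transaction-pruning step is an assumed reading of "using monotonicity to optimize the transaction database": a transaction that contains no frequent k-itemset cannot contain a frequent (k+1)-itemset, so it is dropped before the next pass.

```python
from collections import Counter
from itertools import combinations


def map_count(partition, candidates):
    """Map step (simulated): count candidate itemsets within one shard."""
    counts = Counter()
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts


def reduce_counts(partials, min_support):
    """Reduce step (simulated): sum per-shard counts, keep frequent itemsets."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def generate_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (monotonicity)."""
    items = sorted({item for itemset in frequent for item in itemset})
    candidates = set()
    for combo in combinations(items, k):
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1)):
            candidates.add(frozenset(combo))
    return candidates


def apriori(partitions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    partitions = [[frozenset(t) for t in part] for part in partitions]
    candidates = {frozenset([i]) for part in partitions for t in part for i in t}
    result = {}
    k = 1
    while candidates:
        partials = [map_count(part, candidates) for part in partitions]
        frequent = reduce_counts(partials, min_support)
        if not frequent:
            break
        result.update(frequent)
        # Assumed pruning: a transaction without any frequent k-itemset
        # cannot support a frequent (k+1)-itemset, so drop it.
        partitions = [[t for t in part if any(f <= t for f in frequent)]
                      for part in partitions]
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


# Two illustrative shards of made-up transactions.
demo_partitions = [
    [{"bread", "milk"}, {"bread", "diaper", "beer"}],
    [{"milk", "diaper", "beer"}, {"bread", "milk", "diaper"},
     {"bread", "milk", "beer"}],
]
result = apriori(demo_partitions, min_support=3)
```

With a minimum support count of 3, the four single items and the pair {bread, milk} are frequent in this toy data. In a real MapReduce job the per-shard counting would run on separate nodes, which is where the speedup from additional compute nodes would come from.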


Index Terms

Improvement parallelization in Apriori Algorithm
1. Information systems
  1. Information systems applications
    1. Process control systems



Published in

CIPAE 2020: Proceedings of the 2020 International Conference on Computers, Information Processing and Advanced Education
October 2020
527 pages
ISBN:9781450387729
DOI:10.1145/3419635

Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Apriori algorithm
Association rule
Data mining
parallelization
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
CIPAE 2020 Paper Acceptance Rate: 101 of 216 submissions, 47%
Overall Acceptance Rate: 101 of 216 submissions, 47%
Article Metrics
- Total Citations: 1
- Total Downloads: 58
- Downloads (Last 12 months): 10
- Downloads (Last 6 weeks): 4