ABSTRACT
When people look for internal relationships in massive data, the classical Apriori association algorithm cannot meet the demands of massive-data mining in terms of either algorithmic efficiency or I/O performance during data processing. In the classical Apriori algorithm, all transactions reside in a file system or database, and every iteration must rescan the transaction database; moreover, mining frequent itemsets with Apriori generates a large number of candidate itemsets, which leads to excessive I/O overhead and heavy consumption of system resources. This paper proposes a distributed Apriori algorithm based on MapReduce. The Hadoop Distributed File System (HDFS) is used to automatically store big data in a distributed (fragmented) manner. Combined with the MapReduce computing framework, the map and reduce phases perform parallel processing for generating candidate itemsets and computing frequent itemsets, respectively. The distributed system is fully utilized to improve processing capacity, and the monotonicity (downward closure) of frequent itemsets is used to optimize the transaction database. Experimental analysis shows that adding compute nodes significantly improves the performance of the mining algorithm, and that the improved Apriori algorithm achieves a substantial gain in operating efficiency when performing association rule analysis on datasets with a large number of distinct items.
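The map/reduce division of labor described above can be illustrated with a small in-process simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: the inner lists of `partitions` stand in for HDFS splits, a single dictionary stands in for Hadoop's shuffle and reduce tasks, and the function names (`apriori_gen`, `map_phase`, `reduce_phase`, `prune_transactions`, `mapreduce_apriori`) and the sample transactions are hypothetical. In this common design the driver generates candidates, the mappers count candidate occurrences within each split, and the reducer sums the counts and applies the minimum support; the exact key/value layout used in the paper may differ. The trimming step is one assumed reading of how monotonicity can shrink the transaction database between passes: an item that appears in no frequent k-itemset cannot appear in any frequent (k+1)-itemset, and a transaction with k or fewer remaining items cannot contain a (k+1)-candidate.

```python
from collections import defaultdict
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets sharing a (k-2)-prefix, then prune any
    candidate that has an infrequent (k-1)-subset (the Apriori property)."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:k - 2] == b[:k - 2]:
                cand = tuple(sorted(set(a) | set(b)))
                if all(sub in prev_set for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
    return candidates


def map_phase(split, candidates, k):
    """Hypothetical mapper: emit (itemset, 1) for each candidate a transaction contains."""
    for transaction in split:
        items = set(transaction)
        if k == 1:
            for item in items:
                yield (item,), 1
        else:
            for cand in candidates:
                if items.issuperset(cand):
                    yield cand, 1


def reduce_phase(pairs, min_support):
    """Hypothetical reducer: sum counts per itemset and keep those meeting min_support."""
    counts = defaultdict(int)
    for itemset, count in pairs:
        counts[itemset] += count
    return {s: n for s, n in counts.items() if n >= min_support}


def prune_transactions(split, frequent_k, k):
    """Drop items absent from every frequent k-itemset, then drop transactions
    too short to contain any (k+1)-itemset."""
    keep = {item for itemset in frequent_k for item in itemset}
    pruned = []
    for transaction in split:
        trimmed = [item for item in transaction if item in keep]
        if len(trimmed) > k:
            pruned.append(trimmed)
    return pruned


def mapreduce_apriori(partitions, min_support):
    """Driver: one simulated MapReduce pass per itemset size k."""
    result = {}
    k, candidates = 1, set()
    while True:
        pairs = (kv for split in partitions for kv in map_phase(split, candidates, k))
        frequent = reduce_phase(pairs, min_support)
        if not frequent:
            break
        result.update(frequent)
        partitions = [prune_transactions(split, frequent, k) for split in partitions]
        k += 1
        candidates = apriori_gen(frequent.keys(), k)
        if not candidates:
            break
    return result


# Hypothetical sample data: each inner list plays the role of one HDFS split.
partitions = [
    [["bread", "milk"], ["bread", "diaper", "beer", "eggs"]],
    [["milk", "diaper", "beer", "cola"], ["bread", "milk", "diaper", "beer"]],
    [["bread", "milk", "diaper", "cola"]],
]
frequent = mapreduce_apriori(partitions, min_support=3)
for itemset, count in sorted(frequent.items()):
    print(itemset, count)
```

With `min_support=3`, this run keeps four frequent 1-itemsets and four frequent 2-itemsets; the only 3-itemset candidate, (bread, diaper, milk), occurs in just two transactions and is discarded. In an actual Hadoop deployment each pass would typically be a separate MapReduce job, with the current candidate set shipped to every mapper and the reduce work partitioned by itemset key across several reducers.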
Index Terms
- Improvement parallelization in Apriori Algorithm