DOI: 10.1145/3419635.3419712
short-paper

Improvement parallelization in Apriori Algorithm

Published: 16 October 2020

ABSTRACT

When mining massive data for internal relationships, the classical Apriori association algorithm cannot meet the demands of the task in terms of either algorithmic efficiency or the I/O performance of data processing. In the classical Apriori algorithm, all transactions reside in the file system or a database, and every iteration must scan the transaction database. In addition, mining frequent itemsets with Apriori generates a large number of itemsets, resulting in excessive I/O overhead and excessive consumption of system resources. This paper proposes a distributed Apriori algorithm based on MapReduce. The Hadoop Distributed File System (HDFS) is used to automatically provide distributed storage (partitioning) of big data. Combined with the MapReduce computing framework, map and reduce are used for parallel processing in generating candidate itemsets and computing frequent itemsets, respectively. The distributed system is fully utilized to improve processing capacity, and the monotonicity of frequent itemsets is used to optimize the transaction database. Experimental analysis shows that adding compute nodes can significantly improve the performance of the mining algorithm, and that the improved Apriori algorithm is considerably more efficient when performing association rule analysis on data sets with a large number of item indexes.
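The division of labor described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation and does not use Hadoop or HDFS: the partitions are plain Python lists standing in for HDFS blocks, and all function names (`map_phase`, `reduce_phase`, `generate_candidates`, `prune_transactions`, `apriori_mapreduce`) are hypothetical. It also assumes one plausible reading of "optimizing the transaction database": after each level, items that appear in no frequent itemset are dropped, along with transactions too short to contain a larger itemset.

```python
from collections import Counter
from itertools import combinations


def map_phase(partition, candidates):
    """Mapper (assumed role): emit (candidate, 1) for each candidate found in a transaction."""
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:
                yield candidate, 1


def reduce_phase(pairs, min_count):
    """Reducer (assumed role): sum counts per candidate, keep those reaching min_count."""
    counts = Counter()
    for candidate, n in pairs:
        counts[candidate] += n
    return {c: n for c, n in counts.items() if n >= min_count}


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets and prune by monotonicity:
    every (k-1)-subset of a candidate must itself be frequent."""
    prev = set(frequent)
    joined = {a | b for a in prev for b in prev if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in prev for s in combinations(c, k - 1))}


def prune_transactions(partitions, frequent, k):
    """Assumed optimization: drop items in no frequent k-itemset, then drop
    transactions with fewer than k + 1 remaining items."""
    keep = set().union(*frequent)
    pruned = []
    for part in partitions:
        trimmed = (t & keep for t in part)
        pruned.append([t for t in trimmed if len(t) >= k + 1])
    return pruned


def apriori_mapreduce(partitions, min_count):
    """Level-wise Apriori over partitioned transactions; returns {itemset: count}."""
    partitions = [[frozenset(t) for t in part] for part in partitions]
    items = {i for part in partitions for t in part for i in t}
    candidates = {frozenset([i]) for i in items}
    result, k = {}, 1
    while candidates:
        # Each partition would be handled by a separate mapper on a real cluster.
        pairs = (p for part in partitions for p in map_phase(part, candidates))
        frequent = reduce_phase(pairs, min_count)
        if not frequent:
            break
        result.update(frequent)
        partitions = prune_transactions(partitions, frequent, k)
        k += 1
        candidates = generate_candidates(frequent, k)
    return result
```

Calling `apriori_mapreduce(partitions, min_count)` with a list of partitions, each a list of item collections, returns a dictionary from frozen itemsets to absolute support counts. Pruning is safe here because every candidate at level k + 1 is built only from items that survive in some frequent k-itemset, so removing the other items cannot change any later count.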


    • Published in

      CIPAE 2020: Proceedings of the 2020 International Conference on Computers, Information Processing and Advanced Education
      October 2020
      527 pages
      ISBN:9781450387729
      DOI:10.1145/3419635

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • short-paper
      • Research
      • Refereed limited

      Acceptance Rates

      CIPAE 2020 Paper Acceptance Rate: 101 of 216 submissions, 47%
      Overall Acceptance Rate: 101 of 216 submissions, 47%
