Abstract:
Frequent itemset mining (FIM) plays an important role in many data mining areas. With the explosion of data scale, a number of parallel FIM algorithms have been proposed. Although existing solutions scale well, they consume substantial CPU and memory because they mine frequent itemsets recursively over a tree structure. In this paper, we propose a novel parallel algorithm named PNPFI, which employs three key optimizations. First, itemsets are stored in the N-list structure, which is more compact than existing tree-based structures. Second, a new structure called P-Subsume generates some frequent itemsets without N-list intersection. Third, a new load balancing strategy divides a large-scale FIM problem into a set of tasks based on the profiled load of each item. Experimental results show that, compared with state-of-the-art algorithms, PNPFI improves performance by 39% on average (up to 79%) and reduces memory usage by 58% on average (up to 90%).
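The abstract does not say how the profiled loads are turned into tasks. As a rough illustration only, the sketch below assumes a greedy "heaviest item first" heuristic that always gives the next item to the currently lightest task. The function name `assign_items_to_tasks`, the load values, and the heuristic itself are assumptions made for this example and are not taken from PNPFI.

```python
import heapq


def assign_items_to_tasks(item_load, num_tasks):
    """Greedy heaviest-first assignment (an assumed heuristic, not PNPFI's)."""
    # Min-heap of (accumulated load, task index); the lightest task is on top.
    heap = [(0, t) for t in range(num_tasks)]
    heapq.heapify(heap)
    tasks = [[] for _ in range(num_tasks)]
    # Place heavy items first so the lighter ones can even out the totals.
    for item, load in sorted(item_load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, t = heapq.heappop(heap)
        tasks[t].append(item)
        heapq.heappush(heap, (total + load, t))
    totals = [sum(item_load[i] for i in group) for group in tasks]
    return tasks, totals


# Hypothetical profiled load per item (made-up numbers for illustration).
profiled = {"a": 9, "b": 7, "c": 6, "d": 5, "e": 4, "f": 1}
tasks, totals = assign_items_to_tasks(profiled, 3)
print(tasks)   # item groups, one per task
print(totals)  # accumulated load per task
```

With these made-up loads the three tasks end up with totals of 10, 11, and 11, which shows the intent of profiling: items with heavy mining cost are spread out instead of landing on the same worker.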
Published in: 2018 IEEE 22nd International Conference on Computer Supported Cooperative Work in Design (CSCWD)
Date of Conference: 09-11 May 2018
Date Added to IEEE Xplore: 16 September 2018