Load Balancing Approach Parallel Algorithm for Frequent Pattern Mining

Yu, Kun-Ming; Zhou, Jiayi; Hsiao, Wei Chen

doi:10.1007/978-3-540-73940-1_63

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4671)

Abstract

Mining association rules from transaction-oriented databases is an important problem in data mining. Frequent patterns are crucial for association rule generation, time series analysis, classification, and other tasks. Two categories of algorithms have been proposed: the candidate set generate-and-test approach (Apriori-like) and the pattern growth approach. Many methods for association rule mining are based on the FP-tree rather than on Apriori-like algorithms, because Apriori-like algorithms scan the database many times. However, even with the FP-tree data structure, computation is costly when the database is large. Parallel and distributed computing is a good strategy for this situation. Some parallel algorithms have been proposed, but most of them do not consider load balancing. In this paper, we propose a parallel and distributed mining algorithm based on the FP-tree structure, the Load Balancing FP-Tree (LFP-tree). The algorithm divides the item set for mining by evaluating the tree's width and depth. Moreover, a simple and reliable formula for calculating the loading degree is proposed. The experimental results show that LFP-tree reduces computation time and has less idle time than Parallel FP-Tree (PFP-tree). In addition, it achieves a better speed-up ratio than PFP-tree as the number of processors grows. Communication time is reduced by keeping heavily loaded items on their local computing nodes.
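The idea of dividing the item set so that no processor sits idle can be illustrated with a minimal sketch. It is not the LFP-tree algorithm: the abstract does not give the loading-degree formula, so the per-item load estimates are taken as input, and the assignment uses the generic heaviest-first greedy heuristic as a stand-in. The function name `balance_items` and the sample loads are hypothetical.

```python
import heapq


def balance_items(loads, num_procs):
    """Assign items to processors, heaviest item first (greedy heuristic).

    `loads` maps each item to an estimated mining cost. How that estimate
    is derived (the paper uses the FP-tree's width and depth) is NOT
    reproduced here; the numbers are assumed to be given.
    """
    # Min-heap of (accumulated load, processor id): the least-loaded
    # processor is always at the top.
    heap = [(0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_procs)}

    # Visit items by descending load; ties are broken by item name so the
    # result is deterministic.
    for item, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, p = heapq.heappop(heap)
        assignment[p].append(item)
        heapq.heappush(heap, (total + load, p))

    totals = {p: sum(loads[i] for i in items) for p, items in assignment.items()}
    return assignment, totals
```

Calling `balance_items({"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}, 2)` splits the five items into two groups whose total loads differ by far less than a naive split in item order would. A smaller gap between the totals corresponds to less idle time on the lighter processor.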

Author information

Authors and Affiliations

Kun-Ming Yu: Department of Computer Science and Information Engineering, Chung Hua University
Jiayi Zhou: Institute of Engineering Science, Chung Hua University
Wei Chen Hsiao: Department of Information Management, Chung Hua University

Editor information

Victor Malyshkin

About this paper

Cite this paper

Yu, KM., Zhou, J., Hsiao, W.C. (2007). Load Balancing Approach Parallel Algorithm for Frequent Pattern Mining. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2007. Lecture Notes in Computer Science, vol 4671. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73940-1_63

DOI: https://doi.org/10.1007/978-3-540-73940-1_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73939-5
Online ISBN: 978-3-540-73940-1
eBook Packages: Computer Science, Computer Science (R0)