Tidset-Based Parallel FP-tree Algorithm for the Frequent Pattern Mining Problem on PC Clusters

Zhou, Jiayi; Yu, Kun-Ming

doi:10.1007/978-3-540-68083-3_5

Jiayi Zhou¹ &
Kun-Ming Yu²

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5036)

Included in the following conference series:

International Conference on Grid and Pervasive Computing

Abstract

Mining association rules from a transaction-oriented database is a problem in data mining. Frequent patterns are essential for association rule generation, time series analysis, classification, and other tasks. Algorithms for mining them fall into two categories: the generate-and-test approach (Apriori-like) and the pattern growth approach (FP-tree). Recently, many FP-tree-based methods have been proposed as replacements for Apriori-like algorithms, which need to scan the database many times. However, even with the pattern growth method, execution time is long when the database is large or the given support is low. Parallel-distributed computing is a good strategy for solving this problem. Some parallel algorithms have been proposed; however, their execution time increases rapidly when the database grows or when the given minimum threshold is small. In this study, an efficient parallel-distributed mining algorithm based on an FP-tree structure, the Tidset-based Parallel FP-tree (TPFP-tree), is proposed. To exchange transactions efficiently, a transaction identification set (Tidset) is used to choose transactions directly without scanning the database. The algorithm was verified on a Linux cluster with 16 computing nodes and compared with a PFP-tree algorithm. Datasets generated by IBM’s Quest Synthetic Data Generator were used to evaluate the performance of the algorithms. The experimental results showed that the proposed algorithm reduces execution time as the database grows. Moreover, it showed better scalability than the PFP-tree.
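The Tidset idea mentioned in the abstract can be illustrated with a minimal single-machine sketch. It is not the TPFP-tree algorithm itself, which the abstract does not specify in detail. It only shows how an item-to-transaction-ID index lets a program pick out the transactions containing an item without rescanning the database. The function names (`build_tidsets`, `frequent_items`, `select_transactions`) and the toy data are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Map each item to the set of transaction IDs containing it (one scan)."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def frequent_items(tidsets, min_support):
    """Return the items whose Tidset size meets the minimum support count."""
    return {item for item, tids in tidsets.items() if len(tids) >= min_support}


def select_transactions(transactions, tidsets, item):
    """Fetch the transactions containing `item` by ID, without a rescan."""
    return [transactions[tid] for tid in sorted(tidsets.get(item, ()))]


# Toy database (hypothetical data): each transaction is a list of items.
transactions = [
    ["a", "b", "c"],
    ["b", "d"],
    ["a", "c"],
    ["a", "b", "d"],
]

tidsets = build_tidsets(transactions)
print(sorted(tidsets["a"]))                 # [0, 2, 3]
print(sorted(frequent_items(tidsets, 3)))   # ['a', 'b']
print(select_transactions(transactions, tidsets, "d"))
# [['b', 'd'], ['a', 'b', 'd']]
```

In a distributed setting, an index of this kind would presumably let a node look up which of its local transactions another node needs, which is the role the abstract attributes to the Tidset. How the transactions are partitioned and exchanged is outside the scope of this sketch.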

Author information

Authors and Affiliations

Institute of Engineering Science, Chung Hua University,
Jiayi Zhou
Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu, 300, Taiwan
Kun-Ming Yu

Editor information

Song Wu, Laurence T. Yang, Tony Li Xu

About this paper

Cite this paper

Zhou, J., Yu, KM. (2008). Tidset-Based Parallel FP-tree Algorithm for the Frequent Pattern Mining Problem on PC Clusters. In: Wu, S., Yang, L.T., Xu, T.L. (eds) Advances in Grid and Pervasive Computing. GPC 2008. Lecture Notes in Computer Science, vol 5036. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68083-3_5

DOI: https://doi.org/10.1007/978-3-540-68083-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68081-9
Online ISBN: 978-3-540-68083-3
eBook Packages: Computer Science, Computer Science (R0)
