Tidset-Based Parallel FP-tree Algorithm for the Frequent Pattern Mining Problem on PC Clusters

  • Conference paper
Advances in Grid and Pervasive Computing (GPC 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5036)

Abstract

Mining association rules from a transaction-oriented database is a classic problem in data mining. Frequent patterns are essential for generating association rules, time series analysis, classification, and similar tasks. Algorithms for frequent pattern mining fall into two categories: the generate-and-test approach (Apriori-like) and the pattern growth approach (FP-tree). Because Apriori-like algorithms must scan the database many times, many recent methods use an FP-tree instead. However, even with the pattern growth method, execution time is long when the database is large or the given support threshold is low. Parallel and distributed computing is a good strategy for addressing this problem. Some parallel algorithms have been proposed, but their execution time increases rapidly as the database grows or as the minimum support threshold decreases. This study proposes an efficient parallel and distributed mining algorithm based on the FP-tree structure: the Tidset-based Parallel FP-tree (TPFP-tree). To exchange transactions efficiently, a transaction identification set (Tidset) is used to select transactions directly, without scanning the database. The algorithm was verified on a Linux cluster with 16 computing nodes and compared with a PFP-tree algorithm. Datasets generated by IBM's Quest Synthetic Data Generator were used to evaluate the performance of the algorithms. The experimental results showed that the TPFP-tree algorithm can reduce the execution time as the database grows. It also showed better scalability than the PFP-tree algorithm.
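The abstract does not describe how the Tidset is built or used, so the following is only a minimal sketch of the general idea, assuming a Tidset maps each item to the IDs of the transactions that contain it. The function names `build_tidsets`, `select_transactions`, and `support` are hypothetical and are not taken from the paper; the sketch runs on a single machine and does not model the parallel exchange between nodes.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Map each item to the set of IDs of the transactions containing it.

    Assumption: a transaction ID is simply the position in the list.
    The index is built in a single pass over the data.
    """
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def select_transactions(transactions, tidsets, item):
    """Fetch the transactions containing `item` by ID, without rescanning."""
    return [transactions[tid] for tid in sorted(tidsets.get(item, ()))]


def support(tidsets, itemset):
    """Support count of an itemset: size of the intersection of its tidsets."""
    sets = [tidsets.get(item, set()) for item in itemset]
    return len(set.intersection(*sets)) if sets else 0


# Toy database of four transactions (illustrative data only).
transactions = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "d"],
    ["a", "b", "c", "d"],
]

tidsets = build_tidsets(transactions)
selected = select_transactions(transactions, tidsets, "d")
print(selected)
print(support(tidsets, ["a", "c"]))
```

In this sketch, once the index exists, the transactions relevant to an item are found by lookup rather than by another pass over the database, which is the property the abstract attributes to the Tidset.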

Editor information

Song Wu Laurence T. Yang Tony Li Xu

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, J., Yu, KM. (2008). Tidset-Based Parallel FP-tree Algorithm for the Frequent Pattern Mining Problem on PC Clusters. In: Wu, S., Yang, L.T., Xu, T.L. (eds) Advances in Grid and Pervasive Computing. GPC 2008. Lecture Notes in Computer Science, vol 5036. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68083-3_5

  • DOI: https://doi.org/10.1007/978-3-540-68083-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68081-9

  • Online ISBN: 978-3-540-68083-3

  • eBook Packages: Computer Science, Computer Science (R0)
