Frequent Pattern Mining on Message Passing Multiprocessor Systems

Javed, Asif; Khokhar, Ashfaq

doi:10.1023/B:DAPD.0000031634.19130.bd

Published: November 2004

Volume 16, pages 321–334, (2004)

Distributed and Parallel Databases

Abstract

Extraction of frequent patterns in transaction-oriented databases is crucial to several data mining tasks, such as association rule generation, time series analysis, and classification. Most of these mining tasks require multiple passes over the database, and if the database is large, which is usually the case, scalable high-performance solutions involving multiple processors are required. This paper presents an efficient, scalable parallel algorithm for mining frequent patterns on shared-nothing parallel platforms. The proposed algorithm is based on one of the best-known sequential techniques, the Frequent Pattern (FP) Growth algorithm. Unlike most earlier parallel approaches, which are based on variants of the Apriori algorithm, the algorithm presented in this paper does not explicitly require the entire counting data structure to be duplicated on each processor. Furthermore, the proposed algorithm introduces minimal communication (and hence synchronization) overhead by efficiently partitioning the list of frequent elements over the processors. The experimental results show scalable performance over different machine and problem sizes. A comparison of the implementation results with existing parallel approaches shows significant gains in speedup. On an 8-processor machine, we report an average speedup of 6 for different problem sizes.
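
The abstract does not spell out how the list of frequent elements is partitioned, so the following is only an illustrative sketch, not the authors' method. It assumes a simple round-robin deal of the support-ordered frequent items over a hypothetical number of workers, and simulates everything in one process rather than with real message passing. The names `frequent_items`, `partition_items`, `min_support`, and `num_procs` are introduced here for illustration.

```python
from collections import Counter


def frequent_items(transactions, min_support):
    """Return (item, count) pairs with count >= min_support,
    ordered by descending support, ties broken by item name."""
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


def partition_items(items, num_procs):
    """Deal the frequent items round-robin over num_procs workers.

    Assumption: round-robin is a stand-in for whatever partitioning
    scheme the paper actually uses.
    """
    parts = [[] for _ in range(num_procs)]
    for idx, (item, _count) in enumerate(items):
        parts[idx % num_procs].append(item)
    return parts


# Toy transaction database (made up for this example).
transactions = [
    ["a", "b", "c"],
    ["a", "c"],
    ["a", "d"],
    ["b", "c"],
    ["a", "b", "c", "e"],
]

flist = frequent_items(transactions, min_support=2)
parts = partition_items(flist, num_procs=2)
print(flist)  # [('a', 4), ('c', 4), ('b', 3)]
print(parts)  # [['a', 'b'], ['c']]
```

In a shared-nothing setting, each worker would then mine only the patterns associated with its own share of the items, which is one way to avoid replicating a full counting structure on every processor. As a point of reference for the reported result, a speedup of 6 on 8 processors corresponds to a parallel efficiency of 6/8 = 0.75.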

Author information

Authors and Affiliations

University of Illinois at Chicago, USA
Asif Javed & Ashfaq Khokhar

About this article

Cite this article

Javed, A., Khokhar, A. Frequent Pattern Mining on Message Passing Multiprocessor Systems. Distributed and Parallel Databases 16, 321–334 (2004). https://doi.org/10.1023/B:DAPD.0000031634.19130.bd

Issue Date: November 2004
DOI: https://doi.org/10.1023/B:DAPD.0000031634.19130.bd
