Parallel and Distributed Frequent Itemset Mining on Dynamic Datasets

Veloso, Adriano; Otey, Matthew Eric; Parthasarathy, Srinivasan; Meira, Wagner

doi:10.1007/978-3-540-24596-4_20

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2913)

Included in the following conference series:

International Conference on High-Performance Computing

Abstract

Traditional methods for data mining typically assume that data is centralized and static. This assumption is no longer tenable. Such methods waste computational and I/O resources when the data is dynamic, and they impose excessive communication overhead when the data is distributed. As a result, the knowledge discovery process is harmed by slow response times. Efficient implementation of incremental data mining ideas in distributed computing environments is thus becoming crucial for ensuring scalability and facilitating knowledge discovery when data is dynamic and distributed. In this paper we address this issue in the context of frequent itemset mining, an important data mining task. Frequent itemsets are most often used to generate correlations and association rules, but more recently they have been used in such far-reaching domains as bioinformatics and e-commerce. We first present an efficient algorithm that dynamically maintains the required information in the presence of data updates without examining the entire dataset. We then show how to parallelize the incremental algorithm so that it can mine frequent itemsets asynchronously. We also propose a distributed algorithm that imposes low communication overhead when mining distributed datasets. Several experiments confirm that our algorithm yields substantial improvements in execution time.
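The abstract does not spell out how the maintenance algorithm works. As a hedged illustration of why incremental maintenance can avoid re-mining from scratch, the Python sketch below applies a classic pruning bound from earlier incremental work (it underlies the FUP update technique of Cheung et al.); it is not the authors' algorithm. The function names `count_support`, `frequent_itemsets`, and `incremental_update` are hypothetical, transactions are assumed to be `frozenset`s of item labels, and `min_sup` is assumed to be a relative support threshold. Previously frequent itemsets are refreshed by scanning only the increment, and the original data is rescanned only for itemsets that are frequent in the increment but were not frequent before.

```python
# Illustrative sketch only: helper names are hypothetical, and this is a classic
# incremental pruning bound, not the algorithm proposed in the paper.
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]
Counts = Dict[Itemset, int]


def count_support(transactions: List[Itemset], candidates: List[Itemset]) -> Counts:
    """One pass over `transactions`, counting the transactions that contain each candidate."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def frequent_itemsets(transactions: List[Itemset], min_sup: float) -> Counts:
    """Plain level-wise (Apriori-style) miner: all itemsets with count >= min_sup * |transactions|."""
    threshold = min_sup * len(transactions)
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    result: Counts = {}
    while level:
        freq = {c: n for c, n in count_support(transactions, level).items() if n >= threshold}
        result.update(freq)
        # Join frequent k-itemsets into (k+1)-candidates whose k-subsets are all frequent.
        nxt = set()
        for a, b in combinations(list(freq), 2):
            u = a | b
            if len(u) == len(a) + 1 and all(frozenset(s) in freq for s in combinations(u, len(a))):
                nxt.add(u)
        level = list(nxt)
    return result


def incremental_update(old_db: List[Itemset], old_freq: Counts,
                       delta: List[Itemset], min_sup: float) -> Counts:
    """Frequent itemsets of old_db + delta, given old_freq (frequent itemsets of old_db with counts)."""
    threshold = min_sup * (len(old_db) + len(delta))
    # Step 1: refresh previously frequent itemsets by scanning only the increment.
    delta_counts = count_support(delta, list(old_freq))
    result: Counts = {}
    for x, n in old_freq.items():
        if n + delta_counts[x] >= threshold:
            result[x] = n + delta_counts[x]
    # Step 2: an itemset infrequent in old_db can become frequent overall only if it is
    # frequent in delta; otherwise count_old < s*|old_db| and count_delta < s*|delta|,
    # so the combined count falls below s*(|old_db| + |delta|).
    delta_freq = frequent_itemsets(delta, min_sup)
    new_cands = [x for x in delta_freq if x not in old_freq]
    # Step 3: only these candidates require a pass over the old data.
    if new_cands:
        old_counts = count_support(old_db, new_cands)
        for x in new_cands:
            total = old_counts[x] + delta_freq[x]
            if total >= threshold:
                result[x] = total
    return result
```

For example, with an original database of the transactions `ab`, `ac`, `ab`, `bc`, an increment of three `cd` transactions plus one `acd`, and `min_sup = 0.5`, the update makes {d} and {c, d} frequent while {b} and {a, b} drop out, matching a full re-mine of the combined data. Note that this simple bound still rescans the old data for new candidates; the abstract states that the authors' algorithm avoids examining the entire dataset, which this sketch does not achieve on its own.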

This work was done while the first author was visiting The Ohio State University.

Author information

Authors and Affiliations

Computer Science Department, Universidade Federal de Minas Gerais, Brazil
Adriano Veloso & Wagner Meira Jr.
Department of Computer and Information Science, The Ohio State University, USA
Adriano Veloso, Matthew Eric Otey & Srinivasan Parthasarathy

Editor information

Editors and Affiliations

University of Southern California, Los Angeles, CA 90089-2562, USA
Timothy Mark Pinkston
Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2562, USA
Viktor K. Prasanna

About this paper

Cite this paper

Veloso, A., Otey, M.E., Parthasarathy, S., Meira, W. (2003). Parallel and Distributed Frequent Itemset Mining on Dynamic Datasets. In: Pinkston, T.M., Prasanna, V.K. (eds) High Performance Computing - HiPC 2003. HiPC 2003. Lecture Notes in Computer Science, vol 2913. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24596-4_20

DOI: https://doi.org/10.1007/978-3-540-24596-4_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20626-2
Online ISBN: 978-3-540-24596-4
eBook Packages: Springer Book Archive