Abstract
Most algorithms for association rule mining are variants of the basic Apriori algorithm [2]. One characteristic of these Apriori-based algorithms is that candidate itemsets are generated in rounds, with the size of the itemsets incremented by one per round. The number of database scans required by Apriori-based algorithms thus depends on the size of the largest large itemsets. In this paper we devise a more general candidate set generation algorithm, LGen, which generates candidate itemsets of multiple sizes during each database scan. We show that, given a reasonable set of suggested large itemsets, LGen can significantly reduce the number of I/O passes required. In the best cases, only two passes are sufficient to discover all the large itemsets irrespective of the size of the largest ones.
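To make the round-by-round behaviour of the Apriori baseline concrete, the following is a minimal Python sketch of level-wise candidate generation that also counts database scans. It illustrates only the baseline described above, not LGen itself, whose details are not given in this abstract. The function name `apriori`, the in-memory list of transactions, and the absolute support threshold `min_support` are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori sketch (illustrative, not the paper's code).

    Returns (large itemsets mapped to support counts, number of scans).
    """
    transactions = [frozenset(t) for t in transactions]
    # Round 1 candidates: every single item. The item universe is collected
    # up front here for brevity; it is not counted as a scan.
    candidates = {frozenset([item]) for t in transactions for item in t}
    large = {}
    scans = 0
    k = 1
    while candidates:
        # One full pass over the database per round to count supports.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        large.update(level)
        # Join step: unite pairs of large k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets
        # (with k already incremented) were large in the round just counted.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return large, scans
```

Because each round only produces candidates one item larger than the previous round, the scan count grows with the size of the largest large itemset. For the four transactions `{a,b,c}`, `{a,b,c}`, `{a,b}`, `{c}` with `min_support=2`, the largest large itemset has three items and the sketch performs three scans.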
References
R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. ACM SIGMOD, Washington, D.C., May 1993.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proc. of the 20th VLDB Conference, Santiago, Chile, 1994.
Garrett Birkhoff. Lattice Theory, volume 25 of AMS Colloquium Publications. AMS, 1984.
David W. Cheung, Jiawei Han, Vincent T. Ng, Ada Fu, and Yongjian Fu. A fast distributed algorithm for mining association rules. In Proc. Fourth International Conference on Parallel and Distributed Information Systems, Miami Beach, Florida, December 1996.
David W. Cheung, Jiawei Han, Vincent T. Ng, and C. Y. Wong. Maintenance of discovered association rules in large databases: An incremental updating technique. In Proceedings of the Twelfth International Conference on Data Engineering, New Orleans, Louisiana, 1996. IEEE Computer Society.
B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press, 1990. ISBN 0 521 36584 8.
Jiawei Han and Yongjian Fu. Discovery of multiple-level association rules from large databases. In Proc. of the 21st VLDB Conference, Zurich, Switzerland, 1995.
Jong Soo Park, Ming-Syan Chen, and Philip S. Yu. An effective hash-based algorithm for mining association rules. In Proc. ACM SIGMOD, San Jose, California, May 1995.
Ramakrishnan Srikant and Rakesh Agrawal. Mining quantitative association rules in large relational tables. In H. V. Jagadish and Inderpal Singh Mumick, editors, Proc. ACM SIGMOD, Montreal, Canada, June 1996.
Hannu Toivonen. Sampling large databases for association rules. In Proc. of the 22nd VLDB Conference, Bombay, India, September 1996.
C.L. Yip, K.K. Loo, B. Kao, D. Cheung, and C.K. Cheng. LGen — a lattice-based candidate set generation algorithm for I/O efficient association rule mining. Technical report TR-99-01, Dept. of CSIS, The University of Hong Kong, 1999.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yip, C.L., Loo, K.K., Kao, B., Cheung, D., Cheng, C.K. (1999). LGen — A Lattice-Based Candidate Set Generation Algorithm for I/O Efficient Association Rule Mining. In: Zhong, N., Zhou, L. (eds) Methodologies for Knowledge Discovery and Data Mining. PAKDD 1999. Lecture Notes in Computer Science, vol 1574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48912-6_8
DOI: https://doi.org/10.1007/3-540-48912-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65866-5
Online ISBN: 978-3-540-48912-2
eBook Packages: Springer Book Archive