Abstract
In this paper we propose DCP, a new algorithm for solving the Frequent Set Counting problem, which enhances Apriori. Our goal was to optimize the initial iterations of Apriori, i.e. the most time-consuming ones when datasets characterized by short or medium-length frequent patterns are considered. The main improvements are an innovative method for storing candidate sets of items and counting their support, and effective pruning techniques that significantly reduce the size of the dataset as execution progresses.
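For orientation, the sketch below shows a plain level-wise Apriori loop combined with one simple form of dataset pruning. It is not the DCP algorithm: the abstract does not specify DCP's candidate storage or its pruning rules, so the function name `apriori_with_pruning`, the absolute `min_support` threshold, and the particular pruning rule (drop items that occur in no frequent k-itemset, then drop transactions with fewer than k+1 remaining items) are assumptions chosen only to illustrate the general idea of shrinking the dataset as iterations progress.

```python
from itertools import combinations


def apriori_with_pruning(transactions, min_support):
    """Illustrative level-wise frequent itemset mining (not the DCP method).

    transactions: iterable of iterables of hashable items
    min_support:  absolute number of transactions an itemset must appear in
    Returns a dict mapping each frequent itemset (frozenset) to its support.
    """
    dataset = [frozenset(t) for t in transactions]

    # Iteration 1: count the support of every single item.
    counts = {}
    for t in dataset:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Assumed pruning rule: an item that appears in no frequent k-itemset
        # cannot appear in any (k+1)-candidate, and a transaction with fewer
        # than k+1 remaining items cannot support any (k+1)-candidate.
        live_items = set().union(*frequent)
        dataset = [t & live_items for t in dataset]
        dataset = [t for t in dataset if len(t) >= k + 1]

        # Candidate generation: join pairs of frequent k-itemsets and keep a
        # (k+1)-itemset only if all of its k-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, k)
            ):
                candidates.add(union)

        # Support counting over the pruned dataset.
        counts = dict.fromkeys(candidates, 0)
        for t in dataset:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result


# Hypothetical toy dataset, used only to exercise the sketch.
toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
frequent_sets = apriori_with_pruning(toy, min_support=3)
```

On this toy input the item `d` is removed from the dataset after the first iteration, since it is not frequent on its own. The naive candidate-by-transaction counting loop is exactly the kind of step that a specialized candidate storage scheme would aim to speed up.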
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Perego, R., Orlando, S., Palmerini, P. (2001). Enhancing the Apriori Algorithm for Frequent Set Counting. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2001. Lecture Notes in Computer Science, vol 2114. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44801-2_8
DOI: https://doi.org/10.1007/3-540-44801-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42553-3
Online ISBN: 978-3-540-44801-3