Enhancing the Apriori Algorithm for Frequent Set Counting

Perego, Raffaele; Orlando, Salvatore; Palmerini, P.

doi:10.1007/3-540-44801-2_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2114)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

In this paper we propose DCP, a new algorithm for solving the Frequent Set Counting problem, which enhances Apriori. Our goal was to optimize the initial iterations of Apriori, which are the most time-consuming ones when the dataset is characterized by short or medium-length frequent patterns. The main improvements are an innovative method for storing candidate sets of items and counting their support, and the exploitation of effective pruning techniques that significantly reduce the size of the dataset as execution progresses.
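
For readers unfamiliar with the baseline, the following is a minimal Python sketch of level-wise Apriori with a simple form of dataset pruning between iterations. It is not the DCP algorithm: the abstract does not specify DCP's candidate storage or pruning rules, so the function name `apriori`, the `min_support` parameter (an absolute count), and the two pruning steps shown are illustrative assumptions only.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    Plain level-wise Apriori; hypothetical illustration, not DCP itself.
    """
    dataset = [frozenset(t) for t in transactions]

    # Iteration 1: count single items.
    item_counts = Counter(item for t in dataset for item in t)
    frequent = {frozenset([i]): c for i, c in item_counts.items()
                if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Illustrative dataset pruning (an assumption, not the paper's rules):
        # drop items that occur in no frequent (k-1)-itemset, then drop
        # transactions too short to contain any k-itemset.
        live_items = set().union(*frequent)
        dataset = [t & live_items for t in dataset]
        dataset = [t for t in dataset if len(t) >= k]

        # Candidate generation: join frequent (k-1)-itemsets and keep only
        # candidates whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting over the (pruned) dataset.
        counts = Counter()
        for t in dataset:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1

    return result


transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"},
]
result = apriori(transactions, min_support=3)
```

In this toy dataset each single item and each pair meets the support threshold of 3, while the triple {a, b, c} occurs only twice and is rejected. The pruning steps are safe because every item of a frequent k-itemset must belong to some frequent (k-1)-itemset, and a transaction with fewer than k remaining items cannot support any k-itemset.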

Author information

Authors and Affiliations

Istituto CNUCE, Pisa, Italy
Raffaele Perego & P. Palmerini
Dipartimento di Informatica, Università Ca’ Foscari di Venezia, Italy
Salvatore Orlando & P. Palmerini

Editor information

Editors and Affiliations

Kyoto University, Kyoto, 606-8501, Japan
Yahiko Kambayashi
EC3, Siebensterngasse 21/3, 1070, Wien
Werner Winiwarter
Center for Spatial Information Science (CSIS), University of Tokyo, 4-6-1, Komaba Meguro-ku, Tokyo, 153-8904, Japan
Masatoshi Arikawa

About this paper

Cite this paper

Perego, R., Orlando, S., Palmerini, P. (2001). Enhancing the Apriori Algorithm for Frequent Set Counting. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2001. Lecture Notes in Computer Science, vol 2114. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44801-2_8

DOI: https://doi.org/10.1007/3-540-44801-2_8
Published: 28 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42553-3
Online ISBN: 978-3-540-44801-3
eBook Packages: Springer Book Archive
