Discovering Frequent Itemsets in the Presence of Highly Frequent Items

Groth, Dennis P.; Robertson, Edward L.

doi:10.1007/3-540-36524-9_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2543)

Included in the following conference series:

International Conference on Applications of Prolog

Abstract

This paper presents new techniques for focusing the discovery of frequent itemsets within large, dense datasets that contain highly frequent items. The presence of such items significantly increases the cost of computing the complete set of frequent itemsets. Our approach allows these items to be excluded during the candidate generation phase of the Apriori algorithm. Afterwards, the highly frequent items can be reintroduced through an inferencing framework, which makes it possible to generate frequent itemsets without counting their frequency. We demonstrate these techniques within the well-studied framework of the Apriori algorithm. Furthermore, we provide empirical results for both synthetic and real datasets. Both are relevant, because the real datasets exhibit statistical characteristics that differ from the probabilistic assumptions behind the synthetic data. Our source of real data was the U.S. Census.
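The abstract does not spell out the inferencing framework, so the following is only a hypothetical sketch of how the two steps could fit together, not the authors' method. It sets aside items whose count reaches an assumed fraction `high_ratio` of the transactions, runs a plain level-wise Apriori over the remaining items, and then reintroduces the set-aside items using a simple inclusion-exclusion lower bound, count(X ∪ Y) >= count(X) - Σ_{h∈Y} (n - count(h)), instead of counting. The functions `split_items`, `apriori`, `infer_with_high`, and `mine`, and the parameter `high_ratio`, are all illustrative assumptions.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def split_items(transactions, high_ratio):
    """Hypothetical helper: separate items whose count reaches high_ratio * n."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    high = {i for i, c in counts.items() if c >= high_ratio * n}
    return high, set(counts) - high, counts

def apriori(transactions, items, min_count):
    """Level-wise Apriori restricted to `items`; returns {itemset: count}."""
    frequent = {}
    level = [frozenset([i]) for i in sorted(items)]
    k = 1
    while level:
        # Count step: keep the candidates that meet the minimum support count.
        current = {}
        for cand in level:
            cnt = support_count(cand, transactions)
            if cnt >= min_count:
                current[cand] = cnt
        frequent.update(current)
        k += 1
        # Join and prune: a k-itemset is a candidate only if all of its
        # (k-1)-subsets were frequent at the previous level.
        candidates = set()
        for x, y in combinations(list(current), 2):
            u = x | y
            if len(u) == k and all(frozenset(s) in current for s in combinations(u, k - 1)):
                candidates.add(u)
        level = list(candidates)
    return frequent

def infer_with_high(frequent, high, counts, n, min_count):
    """Hypothetical inference step: reintroduce highly frequent items without counting.

    Every transaction that contains X but not X | Y is missing some h in Y,
    so count(X | Y) >= count(X) - sum(n - count(h) for h in Y).
    Returned values are lower bounds, not exact counts."""
    base = dict(frequent)
    base[frozenset()] = n  # the empty itemset occurs in every transaction
    inferred = {}
    for r in range(1, len(high) + 1):
        for ys in combinations(sorted(high), r):
            missing = sum(n - counts[h] for h in ys)
            for x, cnt in base.items():
                bound = cnt - missing
                if bound >= min_count:
                    inferred[x | frozenset(ys)] = bound
    return inferred

def mine(transactions, min_count, high_ratio):
    high, low, counts = split_items(transactions, high_ratio)
    frequent = apriori(transactions, low, min_count)  # no high item enters any candidate
    inferred = infer_with_high(frequent, high, counts, len(transactions), min_count)
    return frequent, inferred

# Toy data (invented for illustration): "h" occurs in 5 of 6 transactions.
transactions = [frozenset(t) for t in (
    {"a", "b", "h"}, {"a", "b", "h"}, {"a", "c", "h"},
    {"b", "h"}, {"a", "b", "h"}, {"c"},
)]
frequent, inferred = mine(transactions, min_count=3, high_ratio=0.8)
for itemset, bound in sorted(inferred.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), ">=", bound)
```

On this toy data, {a, h} and {b, h} are reported with a lower bound of 3 (each actually occurs 4 times), while {a, b, h} is not inferred because its bound is only 2, even though it occurs 3 times. A bound this simple never overstates a count but can miss frequent itemsets, so any scheme built on it would presumably need tighter deduction rules or a targeted recount of the remaining candidates.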

The authors were supported by NSF Grant IIS-0082407.

Similar content being viewed by others

Iterative sampling based frequent itemset mining for big data

Article 20 March 2015

SS-FIM: Single Scan for Frequent Itemsets Mining in Transactional Databases

A Comparative Analysis of Algorithms for Mining Frequent Itemsets

Author information

Authors and Affiliations

School of Informatics, Indiana University, 47405, Bloomington, IN, USA
Dennis P. Groth
Computer Science, Indiana University, 47405, Bloomington, IN, USA
Edward L. Robertson

Editor information

Editors and Affiliations

IF Computer Japan, 5-28-2 Sendagi, Bunkyo-ku, 113-0022, Tokyo, Japan
Oskar Bartenstein
Fraunhofer FIRST, Kekulé 7, 12489, Berlin, Germany
Ulrich Geske
think-cell Software GmbH, Invalidenstraße 34, 10115, Berlin, Germany
Markus Hannebauer
Waseda University, 2-7 Hibikino, Wakamatsu-ku, Kitakyushu, Fukuoka, Japan
Osamu Yoshie

About this paper

Cite this paper

Groth, D.P., Robertson, E.L. (2003). Discovering Frequent Itemsets in the Presence of Highly Frequent Items. In: Bartenstein, O., Geske, U., Hannebauer, M., Yoshie, O. (eds) Web Knowledge Management and Decision Support. INAP 2001. Lecture Notes in Computer Science, vol 2543. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36524-9_21

DOI: https://doi.org/10.1007/3-540-36524-9_21
Published: 14 March 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00680-0
Online ISBN: 978-3-540-36524-2
eBook Packages: Springer Book Archive