Tight upper bounds on the number of candidate patterns

Published: 01 June 2005

Abstract

In the context of mining for frequent patterns using the standard levelwise algorithm, the following question arises: given the current level and the current set of frequent patterns, what is the maximal number of candidate patterns that can be generated on the next level? We answer this question by providing tight upper bounds, derived from a combinatorial result from the sixties by Kruskal and Katona. Our result is useful to secure existing algorithms from a combinatorial explosion of the number of candidate patterns.
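As a rough illustration of the kind of bound the abstract describes, the sketch below computes a Kruskal-Katona style upper bound from only the level k and the number of frequent k-patterns. It assumes the usual statement of the theorem: write the count in its cascade (k-canonical) form C(m_k, k) + C(m_{k-1}, k-1) + ... + C(m_r, r), then raise the lower index of every term by p to bound the candidates at level k + p. The function names `cascade` and `candidate_upper_bound` are hypothetical, the formula is not taken from this page, and a candidate is assumed to be a set whose k-subsets are all frequent.

```python
from math import comb  # requires Python 3.8+


def cascade(n, k):
    """Greedy cascade (k-canonical) representation of n.

    Returns pairs (m_i, i), i = k, k-1, ..., such that
    n == sum(comb(m_i, i)) with m_k > m_{k-1} > ... >= 1.
    """
    terms = []
    i = k
    while n > 0 and i >= 1:
        m = i  # comb(i, i) == 1 <= n, so m = i is always feasible
        while comb(m + 1, i) <= n:
            m += 1  # largest m with comb(m, i) <= n
        terms.append((m, i))
        n -= comb(m, i)
        i -= 1
    return terms


def candidate_upper_bound(num_frequent, k, p=1):
    """Assumed Kruskal-Katona bound on the number of candidate
    (k + p)-patterns, given num_frequent frequent k-patterns."""
    return sum(comb(m, i + p) for m, i in cascade(num_frequent, k))


if __name__ == "__main__":
    # 5 frequent 2-patterns: 5 = C(3,2) + C(2,1), so at most
    # C(3,3) + C(2,2) = 2 candidate 3-patterns.
    print(candidate_upper_bound(5, 2))
    # 6 frequent 2-patterns: 6 = C(4,2), so at most C(4,3) = 4.
    print(candidate_upper_bound(6, 2))
```

A levelwise miner could compare such a bound against available memory before generating the next level, which is the sort of safeguard against candidate explosion the abstract mentions.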

References

[1] Agarwal, R., Aggarwal, C., and Prasad, V. 2000. Depth first generation of long patterns. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, R. Ramakrishnan, S. Stolfo, R. Bayardo, and I. Parsa, Eds. ACM Press, 108--118.
[2] Agarwal, R., Aggarwal, C., and Prasad, V. 2001. A tree projection algorithm for generation of frequent itemsets. J. Parallel Distrib. Comput. 61, 3 (March), 350--371.
[3] Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, P. Buneman and S. Jajodia, Eds. SIGMOD Record, vol. 22:2. ACM Press, 207--216.
[4] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. MIT Press, 307--328.
[5] Agrawal, R. and Srikant, R. 1994a. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases, J. Bocca, M. Jarke, and C. Zaniolo, Eds. Morgan Kaufmann, 487--499.
[6] Agrawal, R. and Srikant, R. 1994b. Fast algorithms for mining association rules. IBM Research Report RJ9839, IBM Almaden Research Center, San Jose, California. June.
[7] Agrawal, R. and Srikant, R. 1994c. Quest Synthetic Data Generator. IBM Almaden Research Center, http://www.almaden.ibm.com/software/quest/Resources/index.shtml.
[8] Bayardo, R. 1998. Efficiently mining long patterns from databases. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, L. Haas and A. Tiwary, Eds. SIGMOD Record, vol. 27:2. ACM Press, 85--93.
[9] Blake, C. and Merz, C. 1998. UCI Repository of machine learning databases. University of California, Irvine, Dept. of Information and Computer Sciences, http://www.ics.uci.edu/~mlearn/MLRepository.html.
[10] Bollobás, B. 1986. Combinatorics. Cambridge University Press.
[11] Boulicaut, J.-F., Bykowski, A., and Rigotti, C. 2003. Free-sets: a condensed representation of Boolean data for frequency query approximation. Data Mining and Knowledge Discovery 7, 1, 5--22.
[12] Brin, S., Motwani, R., Ullman, J., and Tsur, S. 1997. Dynamic itemset counting and implication rules for market basket data. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. SIGMOD Record, vol. 26:2. ACM Press, 255--264.
[13] Burdick, D., Calimlim, M., and Gehrke, J. 2001. MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proceedings of the 17th International Conference on Data Engineering. IEEE Computer Society, 443--452.
[14] Frankl, P. 1984. A new short proof for the Kruskal--Katona theorem. Discrete Mathematics 48, 327--329.
[15] Geerts, F., Goethals, B., and Van den Bussche, J. 2001. A tight upper bound on the number of candidate patterns. In Proceedings of the 2001 IEEE International Conference on Data Mining, N. Cercone, T. Lin, and X. Wu, Eds. IEEE Computer Society, 155--162.
[16] Goethals, B. and Zaki, M. J., Eds. 2003. Proceedings of the Workshop on Frequent Itemset Mining Implementations (FIMI-03), Melbourne, Florida, USA, November 19, 2003. CEUR Workshop Proceedings, vol. 90. http://CEUR-WS.org/Vol-90/.
[17] Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, W. Chen, J. Naughton, and P. Bernstein, Eds. SIGMOD Record, vol. 29:2. ACM Press, 1--12.
[18] Katona, G. 1968. A theorem of finite sets. In Theory Of Graphs. Akadémiai Kiadó, 187--207.
[19] Kohavi, R., Brodley, C., Frasca, B., Mason, L., and Zheng, Z. 2000. KDD-Cup 2000 organizers' report: Peeling the onion. SIGKDD Explorations 2, 2, 86--98. http://www.ecn.purdue.edu/KDDCUP.
[20] Kruskal, J. 1963. The number of simplices in a complex. In Mathematical Optimization Techniques. Univ. of California Press, 251--278.
[21] Lin, D. and Kedem, Z. 1998. Pincer-search: A new algorithm for discovering the maximum frequent set. In EDBT, H.-J. Schek, F. Saltor, I. Ramos, and G. Alonso, Eds. Lecture Notes in Computer Science, vol. 1377. Springer, 105--119.
[22] Liu, J., Pan, Y., Wang, K., and Han, J. 2002. Mining frequent item sets by opportunistic projection. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, D. Hand, D. Keim, and R. Ng, Eds. ACM Press, 229--238.
[23] Orlando, S., Palmerini, P., Perego, R., and Silvestri, F. 2002. Adaptive and resource-aware mining of frequent sets. In Proceedings of the 2002 IEEE International Conference on Data Mining, V. Kumar, S. Tsumoto, P. Yu, and N. Zhong, Eds. IEEE Computer Society, to appear.
[24] Park, J., Chen, M.-S., and Yu, P. 1995. An effective hash based algorithm for mining association rules. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. SIGMOD Record, vol. 24:2. ACM Press, 175--186.
[25] Pasquier, N., Bastide, Y., Taouil, R., and Lakhal, L. 1999. Discovering frequent closed itemsets for association rules. In Proceedings of the 7th International Conference on Database Theory, C. Beeri and P. Buneman, Eds. Lecture Notes in Computer Science, vol. 1540. Springer, 398--416.
[26] Pei, J., Han, J., and Mao, R. 2000. Closet: An efficient algorithm for mining frequent closed itemsets. ACM SIGMOD'00 Workshop on Research Issues in Data Mining and Knowledge Discovery.
[27] Savasere, A., Omiecinski, E., and Navathe, S. 1995. An efficient algorithm for mining association rules in large databases. In Proceedings of the 21st International Conference on Very Large Data Bases, U. Dayal, P. Gray, and S. Nishio, Eds. Morgan Kaufmann, 432--444.
[28] Toivonen, H. 1996. Sampling large databases for association rules. In Proceedings of the 22nd International Conference on Very Large Data Bases, T. M. Vijayaraman, A. P. Buchmann, C. Mohan, and N. L. Sarda, Eds. Morgan Kaufmann, 134--145.
[29] Zaki, M. and Hsiao, C.-J. 2002. CHARM: An efficient algorithm for closed itemset mining. In Proceedings of the Second SIAM International Conference on Data Mining, R. Grossman, J. Han, V. Kumar, H. Mannila, and R. Motwani, Eds. SIAM.
[30] Zaki, M., Parthasarathy, S., Ogihara, M., and Li, W. 1997. New algorithms for fast discovery of association rules. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, D. Heckerman, H. Mannila, and D. Pregibon, Eds. AAAI Press, 283--296.
[31] Zheng, Z., Kohavi, R., and Mason, L. 2001. Real world performance of association rule algorithms. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, F. Provost and R. Srikant, Eds. ACM Press, 401--406.

Published In

ACM Transactions on Database Systems  Volume 30, Issue 2
June 2005
328 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1071610

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data mining
  2. frequent patterns
  3. upper bounds

Qualifiers

  • Article

Cited By

  • (2022) The pattern frequency distribution theory: a mathematic establishment toward rational and reliable pattern mining. International Journal of Data Science and Analytics 16:1 (43-83). DOI: 10.1007/s41060-022-00340-1. Online publication date: 20-Aug-2022
  • (2019) Maximizing Gain over Flexible Attributes in Peer to Peer Marketplaces. Advances in Knowledge Discovery and Data Mining (327-345). DOI: 10.1007/978-3-030-16142-2_26. Online publication date: 14-Apr-2019
  • (2019) Method evaluation, parameterization, and result validation in unsupervised data mining: A critical survey. WIREs Data Mining and Knowledge Discovery 10:2. DOI: 10.1002/widm.1330. Online publication date: 29-Jul-2019
  • (2015) The Data Problem in Data Mining. ACM SIGKDD Explorations Newsletter 16:2 (38-45). DOI: 10.1145/2783702.2783706. Online publication date: 21-May-2015
  • (2015) The blind men and the elephant. Machine Learning 98:1-2 (121-155). DOI: 10.1007/s10994-013-5334-y. Online publication date: 1-Jan-2015
  • (2014) EvoMiner. Knowledge and Information Systems 41:3 (559-590). DOI: 10.5555/2687513.2687574. Online publication date: 1-Dec-2014
  • (2014) Consistent Subset Sampling. Algorithm Theory – SWAT 2014 (294-305). DOI: 10.1007/978-3-319-08404-6_26. Online publication date: 2014
  • (2013) Frequent Itemset Mining for Big Data. 2013 IEEE International Conference on Big Data (111-118). DOI: 10.1109/BigData.2013.6691742. Online publication date: Oct-2013
  • (2013) EvoMiner: frequent subtree mining in phylogenetic databases. Knowledge and Information Systems 41:3 (559-590). DOI: 10.1007/s10115-013-0676-0. Online publication date: 30-Jul-2013
  • (2011) Efficient prime-based method for interactive mining of frequent patterns. Expert Systems with Applications: An International Journal 38:10 (12654-12670). DOI: 10.1016/j.eswa.2011.04.053. Online publication date: 15-Sep-2011
