Tight upper bounds on the number of candidate patterns

Authors:

Jan Van Den Bussche

ACM Transactions on Database Systems (TODS), Volume 30, Issue 2

Pages 333 - 363

https://doi.org/10.1145/1071610.1071611

Published: 01 June 2005

Abstract

In the context of mining for frequent patterns using the standard levelwise algorithm, the following question arises: given the current level and the current set of frequent patterns, what is the maximal number of candidate patterns that can be generated on the next level? We answer this question by providing tight upper bounds, derived from a combinatorial result from the sixties by Kruskal and Katona. Our result is useful to secure existing algorithms from a combinatorial explosion of the number of candidate patterns.
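As a rough illustration of the kind of bound the abstract describes, the sketch below assumes that patterns are itemsets and that a candidate of size k+1 is generated only when all of its k-subsets are frequent. Under that assumption, a Kruskal-Katona style argument bounds the candidate count: write the number of frequent k-itemsets as a cascade of binomial coefficients C(m_k, k) + C(m_{k-1}, k-1) + ... and raise each lower index by one. The function names `cascade` and `max_candidates` are hypothetical and do not come from the article, and the formula is the standard statement of the theorem rather than a quotation of the article's own formulation.

```python
from math import comb


def cascade(n, k):
    """Greedy k-cascade of n: pairs (m_i, i) with n = sum of C(m_i, i).

    Hypothetical helper; indices i run downward from k and the m_i
    are strictly decreasing.
    """
    terms = []
    i = k
    while n > 0 and i >= 1:
        m = i  # C(i, i) = 1 <= n, so m = i is always admissible
        while comb(m + 1, i) <= n:
            m += 1  # grow m to the largest value with C(m, i) <= n
        terms.append((m, i))
        n -= comb(m, i)
        i -= 1
    return terms


def max_candidates(num_frequent, k):
    """Upper bound on (k+1)-candidates given num_frequent frequent k-itemsets.

    Assumes Apriori-style generation: every k-subset of a candidate
    must itself be frequent.
    """
    # Raise the lower index of every cascade term by one.
    return sum(comb(m, i + 1) for m, i in cascade(num_frequent, k))


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(n, "frequent pairs ->", max_candidates(n, 2), "candidate triples at most")
```

For instance, five frequent 2-itemsets can yield at most two candidates of size 3, while six frequent 2-itemsets (all pairs over four items) can yield four.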

References

[1]

Agarwal, R., Aggarwal, C., and Prasad, V. 2000. Depth first generation of long patterns. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, R. Ramakrishnan, S. Stolfo, R. Bayardo, and I. Parsa, Eds. ACM Press, 108--118.

[2]

Agarwal, R., Aggarwal, C., and Prasad, V. 2001. A tree projection algorithm for generation of frequent itemsets. J. Parallel Distrib. Comput. 61, 3 (March), 350--371.

[3]

Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, P. Buneman and S. Jajodia, Eds. SIGMOD Record, vol. 22:2. ACM Press, 207--216.

[4]

Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. MIT Press, 307--328.

[5]

Agrawal, R. and Srikant, R. 1994a. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases, J. Bocca, M. Jarke, and C. Zaniolo, Eds. Morgan Kaufmann, 487--499.

[6]

Agrawal, R. and Srikant, R. 1994b. Fast algorithms for mining association rules. IBM Research Report RJ9839, IBM Almaden Research Center, San Jose, California. June.

[7]

Agrawal, R. and Srikant, R. 1994c. Quest Synthetic Data Generator. IBM Almaden Research Center, http://www.almaden.ibm.com/software/quest/Resources/index.shtml.

[8]

Bayardo, R. 1998. Efficiently mining long patterns from databases. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, L. Haas and A. Tiwary, Eds. SIGMOD Record, vol. 27:2. ACM Press, 85--93.

[9]

Blake, C. and Merz, C. 1998. UCI Repository of machine learning databases. University of California, Irvine, Dept. of Information and Computer Sciences, http://www.ics.uci.edu/~mlearn/MLRepository.html.

[10]

Bollobás, B. 1986. Combinatorics. Cambridge University Press.

[11]

Boulicaut, J.-F., Bykowski, A., and Rigotti, C. 2003. Free-sets: a condensed representation of Boolean data for frequency query approximation. Data Mining and Knowledge Discovery 7, 1, 5--22.

[12]

Brin, S., Motwani, R., Ullman, J., and Tsur, S. 1997. Dynamic itemset counting and implication rules for market basket data. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. SIGMOD Record, vol. 26:2. ACM Press, 255--264.

[13]

Burdick, D., Calimlim, M., and Gehrke, J. 2001. MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proceedings of the 17th International Conference on Data Engineering. IEEE Computer Society, 443--452.

[14]

Frankl, P. 1984. A new short proof for the Kruskal--Katona theorem. Discrete Mathematics 48, 327--329.

[15]

Geerts, F., Goethals, B., and Van den Bussche, J. 2001. A tight upper bound on the number of candidate patterns. In Proceedings of the 2001 IEEE International Conference on Data Mining, N. Cercone, T. Lin, and X. Wu, Eds. IEEE Computer Society, 155--162.

[16]

Goethals, B. and Zaki, M. J., Eds. 2003. Proceedings of the Workshop on Frequent Itemset Mining Implementations (FIMI-03), Melbourne, Florida, USA, November 19, 2003. CEUR Workshop Proceedings, vol. 90. http://CEUR-WS.org/Vol-90/.

[17]

Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, W. Chen, J. Naughton, and P. Bernstein, Eds. SIGMOD Record, vol. 29:2. ACM Press, 1--12.

[18]

Katona, G. 1968. A theorem of finite sets. In Theory of Graphs. Akadémiai Kiadó, 187--207.

[19]

Kohavi, R., Brodley, C., Frasca, B., Mason, L., and Zheng, Z. 2000. KDD-Cup 2000 organizers' report: Peeling the onion. SIGKDD Explorations 2, 2, 86--98. http://www.ecn.purdue.edu/KDDCUP.

[20]

Kruskal, J. 1963. The number of simplices in a complex. In Mathematical Optimization Techniques. Univ. of California Press, 251--278.

[21]

Lin, D. and Kedem, Z. 1998. Pincer-search: A new algorithm for discovering the maximum frequent set. In EDBT, H.-J. Schek, F. Saltor, I. Ramos, and G. Alonso, Eds. Lecture Notes in Computer Science, vol. 1377. Springer, 105--119.

[22]

Liu, J., Pan, Y., Wang, K., and Han, J. 2002. Mining frequent item sets by opportunistic projection. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, D. Hand, D. Keim, and R. Ng, Eds. ACM Press, 229--238.

[23]

Orlando, S., Palmerini, P., Perego, R., and Silvestri, F. 2002. Adaptive and resource-aware mining of frequent sets. In Proceedings of the 2002 IEEE International Conference on Data Mining, V. Kumar, S. Tsumoto, P. Yu, and N. Zhong, Eds. IEEE Computer Society, to appear.

[24]

Park, J., Chen, M.-S., and Yu, P. 1995. An effective hash based algorithm for mining association rules. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. SIGMOD Record, vol. 24:2. ACM Press, 175--186.

[25]

Pasquier, N., Bastide, Y., Taouil, R., and Lakhal, L. 1999. Discovering frequent closed itemsets for association rules. In Proceedings of the 7th International Conference on Database Theory, C. Beeri and P. Buneman, Eds. Lecture Notes in Computer Science, vol. 1540. Springer, 398--416.

[26]

Pei, J., Han, J., and Mao, R. 2000. Closet: An efficient algorithm for mining frequent closed itemsets. ACM SIGMOD'00 Workshop on Research Issues in Data Mining and Knowledge Discovery.

[27]

Savasere, A., Omiecinski, E., and Navathe, S. 1995. An efficient algorithm for mining association rules in large databases. In Proceedings of the 21st International Conference on Very Large Data Bases, U. Dayal, P. Gray, and S. Nishio, Eds. Morgan Kaufmann, 432--444.

[28]

Toivonen, H. 1996. Sampling large databases for association rules. In Proceedings of the 22nd International Conference on Very Large Data Bases, T. M. Vijayaraman, A. P. Buchmann, C. Mohan, and N. L. Sarda, Eds. Morgan Kaufmann, 134--145.

[29]

Zaki, M. and Hsiao, C.-J. 2002. CHARM: An efficient algorithm for closed itemset mining. In Proceedings of the Second SIAM International Conference on Data Mining, R. Grossman, J. Han, V. Kumar, H. Mannila, and R. Motwani, Eds. SIAM.

[30]

Zaki, M., Parthasarathy, S., Ogihara, M., and Li, W. 1997. New algorithms for fast discovery of association rules. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, D. Heckerman, H. Mannila, and D. Pregibon, Eds. AAAI Press, 283--296.

[31]

Zheng, Z., Kohavi, R., and Mason, L. 2001. Real world performance of association rule algorithms. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, F. Provost and R. Srikant, Eds. ACM Press, 401--406.

Published In

ACM Transactions on Database Systems Volume 30, Issue 2

June 2005

328 pages

ISSN:0362-5915

EISSN:1557-4644

DOI:10.1145/1071610

Copyright © 2005 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Bibliometrics

Article Metrics

Total Citations: 24
Total Downloads: 806
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 1

Reflects downloads up to 15 Feb 2025

Citations

Cited By

Wang T (2022). The pattern frequency distribution theory: a mathematic establishment toward rational and reliable pattern mining. International Journal of Data Science and Analytics 16:1 (43-83). Online publication date: 20-Aug-2022.
https://doi.org/10.1007/s41060-022-00340-1
Asudeh A, Nazi A, Koudas N, Das G (2019). Maximizing Gain over Flexible Attributes in Peer to Peer Marketplaces. Advances in Knowledge Discovery and Data Mining (327-345). Online publication date: 14-Apr-2019.
https://dl.acm.org/doi/10.1007/978-3-030-16142-2_26
Zimmermann A (2019). Method evaluation, parameterization, and result validation in unsupervised data mining: A critical survey. WIREs Data Mining and Knowledge Discovery 10:2. Online publication date: 29-Jul-2019.
https://doi.org/10.1002/widm.1330
Zimmermann A (2015). The Data Problem in Data Mining. ACM SIGKDD Explorations Newsletter 16:2 (38-45). Online publication date: 21-May-2015.
https://dl.acm.org/doi/10.1145/2783702.2783706
Zimek A, Vreeken J (2015). The blind men and the elephant. Machine Learning 98:1-2 (121-155). Online publication date: 1-Jan-2015.
https://dl.acm.org/doi/10.1007/s10994-013-5334-y
Deepak A, Fernández-Baca D, Tirthapura S, Sanderson M, McMahon M (2014). EvoMiner. Knowledge and Information Systems 41:3 (559-590). Online publication date: 1-Dec-2014.
https://dl.acm.org/doi/10.5555/2687513.2687574
Kutzkov K, Pagh R (2014). Consistent Subset Sampling. Algorithm Theory – SWAT 2014 (294-305). Online publication date: 2014.
https://doi.org/10.1007/978-3-319-08404-6_26
Moens S, Aksehirli E, Goethals B (2013). Frequent Itemset Mining for Big Data. 2013 IEEE International Conference on Big Data (111-118). Online publication date: Oct-2013.
https://doi.org/10.1109/BigData.2013.6691742
Deepak A, Fernández-Baca D, Tirthapura S, Sanderson M, McMahon M (2013). EvoMiner: frequent subtree mining in phylogenetic databases. Knowledge and Information Systems 41:3 (559-590). Online publication date: 30-Jul-2013.
https://doi.org/10.1007/s10115-013-0676-0
Nadimi-Shahraki M, Mustapha N, Sulaiman M, Mamat A (2011). Efficient prime-based method for interactive mining of frequent patterns. Expert Systems with Applications: An International Journal 38:10 (12654-12670). Online publication date: 15-Sep-2011.
https://dl.acm.org/doi/10.1016/j.eswa.2011.04.053
