Abstract
Let A be an m × n binary matrix, t ∈ {1, …, m} be a threshold, and ε > 0 be a positive parameter. We show that given a family of O(n^ε) maximal t-frequent column sets for A, it is NP-complete to decide whether A has any further maximal t-frequent sets, even when the number of such additional maximal t-frequent column sets may be exponentially large. In contrast, all minimal t-infrequent sets of columns of A can be enumerated in incremental quasi-polynomial time. The proof of the latter result follows from the inequality α ≤ (m − t + 1)β, where α and β are respectively the numbers of all maximal t-frequent and all minimal t-infrequent sets of columns of the matrix A. We also discuss the complexity of generating all closed t-frequent column sets for a given binary matrix.
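To make the quantities α and β concrete, the following is a minimal brute-force sketch in Python. It assumes the usual definition from frequent-set mining, which the abstract does not spell out: a column set is t-frequent when at least t rows of A have a 1 in every one of its columns, and t-infrequent otherwise. The helper names `support` and `borders` and the 4 × 4 matrix are illustrative choices, not taken from the paper, and the exhaustive search is exponential in n, so it is unrelated to the enumeration algorithms the paper analyzes. The example matrix is chosen so that at least one infrequent set exists.

```python
from itertools import combinations

def support(A, cols):
    """Number of rows of A that have a 1 in every column of cols."""
    return sum(all(row[j] for j in cols) for row in A)

def borders(A, t):
    """Brute-force the maximal t-frequent and minimal t-infrequent column sets.

    Assumed definition: a column set is t-frequent if its support is >= t.
    """
    n = len(A[0])
    subsets = [frozenset(c) for k in range(n + 1)
               for c in combinations(range(n), k)]
    frequent = {s for s in subsets if support(A, s) >= t}
    infrequent = [s for s in subsets if s not in frequent]
    # Frequency is monotone (every subset of a frequent set is frequent), so
    # testing single-column extensions and single-column removals suffices.
    maximal = [s for s in frequent
               if all(s | {j} not in frequent for j in range(n) if j not in s)]
    minimal = [s for s in infrequent if all(s - {j} in frequent for j in s)]
    return maximal, minimal

# Hypothetical 4 x 4 example matrix (rows are records, columns are items).
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
]
t = 2
maximal, minimal = borders(A, t)
alpha, beta, m = len(maximal), len(minimal), len(A)

print("maximal t-frequent:  ", sorted(sorted(s) for s in maximal))
print("minimal t-infrequent:", sorted(sorted(s) for s in minimal))
print("alpha =", alpha, " (m - t + 1) * beta =", (m - t + 1) * beta)
```

On this one small instance the two families can be compared directly against the bound α ≤ (m − t + 1)β; a single example illustrates the inequality but of course does not prove it.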
This research is supported in part by the National Science Foundation (Grant IIS-0118635), the Office of Naval Research (Grant N00014-92-J-1375), and Grants-in-Aid for Scientific Research of the Ministry of Education, Culture, Sports, Science and Technology of Japan. Visits of the second author to Rutgers University were also supported by DIMACS, the National Science Foundation’s Center for Discrete Mathematics and Theoretical Computer Science.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boros, E., Gurvich, V., Khachiyan, L., Makino, K. (2002). On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets. In: Alt, H., Ferreira, A. (eds) STACS 2002. Lecture Notes in Computer Science, vol 2285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45841-7_10
DOI: https://doi.org/10.1007/3-540-45841-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43283-8
Online ISBN: 978-3-540-45841-8
eBook Packages: Springer Book Archive