Abstract
Given an m×n binary matrix A, a subset C of the columns is called t-frequent if there are at least t rows of A in which all entries belonging to C are non-zero. Let α denote the number of maximal t-frequent sets of A, and let β denote the number of minimal column subsets of A that are not t-frequent (the so-called minimal t-infrequent sets). We prove that the inequality α ≤ (m−t+1)β holds for any binary matrix A in which not all column subsets are t-frequent. This inequality is sharp, and allows for an incremental quasi-polynomial algorithm for generating all minimal t-infrequent sets. We also prove that the analogous generation problem for maximal t-frequent sets is NP-hard. Finally, we discuss the complexity of generating closed frequent sets and some other related problems.
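The definitions above can be illustrated with a small brute-force sketch. It enumerates every column subset, so it runs in exponential time and is not the incremental quasi-polynomial algorithm of the paper. The function names `support` and `borders` and the example matrix are assumptions made for illustration only.

```python
from itertools import combinations

def support(A, C):
    """Count the rows of A whose entries in every column of C are non-zero."""
    return sum(1 for row in A if all(row[j] != 0 for j in C))

def borders(A, t):
    """Return (maximal t-frequent sets, minimal t-infrequent sets) of A.

    Brute force over all 2^n column subsets; only suitable for tiny matrices.
    """
    n = len(A[0])
    cols = set(range(n))
    subsets = [frozenset(c) for k in range(n + 1)
               for c in combinations(range(n), k)]
    frequent = {C for C in subsets if support(A, C) >= t}
    # Frequency is anti-monotone (adding a column can only lose rows),
    # so checking single-column extensions and removals is sufficient.
    maximal_frequent = [C for C in subsets if C in frequent
                        and all(C | {j} not in frequent for j in cols - C)]
    minimal_infrequent = [C for C in subsets if C not in frequent
                          and all(C - {j} in frequent for j in C)]
    return maximal_frequent, minimal_infrequent

# Hypothetical 4x3 example matrix (m = 4 rows, n = 3 columns), threshold t = 2.
A = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
t = 2
maximal_frequent, minimal_infrequent = borders(A, t)
alpha, beta = len(maximal_frequent), len(minimal_infrequent)
m = len(A)
print(alpha, beta, alpha <= (m - t + 1) * beta)  # prints: 3 1 True
```

In this example the three column pairs are the maximal 2-frequent sets and the full column set is the only minimal 2-infrequent set, so α = 3 and (m−t+1)β = 3, and the inequality holds with equality.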
Cite this article
Boros, E., Gurvich, V., Khachiyan, L. et al. On Maximal Frequent and Minimal Infrequent Sets in Binary Matrices. Annals of Mathematics and Artificial Intelligence 39, 211–221 (2003). https://doi.org/10.1023/A:1024605820527