On Maximal Frequent and Minimal Infrequent Sets in Binary Matrices

Annals of Mathematics and Artificial Intelligence

Abstract

Given an m×n binary matrix A, a subset C of the columns is called t-frequent if there are at least t rows in A in which all entries belonging to C are non-zero. Let α denote the number of maximal t-frequent sets of A, and let β denote the number of minimal column subsets of A that are not t-frequent (so-called minimal t-infrequent sets). We prove that the inequality α ≤ (m − t + 1)β holds for any binary matrix A in which not all column subsets are t-frequent. This inequality is sharp, and allows for an incremental quasi-polynomial algorithm for generating all minimal t-infrequent sets. We also prove that the analogous generation problem for maximal t-frequent sets is NP-hard. Finally, we discuss the complexity of generating closed frequent sets and some other related problems.
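To make the definitions concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper: it simply enumerates all column subsets of a small matrix, which is exponential in n and only meant for illustration. The function names `support` and `borders` and the example matrix `A` are assumptions introduced here, and the bound is read as α ≤ (m − t + 1)β.

```python
from itertools import combinations

def support(matrix, cols):
    """Number of rows whose entries are non-zero in every column of `cols`."""
    return sum(all(row[c] for c in cols) for row in matrix)

def borders(matrix, t):
    """Brute-force the maximal t-frequent and minimal t-infrequent column sets.

    Illustrative only: enumerates all 2^n column subsets.
    """
    n = len(matrix[0])
    subsets = [frozenset(c) for k in range(n + 1)
               for c in combinations(range(n), k)]
    frequent = {s for s in subsets if support(matrix, s) >= t}
    # Frequency is monotone (subsets of frequent sets are frequent), so it is
    # enough to test single-column extensions and single-column removals.
    maximal = [s for s in frequent
               if all(s | {c} not in frequent for c in range(n) if c not in s)]
    minimal = [s for s in subsets
               if s not in frequent and all(s - {c} in frequent for c in s)]
    return maximal, minimal

# Hypothetical 4x3 example matrix (m = 4 rows, n = 3 columns), threshold t = 2.
A = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
]
t = 2
maximal, minimal = borders(A, t)
alpha, beta = len(maximal), len(minimal)
print(alpha, beta)                       # prints "3 1"
print(alpha <= (len(A) - t + 1) * beta)  # prints "True"
```

In this example every pair of columns is 2-frequent while the full column set is not, so α = 3 and β = 1, and the bound holds with equality: 3 ≤ (4 − 2 + 1) · 1.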

Cite this article

Boros, E., Gurvich, V., Khachiyan, L. et al. On Maximal Frequent and Minimal Infrequent Sets in Binary Matrices. Annals of Mathematics and Artificial Intelligence 39, 211–221 (2003). https://doi.org/10.1023/A:1024605820527

  • DOI: https://doi.org/10.1023/A:1024605820527
