Abstract
Given a Boolean matrix and a threshold t, a subset of the columns is frequent if there are at least t rows having a 1 entry in each corresponding position. This concept is used in the algorithmic, combinatorial approach to knowledge discovery and data mining. We consider the complexity aspects of frequent sets. An explicit family of subsets is given that requires exponentially many rows to be represented as the family of frequent sets of a matrix, with any threshold. Examples are given of families that can be represented by a small matrix with threshold t, but that require a significantly larger matrix if the threshold is less than t. We also discuss the connections of these problems to circuit complexity and the existence of efficient listing algorithms.
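The definition of a frequent set can be made concrete with a small brute-force sketch. It is not the paper's method, only an illustration of the definition. The function names `support` and `frequent_sets` and the example matrix are hypothetical, and the sketch assumes the empty column set counts as frequent whenever the matrix has at least t rows. Enumeration is exponential in the number of columns, so it is suitable only for tiny matrices.

```python
from itertools import combinations


def support(matrix, cols):
    """Count the rows that have a 1 in every column listed in `cols`."""
    return sum(1 for row in matrix if all(row[c] == 1 for c in cols))


def frequent_sets(matrix, t):
    """Return all column subsets whose support is at least the threshold t.

    Brute force over all 2^n subsets of the n columns (illustrative only).
    """
    n = len(matrix[0]) if matrix else 0
    result = []
    for k in range(n + 1):
        for cols in combinations(range(n), k):
            if support(matrix, cols) >= t:
                result.append(frozenset(cols))
    return result


# Hypothetical 3x3 Boolean matrix; columns are indexed 0, 1, 2.
matrix = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

for s in frequent_sets(matrix, 2):
    print(sorted(s))
```

With threshold 2, the column set {0, 1} is frequent because rows 0 and 1 both have a 1 in those columns, while {0, 2} is not, since only row 1 does. Every subset of a frequent set is itself frequent, so the family produced is always closed downward.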
Cite this article
Sloan, R.H., Takata, K. & Turán, G. On frequent sets of Boolean matrices. Annals of Mathematics and Artificial Intelligence 24, 193–209 (1998). https://doi.org/10.1023/A:1018905417023