Computational aspects of mining maximal frequent patterns

doi:10.1016/j.tcs.2006.05.029

Theoretical Computer Science

Volume 362, Issues 1–3, 11 October 2006, Pages 63-85

https://doi.org/10.1016/j.tcs.2006.05.029

Under an Elsevier user license

Abstract

In this paper we study the complexity-theoretic aspects of mining maximal frequent patterns, from the perspective of counting the number of all distinct solutions. We present the first formal proof that the problem of counting the number of maximal frequent itemsets in a database of transactions, given an arbitrary support threshold, is #P-complete, thereby providing theoretical evidence that the problem of mining maximal frequent itemsets is NP-hard. We also extend our complexity analysis to other similar data mining problems that deal with complex data structures, such as sequences, trees, and graphs. We investigate several variants of these mining problems in which the patterns of interest are subsequences, subtrees, or subgraphs, and show that the associated problems of counting the number of maximal frequent patterns are all either #P-complete or #P-hard.
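To make the counting problem concrete, the following is a minimal brute-force sketch, not the paper's method. It assumes the standard definitions, which the abstract does not spell out: the support of an itemset is the number of transactions containing it, an itemset is frequent when its support meets the threshold, and a frequent itemset is maximal when no proper superset is frequent. The function name `count_maximal_frequent_itemsets` and the toy database are hypothetical, and the empty itemset is excluded by choice. The enumeration is exponential in the number of items, which is consistent with, but of course does not prove, the hardness result stated above.

```python
from itertools import combinations


def count_maximal_frequent_itemsets(transactions, min_support):
    """Count non-empty itemsets that are frequent and have no frequent proper superset.

    Assumed definitions (not taken from the abstract): support is the number of
    transactions that contain the itemset; frequent means support >= min_support.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions)) if transactions else []

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Enumerate every non-empty itemset; this is exponential in len(items).
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(frozenset(c)) >= min_support
    ]
    frequent_set = set(frequent)

    # Support never increases when items are added, so an itemset is maximal
    # exactly when none of its one-item extensions is frequent.
    maximal = [
        s for s in frequent
        if not any((s | {i}) in frequent_set for i in items if i not in s)
    ]
    return len(maximal), maximal


# Hypothetical toy database of four transactions.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
count, maximal = count_maximal_frequent_itemsets(db, min_support=2)
print(count)  # 2, namely {a, b} and {a, c}
```

With threshold 2, the frequent itemsets in this toy database are {a}, {b}, {c}, {a, b}, and {a, c}; only the last two have no frequent superset, so the count is 2.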

Keywords

Data mining

Complexity

Maximal frequent patterns

#P-complete

☆ An extended abstract of this paper appeared in the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2004.

¹ This work was done while the author was a faculty member in the Department of Computer Science and Engineering, University at Buffalo, The State University of New York.
