Abstract
Mining frequently appearing patterns in a database is a basic problem in informatics, especially in data mining. In particular, when the input database is a collection of subsets of an itemset, called transactions, the problem is called the frequent itemset mining problem, and it has been studied extensively. The items in a frequent itemset appear together in many records, so they can be regarded as a cluster with respect to these records. In this sense, however, requiring every item to appear in each record is a strong condition; several missing items per record should be allowed. In this paper, we approach this problem from the viewpoint of algorithm theory and consider a model that can be solved efficiently and may be valuable in practice. We introduce ambiguous frequent itemsets, which allow missing items in their occurrence records. More precisely, for given thresholds θ and σ, an ambiguous frequent itemset P has a transaction set \(\cal T\), \(|{\cal T}| \ge \sigma\), such that on average the transactions in \(\cal T\) include at least a ratio θ of the items of P. We formulate the problem of enumerating ambiguous frequent itemsets and propose an efficient polynomial-delay, polynomial-space algorithm. Its practical performance is evaluated by computational experiments. Our algorithm extends naturally to the weighted version of the problem, which generalizes ordinary frequent itemset mining to weighted transaction databases and is equivalent to finding submatrices whose cells have a large average weight. An implementation is available at the author’s homepage.
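The definition above can be illustrated with a small membership test. This is a minimal sketch of the definition only, not the enumeration algorithm proposed in the paper. It assumes that "include ratio θ on average" means the mean of |t ∩ P| / |P| over the chosen transactions is at least θ. The function name `is_ambiguous_frequent` and the toy database are hypothetical. Because the mean overlap of the k best transactions cannot increase as k grows, it suffices to examine the σ transactions that share the most items with P.

```python
from typing import Iterable, List, Set


def is_ambiguous_frequent(
    pattern: Iterable[int],
    transactions: List[Set[int]],
    theta: float,
    sigma: int,
) -> bool:
    """Check whether some set of at least `sigma` transactions contains,
    on average, at least a fraction `theta` of the items of `pattern`.

    Illustrative reading of the abstract's definition (an assumption),
    not the authors' implementation.
    """
    p = set(pattern)
    if not p or sigma <= 0 or len(transactions) < sigma:
        return False
    # Number of pattern items present in each transaction, best first.
    overlaps = sorted((len(p & set(t)) for t in transactions), reverse=True)
    # The sigma best transactions maximise the average inclusion ratio
    # among all transaction sets of size >= sigma.
    best = overlaps[:sigma]
    # mean(overlap) / |P| >= theta  <=>  sum(overlap) >= theta * sigma * |P|
    return sum(best) >= theta * sigma * len(p)


if __name__ == "__main__":
    # Hypothetical toy database; items are integers.
    db = [{1, 2, 3, 4}, {1, 2, 3}, {1, 2}, {5}]
    print(is_ambiguous_frequent({1, 2, 3, 4}, db, theta=0.75, sigma=3))
    print(is_ambiguous_frequent({1, 2, 3, 4}, db, theta=0.75, sigma=4))
```

With θ = 1 the test reduces to the ordinary frequent itemset condition, since every selected transaction must then contain all items of P.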
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Uno, T., Arimura, H. (2008). Ambiguous Frequent Itemset Mining and Polynomial Delay Enumeration. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_32
DOI: https://doi.org/10.1007/978-3-540-68125-0_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science (R0)