Ambiguous Frequent Itemset Mining and Polynomial Delay Enumeration

Uno, Takeaki; Arimura, Hiroki

doi:10.1007/978-3-540-68125-0_32

Takeaki Uno¹ and Hiroki Arimura²

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5012)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

Mining frequently appearing patterns in a database is a basic problem in informatics, especially in data mining. In particular, when the input database is a collection of subsets of an item set, each called a transaction, the problem is called the frequent itemset mining problem, and it has been studied extensively. The items of a frequent itemset appear together in many records, so they can be regarded as a cluster with respect to those records. In this sense, however, requiring every item to appear in each record is too strict; several items should be allowed to be missing from these records. In this paper, we approach this problem from the viewpoint of algorithm theory and consider a model that can be solved efficiently and is potentially valuable in practice. We introduce ambiguous frequent itemsets, which allow missing items in their occurrence records. More precisely, for given thresholds θ and σ, an itemset P is an ambiguous frequent itemset if it has a transaction set \(\mathcal{T}\) with \(|\mathcal{T}| \ge \sigma\) such that, on average, the transactions in \(\mathcal{T}\) include at least a fraction θ of the items of P. We formulate the problem of enumerating ambiguous frequent itemsets and propose an efficient polynomial-delay, polynomial-space algorithm. Its practical performance is evaluated by computational experiments. The algorithm extends naturally to the weighted version of the problem. The weighted version is a natural extension of ordinary frequent itemsets to weighted transaction databases, and is equivalent to finding submatrices with large average weights in their cells. An implementation is available at the author's homepage.
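The definition above can be checked directly with a small sketch. It assumes the ratio of a transaction set \(\mathcal{T}\) for an itemset P is the average of \(|T \cap P| / |P|\) over \(T \in \mathcal{T}\), and that the weighted version replaces this with the average cell weight of the submatrix with rows \(\mathcal{T}\) and columns P (the 0/1 case recovers the unweighted problem). For a fixed P, the mean of the k largest row sums cannot increase as k grows, so it is enough to test the σ rows with the largest sums. The function names (best_submatrix_average, to_rows, is_ambiguous_frequent, brute_force_enumerate) and the row encoding are hypothetical, and the enumerator is an exponential brute-force reference for small inputs, not the authors' polynomial-delay algorithm.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet, List, Mapping, Sequence, Union

# Hypothetical encoding: one row per transaction, mapping item -> cell weight.
# Items absent from a row have weight 0.
Row = Mapping[str, Union[int, Fraction]]


def best_submatrix_average(pattern: FrozenSet[str],
                           rows: Sequence[Row],
                           sigma: int) -> Fraction:
    """Largest average cell weight over submatrices (row set) x pattern with >= sigma rows.

    For fixed columns, the mean of the k largest row sums never increases as k
    grows, so the optimum is attained by the sigma rows with the largest sums.
    """
    if not pattern or not 1 <= sigma <= len(rows):
        raise ValueError("need a non-empty pattern and 1 <= sigma <= len(rows)")
    row_sums = sorted(
        (sum((Fraction(row.get(item, 0)) for item in pattern), Fraction(0))
         for row in rows),
        reverse=True,
    )
    return sum(row_sums[:sigma], Fraction(0)) / (sigma * len(pattern))


def to_rows(transactions: Sequence[FrozenSet[str]]) -> List[Dict[str, Fraction]]:
    """0/1 encoding: each contained item gets weight 1, so the average cell
    weight equals the average of |T & P| / |P| over the chosen transactions."""
    return [{item: Fraction(1) for item in t} for t in transactions]


def is_ambiguous_frequent(pattern: FrozenSet[str], rows: Sequence[Row],
                          sigma: int, theta: Fraction) -> bool:
    """True if some set of >= sigma rows reaches average ratio >= theta."""
    return best_submatrix_average(pattern, rows, sigma) >= theta


def brute_force_enumerate(rows: Sequence[Row], sigma: int,
                          theta: Fraction) -> List[FrozenSet[str]]:
    """Exponential reference enumerator over all non-empty itemsets.

    This is NOT the paper's polynomial-delay algorithm; it only checks small
    examples. No subset-based pruning is applied, because the property is not
    anti-monotone.
    """
    items = sorted({item for row in rows for item in row})
    found: List[FrozenSet[str]] = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            if is_ambiguous_frequent(pattern, rows, sigma, theta):
                found.append(pattern)
    return found


if __name__ == "__main__":
    rows = to_rows([frozenset("ab"), frozenset("b")])
    print(best_submatrix_average(frozenset("ab"), rows, 2))  # 3/4
    print(best_submatrix_average(frozenset("a"), rows, 2))   # 1/2
```

The example also shows why simple Apriori-style pruning does not apply: with transactions {a, b} and {b}, σ = 2 and θ = 0.7, the itemset {a, b} qualifies (ratio 3/4) while its subset {a} does not (ratio 1/2).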




Author information

Authors and Affiliations

National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
Takeaki Uno
Graduate School of Information Science and Technology, Hokkaido University, Kita 14 Nishi 9, Sapporo, 060-0814, Japan
Hiroki Arimura


Editor information

Takashi Washio, Einoshin Suzuki, Kai Ming Ting, Akihiro Inokuchi


About this paper

Cite this paper

Uno, T., Arimura, H. (2008). Ambiguous Frequent Itemset Mining and Polynomial Delay Enumeration. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_32

Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0