An Efficient Polynomial Delay Algorithm for Pseudo Frequent Itemset Mining

Uno, Takeaki; Arimura, Hiroki

doi:10.1007/978-3-540-75488-6_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4755)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Mining frequently appearing patterns in a database is a basic problem in informatics, especially in data mining. In particular, when the input database is a collection of subsets of an itemset, the problem is called the frequent itemset mining problem, and it has been extensively studied. In real-world use, one difficulty of frequent itemset mining is that real-world data is often incorrect or partially missing. As a result, some records that should include a pattern do not contain it. To deal with such real-world problems, one can use an ambiguous inclusion relation and find patterns that are mostly included in many records. However, computational difficulty has prevented such approaches from being actively used in practice. In this paper, we use an alternative inclusion relation in which we consider an itemset P to be included in an itemset T if at most k items of P are not included in T, i.e., |P ∖ T| ≤ k. We address the problem of enumerating frequent itemsets under this inclusion relation and propose an efficient polynomial delay, polynomial space algorithm. Moreover, to enable us to skip many small non-valuable frequent itemsets, we propose an algorithm for directly enumerating frequent itemsets of a certain size.
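
The inclusion relation |P ∖ T| ≤ k can be illustrated with a minimal Python sketch. This is not the authors' polynomial delay algorithm: it is a brute-force check over all itemsets of one fixed size, and the names `pseudo_included`, `pseudo_frequency`, `naive_pseudo_frequent`, `min_support`, and the toy database are assumptions introduced here for illustration only.

```python
from itertools import combinations


def pseudo_included(pattern, transaction, k):
    """True if at most k items of `pattern` are missing from `transaction`."""
    return len(pattern - transaction) <= k


def pseudo_frequency(pattern, database, k):
    """Number of transactions that include `pattern` under the relaxed relation."""
    return sum(1 for t in database if pseudo_included(pattern, t, k))


def naive_pseudo_frequent(database, k, min_support, size):
    """Brute force (illustrative only): test every itemset of exactly `size` items.

    `min_support` is a hypothetical frequency threshold; the running time is
    exponential in `size`, unlike the algorithm proposed in the paper.
    """
    items = sorted(set().union(*database))
    result = []
    for candidate in combinations(items, size):
        pattern = frozenset(candidate)
        if pseudo_frequency(pattern, database, k) >= min_support:
            result.append(pattern)
    return result


# Toy database (assumed): four transactions over the items a, b, c, d.
database = [frozenset("abc"), frozenset("abd"), frozenset("acd"), frozenset("bcd")]

if __name__ == "__main__":
    abc = frozenset("abc")
    print(pseudo_frequency(abc, database, k=0))  # exact inclusion
    print(pseudo_frequency(abc, database, k=1))  # one missing item tolerated
    print(naive_pseudo_frequent(database, k=1, min_support=4, size=3))
```

Under this relation every itemset with at most k items is included in every record, which is one reason an enumeration restricted to a given size is useful for skipping small, uninformative itemsets.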

Author information

Authors and Affiliations

National Institute of Informatics, 2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
Takeaki Uno
Graduate School of Information Science and Technology, Hokkaido University, Kita 14 Nishi 9, Sapporo 060-0814, Japan
Hiroki Arimura

Editor information

Vincent Corruble, Masayuki Takeda, Einoshin Suzuki

About this paper

Cite this paper

Uno, T., Arimura, H. (2007). An Efficient Polynomial Delay Algorithm for Pseudo Frequent Itemset Mining. In: Corruble, V., Takeda, M., Suzuki, E. (eds) Discovery Science. DS 2007. Lecture Notes in Computer Science, vol 4755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75488-6_21

DOI: https://doi.org/10.1007/978-3-540-75488-6_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75487-9
Online ISBN: 978-3-540-75488-6
eBook Packages: Computer Science, Computer Science (R0)
