Abstract
Mining frequently appearing patterns in a database is a basic problem in informatics, especially in data mining. In particular, when the input database is a collection of subsets of an itemset, the problem is called the frequent itemset mining problem, and it has been studied extensively. In real-world use, one difficulty of frequent itemset mining is that real-world data is often incorrect or partly missing. As a result, some records that should include a pattern do not contain it. To deal with such problems, one can use an ambiguous inclusion relation and find patterns that are mostly included in many records. However, computational difficulty has prevented such approaches from being actively used in practice. In this paper, we use an alternative inclusion relation in which an itemset P is considered to be included in an itemset T if at most k items of P are not included in T, i.e., |P ∖ T| ≤ k. We address the problem of enumerating frequent itemsets under this inclusion relation and propose an efficient polynomial-delay, polynomial-space algorithm. Moreover, to skip the many small, low-value frequent itemsets, we propose an algorithm for directly enumerating frequent itemsets of a certain size.
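To make the inclusion relation concrete, the following minimal Python sketch checks |P ∖ T| ≤ k and counts how many records include a pattern under it. It is a naive brute-force illustration, not the polynomial-delay algorithm proposed in the paper; the function names, the `min_support` threshold parameter, and the fixed-size enumeration by exhaustive search are all assumptions made for this example.

```python
from itertools import combinations


def pseudo_included(pattern, transaction, k):
    """Return True if at most k items of pattern are missing from transaction,
    i.e. |P - T| <= k (hypothetical helper name)."""
    return len(set(pattern) - set(transaction)) <= k


def pseudo_support(pattern, database, k):
    """Count the records of database that include pattern under the relaxed relation."""
    return sum(1 for t in database if pseudo_included(pattern, t, k))


def brute_force_pseudo_frequent(database, k, min_support, size):
    """Exhaustively list all itemsets of the given size whose relaxed support
    reaches min_support. Exponential in general; for illustration only."""
    items = sorted(set().union(*database))
    return [
        frozenset(candidate)
        for candidate in combinations(items, size)
        if pseudo_support(candidate, database, k) >= min_support
    ]


if __name__ == "__main__":
    # Toy database: each record is a set of items.
    db = [{1, 2, 3}, {1, 2}, {2, 3, 4}, {1, 3, 4}]
    # With k = 0 the relation is ordinary set inclusion.
    print(pseudo_support({1, 2, 3}, db, k=0))
    # With k = 1 each record may miss one item of the pattern.
    print(pseudo_support({1, 2, 3}, db, k=1))
    print(brute_force_pseudo_frequent(db, k=1, min_support=4, size=3))
```

Setting k = 0 recovers the standard frequent itemset setting, so the sketch also shows how the relaxed relation generalizes ordinary inclusion.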
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Uno, T., Arimura, H. (2007). An Efficient Polynomial Delay Algorithm for Pseudo Frequent Itemset Mining. In: Corruble, V., Takeda, M., Suzuki, E. (eds) Discovery Science. DS 2007. Lecture Notes in Computer Science, vol 4755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75488-6_21
DOI: https://doi.org/10.1007/978-3-540-75488-6_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75487-9
Online ISBN: 978-3-540-75488-6
eBook Packages: Computer Science, Computer Science (R0)