Abstract
The purpose of this paper is two-fold. First, we give efficient algorithms for answering itemset support queries for collections of itemsets from various representations of the frequency information. As index structures, we use itemset tries of transaction databases, of frequent itemsets, and of their condensed representations. Second, we evaluate the usefulness of condensed representations of frequent itemsets for answering itemset support queries, using the proposed query algorithms and index structures. We analyze the worst-case time complexities of querying condensed representations and experimentally evaluate query efficiency with random itemset queries on several benchmark transaction databases.
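To make the idea of an itemset support query concrete, the following is a minimal sketch, not the paper's algorithms. It assumes supports are absolute counts and that items are comparable so they can be sorted. All names (`ItemsetTrie`, `frequent_itemsets`, `closed_itemsets`, `query_closed`) are hypothetical, and the brute-force miner exists only to produce a small test collection. The sketch stores frequent itemsets in a trie keyed by sorted items. It also answers a query from the closed itemsets alone, using the standard fact that the support of a frequent itemset equals the largest support among its closed supersets. The linear scan in `query_closed` is chosen for brevity.

```python
from itertools import combinations


class ItemsetTrie:
    """Trie keyed by items in sorted order; a node may carry a support."""

    def __init__(self):
        self.children = {}
        self.support = None

    def insert(self, itemset, support):
        node = self
        for item in sorted(itemset):
            node = node.children.setdefault(item, ItemsetTrie())
        node.support = support

    def query(self, itemset):
        """Return the stored support, or None if the itemset is not stored."""
        node = self
        for item in sorted(itemset):
            node = node.children.get(item)
            if node is None:
                return None
        return node.support


def support(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration (illustration only, exponential in #items)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(len(items) + 1):
        for candidate in combinations(items, k):
            sup = support(candidate, transactions)
            if sup >= min_support:
                result[frozenset(candidate)] = sup
    return result


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {x: s for x, s in frequent.items()
            if not any(x < y and s == frequent[y] for y in frequent)}


def query_closed(itemset, closed):
    """Support = max support over closed supersets; None if infrequent."""
    wanted = frozenset(itemset)
    sups = [sup for c, sup in closed.items() if wanted <= c]
    return max(sups) if sups else None


# Hypothetical toy database of four transactions.
transactions = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
frequent = frequent_itemsets(transactions, min_support=2)

trie = ItemsetTrie()
for itemset, sup in frequent.items():
    trie.insert(itemset, sup)

closed = closed_itemsets(frequent)
print(trie.query({"a"}), query_closed({"a"}, closed))
```

In this toy database, the closed collection is smaller than the full collection of frequent itemsets, yet both answer the same queries for frequent itemsets. The cost is that the closed-set query must search supersets rather than follow a single trie path.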
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mielikäinen, T., Panov, P., Džeroski, S. (2006). Itemset Support Queries Using Frequent Itemsets and Their Condensed Representations. In: Todorovski, L., Lavrač, N., Jantke, K.P. (eds) Discovery Science. DS 2006. Lecture Notes in Computer Science, vol 4265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893318_18
DOI: https://doi.org/10.1007/11893318_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46491-4
Online ISBN: 978-3-540-46493-8
eBook Packages: Computer Science (R0)