Abstract
Given a large collection of transactions containing items, a basic and common data mining problem is to extract the so-called frequent itemsets (i.e., sets of items appearing in at least a given number of transactions). In this paper, we propose a structure called free-sets, from which we can approximate the support of any itemset (i.e., the number of transactions containing the itemset), and we formalize this notion in the framework of ε-adequate representation [10]. We show that frequent free-sets can be efficiently extracted using pruning strategies developed for frequent itemset discovery, and that they can be used to approximate the support of any frequent itemset. Experiments run on real dense data sets (e.g., data sets containing many strong correlations) show a significant reduction of the size of the output when compared with standard frequent itemset extraction. Furthermore, the experiments show that the extraction of frequent free-sets is still possible when the extraction of frequent itemsets becomes intractable. Finally, we show that the error made when approximating frequent itemset support remains very low in practice.
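To make the two definitions used in the abstract concrete (the support of an itemset and the notion of a frequent itemset), the following is a minimal Python sketch. It is a brute-force illustration only, not the extraction algorithm of the paper and not a free-set computation; the function names `support` and `frequent_itemsets`, the parameter `min_support`, and the toy transactions are assumptions introduced here for illustration.

```python
from itertools import combinations


def support(transactions, itemset):
    """Return the number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def frequent_itemsets(transactions, min_support):
    """Enumerate all non-empty itemsets whose support is at least `min_support`.

    Brute force over every subset of the item universe (hypothetical helper,
    exponential in the number of items, suitable for toy data only).
    """
    all_items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, size):
            s = support(transactions, candidate)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


# Toy data (assumed for illustration): four transactions over items a, b, c.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]

frequent = frequent_itemsets(transactions, min_support=2)
```

With a threshold of 2, every single item and every pair is frequent in this toy data, while the full set {a, b, c} is not, since it appears in only one transaction. The exhaustive enumeration is what becomes intractable on dense data, which is the situation the abstract says free-sets are meant to address.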
References
R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, pages 307–328. AAAI Press, 1996.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proc. VLDB’94, pages 487–499, 1994.
R. J. Bayardo. Brute-force mining of high-confidence classification rules. In Proc. KDD’97, pages 123–126, 1997.
R. J. Bayardo. Efficiently mining long patterns from databases. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pages 85–93. ACM Press, 1998.
J.-F. Boulicaut and A. Bykowski. Frequent closures as a concise representation for binary data mining. In Proc. PAKDD’00, volume 1805 of LNAI, pages 62–73, Kyoto, JP, 2000. Springer-Verlag.
J.-F. Boulicaut and B. Jeudy. Using constraints during itemset mining: a generic approach. Technical Report 2000–01, INSA Lyon, LISI, F-69621 Villeurbanne, Mar. 2000.
A. Bykowski. Frequent set discovery in highly-correlated data. Technical Report July 1999, Master of Science thesis, INSA Lyon, LISI, F-69621 Villeurbanne, 1999.
A. Bykowski and L. Gomez-Chantada. Frequent itemset extraction in highly-correlated data: a web usage mining application. In Proc. WKDDM’00, pages 27–42, Kyoto, JP, Apr. 2000.
S. Fujiwara, J. D. Ullman, and R. Motwani. Dynamic miss-counting algorithms: Finding implication and similarity rules with confidence pruning. In Proc. ICDE’00, pages 501–511, San Diego, USA, 2000.
H. Mannila and H. Toivonen. Multiple uses of frequent sets and condensed representations. In Proc. KDD’96, pages 189–194, Portland, USA, 1996.
H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, 1(3):241–258, 1997.
R. Ng, L. V. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimization of constrained association rules. In Proc. ACM SIGMOD’98, pages 13–24, Seattle, USA, 1998.
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Efficient mining of association rules using closed itemset lattices. Information Systems, 24(1):25–46, 1999.
D. Pavlov, H. Mannila, and P. Smyth. Probabilistic models for query approximation with large data sets. Technical Report pp2000–07, University of California, Department of Information and Computer Science, Irvine, CA-92697-3425, Feb. 2000.
G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, pages 229–248. AAAI Press, Menlo Park, CA, 1991.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boulicaut, JF., Bykowski, A., Rigotti, C. (2000). Approximation of Frequency Queries by Means of Free-Sets. In: Zighed, D.A., Komorowski, J., Żytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2000. Lecture Notes in Computer Science, vol 1910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45372-5_8
DOI: https://doi.org/10.1007/3-540-45372-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41066-9
Online ISBN: 978-3-540-45372-7
eBook Packages: Springer Book Archive