Approximation of Frequency Queries by Means of Free-Sets

Boulicaut, Jean-François; Bykowski, Artur; Rigotti, Christophe

doi:10.1007/3-540-45372-5_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1910)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Given a large collection of transactions containing items, a common data mining problem is to extract the so-called frequent itemsets (i.e., sets of items appearing in at least a given number of transactions). In this paper, we propose a structure called free-sets, from which we can approximate the support of any itemset (i.e., the number of transactions containing the itemset), and we formalize this notion in the framework of ε-adequate representation [10]. We show that frequent free-sets can be efficiently extracted using pruning strategies developed for frequent itemset discovery, and that they can be used to approximate the support of any frequent itemset. Experiments run on real dense data sets (e.g., data sets containing many strong correlations) show a significant reduction of the size of the output when compared with standard frequent itemset extraction. Furthermore, the experiments show that the extraction of frequent free-sets is still possible when the extraction of frequent itemsets becomes intractable. Finally, we show that the error made when approximating frequent itemset support remains very low in practice.
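
The abstract does not spell out the definition of a free-set, so the following Python sketch is only an illustration under stated assumptions. It assumes the δ-free formulation: an itemset X is δ-free if no rule Y ⇒ a, with Y a proper subset of X and a an item of X outside Y, has at most δ exceptions. It then approximates the support of an itemset by the smallest support among its δ-free subsets. The toy transactions, the function names (`support`, `is_delta_free`, `approx_support`) and the brute-force enumeration are hypothetical and are not the authors' algorithm; in particular, the sketch ignores the frequency threshold and the levelwise pruning mentioned above.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, not taken from the paper).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"c"},
    {"b"},
]


def support(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def is_delta_free(itemset, delta, transactions=TRANSACTIONS):
    """Assumed definition: X is delta-free if no rule Y => a, with Y a
    proper subset of X and a in X but not in Y, has <= delta exceptions."""
    x = frozenset(itemset)
    for a in x:
        rest = sorted(x - {a})
        for k in range(len(rest) + 1):
            for y in combinations(rest, k):
                # Exceptions: transactions containing Y but not a.
                exceptions = (support(y, transactions)
                              - support(set(y) | {a}, transactions))
                if exceptions <= delta:
                    return False
    return True


def approx_support(itemset, delta, transactions=TRANSACTIONS):
    """Smallest support among the delta-free subsets of `itemset`.
    The empty set is always delta-free here, so a value always exists."""
    items = sorted(itemset)
    best = None
    for k in range(len(items) + 1):
        for y in combinations(items, k):
            if is_delta_free(y, delta, transactions):
                s = support(y, transactions)
                best = s if best is None else min(best, s)
    return best


if __name__ == "__main__":
    for x in [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]:
        print(sorted(x), "exact:", support(x),
              "approx (delta=1):", approx_support(x, 1),
              "free:", is_delta_free(x, 1))
```

In this toy data the rule a ⇒ b has a single exception, so with δ = 1 the itemset {a, b} is not free and its support is estimated from the supports of its free subsets rather than stored. The point of the sketch is only that strongly correlated items make many itemsets redundant, which is the situation the abstract describes for dense data sets.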

References

[1] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, pages 307–328. AAAI Press, 1996.
[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proc. VLDB’94, pages 487–499, 1994.
[3] R. J. Bayardo. Brute-force mining of high-confidence classification rules. In Proceedings KDD’97, pages 123–126, 1997.
[4] R. J. Bayardo. Efficiently mining long patterns from databases. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pages 85–93. ACM Press, 1998.
[5] J.-F. Boulicaut and A. Bykowski. Frequent closures as a concise representation for binary data mining. In Proc. PAKDD’00, volume 1805 of LNAI, pages 62–73, Kyoto, JP, 2000. Springer-Verlag.
[6] J.-F. Boulicaut and B. Jeudy. Using constraints during itemset mining: a generic approach. Technical Report 2000–01, INSA Lyon, LISI, F-69621 Villeurbanne, Mar. 2000.
[7] A. Bykowski. Frequent set discovery in highly-correlated data. Technical Report July 1999, Master of Science thesis, INSA Lyon, LISI, F-69621 Villeurbanne, 1999.
[8] A. Bykowski and L. Gomez-Chantada. Frequent itemset extraction in highly-correlated data: a web usage mining application. In Proc. WKDDM’00, pages 27–42, Kyoto, JP, Apr. 2000.
[9] S. Fujiwara, J. D. Ullman, and R. Motwani. Dynamic miss-counting algorithms: Finding implication and similarity rules with confidence pruning. In Proc. ICDE’00, pages 501–511, San Diego, USA, 2000.
[10] H. Mannila and H. Toivonen. Multiple uses of frequent sets and condensed representations. In Proceedings KDD’96, pages 189–194, Portland, USA, 1996.
[11] H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, 1(3):241–258, 1997.
[12] R. Ng, L. V. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimization of constrained association rules. In Proc. ACM SIGMOD’98, pages 13–24, Seattle, USA, 1998.
[13] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Efficient mining of association rules using closed itemset lattices. Information Systems, 24(1):25–46, 1999.
[14] D. Pavlov, H. Mannila, and P. Smyth. Probabilistic models for query approximation with large data sets. Technical Report pp2000–07, University of California, Department of Information and Computer Science, Irvine, CA-92697-3425, Feb. 2000.
[15] G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, pages 229–248. AAAI Press, Menlo Park, CA, 1991.

Author information

Authors and Affiliations

Laboratoire d’Ingénierie des Systèmes d’Information, INSA Lyon, Bâtiment 501, F-69621, Villeurbanne Cedex, France
Jean-François Boulicaut, Artur Bykowski & Christophe Rigotti

Editor information

Editors and Affiliations

Department of Computer and Information Science, Norwegian University of Science and Technology, O.S. Bragstads plass 2E, 7491, Trondheim, Norway
Jan Komorowski
Department of Computer Science, University of North Carolina, Charlotte, NC 28223, USA
Jan Żytkow
Laboratoire ERIC, Université Lyon 2, 5 avenue Pierre Mendès-France, 69676, Bron, France
Djamel A. Zighed

About this paper

Cite this paper

Boulicaut, JF., Bykowski, A., Rigotti, C. (2000). Approximation of Frequency Queries by Means of Free-Sets. In: Zighed, D.A., Komorowski, J., Żytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2000. Lecture Notes in Computer Science, vol 1910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45372-5_8

DOI: https://doi.org/10.1007/3-540-45372-5_8
Published: 18 July 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41066-9
Online ISBN: 978-3-540-45372-7
eBook Packages: Springer Book Archive
