Frequent Closures as a Concise Representation for Binary Data Mining

Boulicaut, Jean-François; Bykowski, Artur

doi:10.1007/3-540-45571-X_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1805)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Frequent set discovery from binary data is an important problem in data mining. It aims at a concise representation of large tables from which descriptive rules, such as the popular association rules, can be derived. Our work studies two such representations, namely frequent sets and frequent closures. N. Pasquier and colleagues designed the Close algorithm, which provides frequent sets via the discovery of frequent closures. When one mines highly correlated data, Apriori-based algorithms clearly fail while Close remains tractable. We discuss our implementation of Close and the experimental evidence we obtained from two real-life binary data mining processes. Then, we introduce the concept of almost-closure: generating every frequent set from the frequent almost-closures remains possible, but with a bounded error on frequency. To the best of our knowledge, this is a new concept and, here again, we provide some experimental evidence of its added value.
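The relationship between the two representations can be illustrated with a minimal Python sketch. It is not the Close algorithm and not the authors' implementation: it enumerates itemsets by brute force on a hypothetical four-row table, and all names (`ROWS`, `support`, `closure`, `almost_closure`, `support_from_closures`) are illustrative. The almost-closure rule used here (add every item whose inclusion lowers the support count by at most `delta`) is an assumption, since the abstract only states that the error on frequency is bounded.

```python
from itertools import combinations

# Hypothetical toy binary table: each row lists the items whose column is 1.
ROWS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"c", "d"},
]
ITEMS = sorted(set().union(*ROWS))


def support(itemset):
    """Number of rows that contain every item of `itemset`."""
    return sum(1 for row in ROWS if itemset <= row)


def closure(itemset):
    """Largest superset of `itemset` that occurs in exactly the same rows."""
    covering = [row for row in ROWS if itemset <= row]
    if not covering:
        return frozenset(ITEMS)
    return frozenset(set.intersection(*covering))


def almost_closure(itemset, delta):
    """Assumed definition: add each item that costs at most `delta` rows."""
    base = support(itemset)
    extra = {i for i in ITEMS if base - support(itemset | {i}) <= delta}
    return frozenset(itemset | extra)


def frequent_sets(min_support):
    """Brute-force enumeration of all non-empty frequent sets."""
    result = {}
    for size in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, size):
            candidate = frozenset(combo)
            count = support(candidate)
            if count >= min_support:
                result[candidate] = count
    return result


def frequent_closures(min_support):
    """Closures of the frequent sets, each with its support."""
    return {closure(s): n for s, n in frequent_sets(min_support).items()}


def support_from_closures(itemset, closures):
    """Support of a frequent set is the largest support of a closed superset."""
    return max(n for closed, n in closures.items() if itemset <= closed)


if __name__ == "__main__":
    sets_ = frequent_sets(2)
    closures = frequent_closures(2)
    print(len(sets_), "frequent sets,", len(closures), "frequent closures")
    for s, n in sets_.items():
        assert support_from_closures(s, closures) == n
```

On this toy table the closures are fewer than the frequent sets, yet every frequent set's support is recovered exactly from them. With `delta = 0` the assumed `almost_closure` coincides with `closure`; a larger `delta` merges more items into each set at the cost of an approximate support.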

References

R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In: Proc. SIGMOD’93, Washington DC (USA), pages 207–216, May 1993, ACM Press.
R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In: Advances in Knowledge Discovery and Data Mining, pages 307–328, 1996, AAAI Press.
R.J. Bayardo. Efficiently mining long patterns from databases. In: Proc. SIGMOD’98, Seattle (USA), pages 85–93, June 1998, ACM Press.
J-F. Boulicaut, M. Klemettinen, and H. Mannila. Modeling KDD processes within the Inductive Database Framework. In: Proc. DaWaK’99, Florence (I), pages 293–302, September 1999, Springer-Verlag, LNCS 1676.
J-F. Boulicaut, A. Bykowski, and C. Rigotti. Mining almost-closures in highly correlated data. Research Report LISI INSA Lyon, 2000, 20 pages.
A. Bykowski. Frequent set discovery in highly correlated data. Master of Science thesis, INSA Lyon, July 1999, 30 pages.
H. Toivonen. Sampling large databases for association rules. In: Proc. VLDB’96, Mumbai (India), pages 134–145, September 1996, Morgan Kaufmann.
H. Mannila and H. Toivonen. Multiple uses of frequent sets and condensed representations. In: Proc. KDD’96, Portland (USA), pages 189–194, August 1996, AAAI Press.
H. Mannila. Inductive databases and condensed representations for data mining. In: Proc. ILPS’97, Port Jefferson, Long Island N.Y. (USA), pages 21–30, October 1997, MIT Press.
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Efficient mining of association rules using closed itemset lattices. Information Systems, Volume 24(1), pages 25–46, 1999.
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Closed set discovery of small covers for association rules. In: Proc. BDA’99, Bordeaux (F), pages 53–68, October 1999.
M. Zaki and M. Ogihara. Theoretical foundations of association rules. In: Proc. Workshop post-SIGMOD DMKD’98, Seattle (USA), pages 85–93, June 1998.
H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, 1(3):241–258, 1997.

Author information

Authors and Affiliations

Laboratoire d’Ingénierie des Systèmes d’Information, Institut National des Sciences Appliquées de Lyon, Bâtiment 501, F-69621, Villeurbanne cedex, France
Jean-François Boulicaut & Artur Bykowski

Editor information

Editors and Affiliations

Graduate School of Systems Management, University of Tsukuba, 3-29-1 Otsuka, Bunkyo-ku, Tokyo, 112-0012, Japan
Takao Terano
Department of Computer Science and Engineering, Arizona State University, P.O. Box 875 406, Tempe, AZ, 85287-5406
Huan Liu
Department of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan ROC
Arbee L. P. Chen

About this paper

Cite this paper

Boulicaut, JF., Bykowski, A. (2000). Frequent Closures as a Concise Representation for Binary Data Mining. In: Terano, T., Liu, H., Chen, A.L.P. (eds) Knowledge Discovery and Data Mining. Current Issues and New Applications. PAKDD 2000. Lecture Notes in Computer Science, vol 1805. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45571-X_9

DOI: https://doi.org/10.1007/3-540-45571-X_9
Published: 24 March 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67382-8
Online ISBN: 978-3-540-45571-4
eBook Packages: Springer Book Archive
