Abstract
Patterns are at the core of knowledge discovery from data, but their use is limited by their huge number and the cost of mining them. Over the last decade, many works have addressed the concept of condensed representation w.r.t. frequency queries. Such representations are several orders of magnitude smaller than the whole collection of patterns, yet they make it possible to regenerate the frequency of any pattern. In this paper, we propose a framework for condensed representations w.r.t. a large and varied set of new queries, named condensable functions, which are based on interestingness measures (e.g., frequency, lift, minimum). These representations rely on new closure operators that are automatically derived from each condensable function, which yields adequate condensed representations. We propose a generic algorithm, Mic Mac, to mine the adequate condensed representations efficiently. Experiments show both the conciseness of the adequate condensed representations and the efficiency of our algorithm.
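To make the idea of a condensed representation w.r.t. frequency queries concrete, here is a minimal sketch of the classical closed-pattern representation that the abstract takes as its starting point. It is not the Mic Mac algorithm, and it does not implement the closure operators derived from condensable functions. The toy database and all function names are hypothetical, and the enumeration is deliberately brute-force.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
DATABASE = [
    frozenset("abc"),
    frozenset("abd"),
    frozenset("ab"),
    frozenset("c"),
]
ITEMS = frozenset().union(*DATABASE)


def frequency(pattern, database=DATABASE):
    """Number of transactions that contain every item of `pattern`."""
    return sum(1 for t in database if pattern <= t)


def closure(pattern, database=DATABASE):
    """Galois closure: intersection of all transactions containing `pattern`."""
    covering = [t for t in database if pattern <= t]
    if not covering:
        return ITEMS  # convention: an unsupported pattern closes to all items
    return frozenset.intersection(*covering)


def all_patterns(items=ITEMS):
    """Enumerate every non-empty itemset (exponential; fine for a toy example)."""
    ordered = sorted(items)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def closed_representation(database=DATABASE):
    """Map each closed pattern with non-zero frequency to its frequency."""
    representation = {}
    for p in all_patterns():
        f = frequency(p, database)
        if f > 0 and closure(p, database) == p:
            representation[p] = f
    return representation


def regenerate_frequency(pattern, representation):
    """Frequency of `pattern` is the largest frequency among its closed supersets."""
    return max((f for c, f in representation.items() if pattern <= c), default=0)


if __name__ == "__main__":
    rep = closed_representation()
    for closed, freq in sorted(rep.items(), key=lambda kv: sorted(kv[0])):
        print("".join(sorted(closed)), freq)
    # Every frequency can be recovered from the closed patterns alone.
    assert all(regenerate_frequency(p, rep) == frequency(p) for p in all_patterns())
```

On this toy database, only the closed patterns are stored, and the frequency of any other pattern is read off its most frequent closed superset. The framework described in the abstract generalizes this principle from frequency to other condensable functions by deriving a suitable closure operator for each one.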
Additional information
Responsible editors: Walter Daelemans, Bart Goethals, and Katharina Morik.
Cite this article
Soulet, A., Crémilleux, B. Adequate condensed representations of patterns. Data Min Knowl Disc 17, 94–110 (2008). https://doi.org/10.1007/s10618-008-0111-4
DOI: https://doi.org/10.1007/s10618-008-0111-4