Adequate condensed representations of patterns

Data Mining and Knowledge Discovery

Abstract

Patterns are central to knowledge discovery from data, but their use is limited by their huge number and the cost of mining them. Over the last decade, many works have addressed the concept of condensed representation with respect to frequency queries. Such representations are several orders of magnitude smaller than the whole collection of patterns, yet they still allow the frequency of any pattern to be regenerated. In this paper, we propose a framework for condensed representations with respect to a large and varied set of new queries, named condensable functions, which are based on interestingness measures (e.g., frequency, lift, minimum). These condensed representations are obtained through new closure operators that are automatically derived from each condensable function, yielding adequate condensed representations. We propose a generic algorithm, Mic Mac, to mine the adequate condensed representations efficiently. Experiments show both the conciseness of the adequate condensed representations and the efficiency of our algorithm.
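
To make the idea of a condensed representation for frequency queries concrete, the following is a minimal sketch of the classical closed-pattern representation on a toy dataset. It is not the Mic Mac algorithm and does not cover the condensable functions of the paper; it only illustrates how a closure operator lets a small set of patterns regenerate the frequency of every pattern. The dataset and all function names (`closure`, `closed_representation`, `regenerate_frequency`) are assumptions made for illustration, and the brute-force enumeration is only suitable for tiny examples.

```python
from itertools import combinations

# Hypothetical toy dataset: each transaction is a set of items.
TRANSACTIONS = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("bd"),
]
ITEMS = sorted(set().union(*TRANSACTIONS))


def frequency(pattern, transactions=TRANSACTIONS):
    """Number of transactions that contain every item of `pattern`."""
    return sum(1 for t in transactions if pattern <= t)


def closure(pattern, transactions=TRANSACTIONS):
    """Classical closure: intersection of all transactions containing `pattern`."""
    covering = [t for t in transactions if pattern <= t]
    if not covering:
        # Convention chosen here for patterns that never occur.
        return frozenset(ITEMS)
    return frozenset.intersection(*covering)


def all_patterns(items=ITEMS):
    """Enumerate every subset of `items` (exponential; toy sizes only)."""
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def closed_representation():
    """Map each closed pattern with non-zero frequency to its frequency."""
    return {
        closure(p): frequency(p)
        for p in all_patterns()
        if frequency(p) > 0
    }


def regenerate_frequency(pattern, representation):
    """Recover a frequency using only the condensed representation.

    The frequency of a pattern equals the largest frequency among the
    closed patterns that contain it (0 if none does).
    """
    candidates = [f for c, f in representation.items() if pattern <= c]
    return max(candidates, default=0)


if __name__ == "__main__":
    rep = closed_representation()
    print(len(rep), "closed patterns stored")
    for p in all_patterns():
        assert regenerate_frequency(p, rep) == frequency(p)
```

On this toy dataset the representation stores 4 closed patterns, while 10 of the 16 possible patterns occur in the data; the frequency of each of the 16 is still recovered exactly. The regeneration rule relies on the fact that a pattern and its closure are contained in the same transactions.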

Author information

Correspondence to Arnaud Soulet.

Additional information

Responsible editors: Walter Daelemans, Bart Goethals, and Katharina Morik.

About this article

Cite this article

Soulet, A., Crémilleux, B. Adequate condensed representations of patterns. Data Min Knowl Disc 17, 94–110 (2008). https://doi.org/10.1007/s10618-008-0111-4

  • DOI: https://doi.org/10.1007/s10618-008-0111-4