Mining frequent conjunctive queries in relational databases through dependency discovery

  • Regular Paper
Knowledge and Information Systems

Abstract

We present an approach for mining frequent conjunctive queries in arbitrary relational databases. Our pattern class is the restricted but appealing subclass of simple conjunctive queries. Our algorithm, called Conqueror\(^+\), detects previously unknown functional and inclusion dependencies that hold on the database relations as well as on joins of relations. These newly detected dependencies are then used to prune redundant queries. We propose an efficient database-oriented implementation of our algorithm using SQL and report several promising experimental results.
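As an illustration of the two kinds of dependencies named in the abstract, the following is a minimal Python sketch, not the Conqueror\(^+\) implementation (which the abstract describes as SQL-based). It checks whether a functional dependency X → Y or an inclusion dependency R[X] ⊆ S[Y] holds on small in-memory relations. The function names `project`, `holds_fd` and `holds_ind`, the list-of-dicts representation, and the sample relations `movies` and `casts` are all assumptions made for this example.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]


def project(rows: List[Row], attrs: Sequence[str]) -> Set[Tuple]:
    """Return the distinct tuples of `rows` restricted to `attrs`."""
    return {tuple(row[a] for a in attrs) for row in rows}


def holds_fd(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """X -> Y holds iff any two rows that agree on X also agree on Y."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first Y-value seen for this X-value;
        # a later, different Y-value violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


def holds_ind(r: List[Row], r_attrs: Sequence[str],
              s: List[Row], s_attrs: Sequence[str]) -> bool:
    """R[X] is included in S[Y] iff every X-tuple of R appears as a Y-tuple of S."""
    return project(r, r_attrs) <= project(s, s_attrs)


# Hypothetical sample data, invented for this sketch.
movies = [
    {"mid": 1, "title": "Alpha", "year": 2000},
    {"mid": 2, "title": "Beta", "year": 2000},
    {"mid": 3, "title": "Gamma", "year": 2001},
]
casts = [
    {"mid": 1, "actor": "x"},
    {"mid": 3, "actor": "y"},
]

fd_mid_title = holds_fd(movies, ["mid"], ["title"])      # mid determines title
fd_year_title = holds_fd(movies, ["year"], ["title"])    # year does not
ind_casts_movies = holds_ind(casts, ["mid"], movies, ["mid"])
```

A general consequence of a functional dependency is that, when X → Y holds, the projections on X and on X ∪ Y contain the same number of distinct tuples. In the sample data, `project(movies, ["mid"])` and `project(movies, ["mid", "title"])` both have three tuples. Observations of this kind are what make dependency-based pruning of equivalent queries plausible, although the exact pruning rules are those given in the full article.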

Notes

  1. We consider in our approach that the only possible functional dependencies with an empty left-hand side are of the form \(\emptyset \rightarrow \emptyset \).

  2. The projection over the empty set is defined in [16] as being empty if the projected relation is empty and, otherwise, as containing a single specific tuple, called the empty tuple.

  3. The source code of Conqueror\(^{+}\) can be downloaded at http://www.adrem.ua.ac.be.
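The convention in note 2 can be reproduced with a small Python sketch. This is an illustration only, under the assumption that a relation is modelled as a list of dicts and a projection as a set of tuples; the helper name `project` is hypothetical.

```python
def project(rows, attrs):
    """Distinct tuples of `rows` restricted to the attribute list `attrs`."""
    return {tuple(row[a] for a in attrs) for row in rows}


# Projecting an empty relation over the empty attribute set gives the empty set.
empty_case = project([], [])

# Projecting a non-empty relation over the empty attribute set gives exactly
# one tuple, the empty tuple (), however many rows the relation has.
nonempty_case = project([{"a": 1}, {"a": 2}], [])
```

With this representation, `empty_case` is the empty set and `nonempty_case` is the set containing only `()`, matching the two cases of the definition.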

References

  1. Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo A (1996) Fast discovery of association rules. In: Advances in knowledge discovery and data mining. AAAI-MIT Press, Cambridge, pp 309–328

  2. Baixeries J (2004) A formal concept analysis framework to mine functional dependencies. In: Int. workshop on mathematical methods for learning, pp 1–9

  3. Baixeries J (2008) A formal context for symmetric dependencies. In: Formal concept analysis, 6th international conference, ICFCA, vol 4933 of lecture notes in computer science, Springer, Berlin, pp 90–105

  4. Baixeries J (2011) A new formal context for symmetric dependencies. In: Concept lattices and their applications, CLA, pp 333–348 (INRIA)

  5. Bohannon P, Fan W, Geerts F, Jia X, Kementsietsidis A (2007) Conditional functional dependencies for data cleaning. In: ICDE, pp 746–755

  6. Dehaspe L, Toivonen H (2001) Discovery of relational association rules. In: Džeroski S, Lavrač N (eds) Relational data mining. Springer, Berlin, pp 189–208

  7. Dieng C, Jen T-Y, Laurent D (2010) An efficient computation of frequent queries in a star schema. In: DEXA 2010, vol 6262(II) of LNCS, Springer, Berlin, pp 225–239

  8. Diop C, Giacometti A, Laurent D, Spyratos N (2002) Composition of mining contexts for efficient extraction of association rules. In: EDBT’02, vol 2287 of LNCS. Springer, Berlin, pp 106–123

  9. Goethals B, den Bussche JV (2002) Relational association rules: getting warmer. In: ESF exploratory workshop on pattern detection and discovery in data mining, vol 2447 of LNCS, Springer, Berlin, pp 125–139

  10. Goethals B, Hoekx E, den Bussche JV (2005) Mining tree queries in a graph. In: ACM KDD, pp 61–69

  11. Goethals B, Laurent D, Le Page W (2010) Discovery and application of functional dependencies in conjunctive query mining. In: DAWAK 2010, vol 6263 of LNCS, Springer, Berlin, pp 142–156

  12. Goethals B, Le Page W, Mannila H (2008) Mining association rules of simple conjunctive queries. In: SIAM-SDM, pp 96–107

  13. Hoekx E, den Bussche JV (2006) Mining for tree-query associations in a graph. In: IEEE ICDM, pp 254–264

  14. IMDB. http://imdb.com. 2008

  15. Inokuchi A, Washio T, Motoda H (2000) An Apriori-based algorithm for mining frequent substructures from graph data. In: PKDD, vol 1910 of LNCS, Springer, Berlin, pp 13–23

  16. Jen T-Y, Laurent D, Spyratos N (2008) Mining all frequent selection-projection queries from a relational table. In: EDBT’08, ACM Press, pp 368–379

  17. Jen T-Y, Laurent D, Spyratos N (2009) Mining frequent conjunctive queries in star schemas. In: International database engineering and applications symposium (IDEAS), ACM Press, pp 97–108

  18. Jen T-Y, Laurent D, Spyratos N, Sy O (2005) Towards mining frequent queries in star schemes. In: International workshop on knowledge discovery in databases (KDID), vol 3933 of LNCS, Springer, Berlin, pp 104–123

  19. Jensen V, Soparker N (2000) Frequent itemset counting across multiple tables. In: PAKDD, vol 1805 of lecture notes in computer science, Springer, Berlin, pp 49–61

  20. Kamber M, Han J, Chiang J (1997) Metarule-guided mining of multi-dimensional association rules using data cubes. In: ACM KDD, pp 207–210

  21. Knuth D (2006) The art of computer programming, vol. 4. Addison-Wesley, Reading

  22. Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: IEEE ICDM, pp 313–320

  23. Lakhal L, Stumme G (2005) Efficient mining of association rules based on formal concept analysis. In: Formal concept analysis, vol 3626 of lecture notes in computer science. Springer, Berlin, pp 180–195

  24. Le Page W (2009) Mining patterns in relational databases. PhD thesis, University of Antwerp, Antwerp

  25. Lopes S, Petit J-M, Lakhal L (2002) Functional and approximative dependency mining: database and FCA points of view. J Exp Theor Artif Intell 14(2–3):93–114

  26. Ng E, Fu A, Wang K (2002) Mining association rules from stars. In: IEEE-ICDM, pp 322–329

  27. Nijssen S, Kok JN (2003) Efficient frequent query discovery in FARMER. In: PKDD 2003, vol 2838 of LNCS, Springer, Berlin, pp 350–362

  28. Novelli N, Cicchetti R (2001) FUN: an efficient algorithm for mining functional and embedded dependencies. In: International conference on database theory (ICDT), vol 1973 of LNCS, Springer, Berlin, pp 189–203

  29. Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Efficient mining of association rules using closed itemset lattices. Inf Syst 24(1):25–46

  30. Plotkin G (1970) A note on inductive generalization. Mach Intell 5:153–163

  31. Ullman J (1988–1989) Principles of databases and knowledge-base systems, vol 1–2. Computer Science Press, Rockville

  32. Weisstein EW (2009) Restricted growth string. In: A Wolfram web resource (http://mathworld.wolfram.com/RestrictedGrowthString.html)

  33. Wyss C-M, Giannella C, Robertson E-L (2001) FastFDs: a heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances. In: DAWAK, vol 2114 of LNCS, Springer, Berlin, pp 101–110

  34. Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: IEEE ICDM, pp 721–724

  35. Yao H, Hamilton HJ (2008) Mining functional dependencies from data. Data Min Knowl Discov 16(2):197–219

  36. Zaki M (2002) Efficiently mining frequent trees in a forest. In: ACM KDD, pp 71–80

  37. Zaki M, Hsiao C-J (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE-TKDE 17(4):462–478

Author information

Corresponding author

Correspondence to Dominique Laurent.

About this article

Cite this article

Goethals, B., Laurent, D., Le Page, W. et al. Mining frequent conjunctive queries in relational databases through dependency discovery. Knowl Inf Syst 33, 655–684 (2012). https://doi.org/10.1007/s10115-012-0526-5

  • DOI: https://doi.org/10.1007/s10115-012-0526-5

Keywords