Abstract
We present an approach for mining frequent conjunctive queries in arbitrary relational databases. Our pattern class is the simple but appealing subclass of simple conjunctive queries. Our algorithm, called Conqueror\(^+\), detects previously unknown functional and inclusion dependencies that hold on the database relations as well as on joins of relations. These newly detected dependencies are then used to prune redundant queries. We propose an efficient database-oriented implementation of our algorithm using SQL and report several promising experimental results.






Notes
In our approach, the only functional dependencies with an empty left-hand side that we consider are of the form \(\emptyset \rightarrow \emptyset \).
Following [16], the projection of a relation over the empty set of attributes is empty if the relation itself is empty, and otherwise contains a single specific tuple, called the empty tuple.
The source code of Conqueror\(^{+}\) can be downloaded at http://www.adrem.ua.ac.be.
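The first two notes can be made concrete with a small in-memory sketch. Relations are modeled here, as an assumption for illustration, as lists of Python dicts under set semantics, and the helper names project and fd_holds are hypothetical. With the empty-tuple convention, projecting a non-empty relation on no attributes yields exactly one tuple, (). Using the standard test |π_X(r)| = |π_{X∪Y}(r)|, a dependency ∅ → Y with non-empty Y then holds only when Y is constant over the relation, while ∅ → ∅ always holds.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Row = Dict[str, object]


def project(rows: List[Row], attrs: Iterable[str]) -> FrozenSet[Tuple]:
    """Set-semantics projection of `rows` on `attrs` (in the order given).

    For attrs == [] this returns frozenset() if rows is empty and
    frozenset({()}) otherwise: the single empty tuple.
    """
    attrs = list(attrs)
    return frozenset(tuple(row[a] for a in attrs) for row in rows)


def fd_holds(rows: List[Row], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """X -> Y holds iff |pi_X(r)| == |pi_{X u Y}(r)| under set semantics."""
    lhs = list(lhs)
    xy = lhs + [a for a in rhs if a not in lhs]
    return len(project(rows, lhs)) == len(project(rows, xy))


# Hypothetical toy relation in which `year` happens to be constant.
rows = [{"title": "Alien", "year": 1979}, {"title": "Heat", "year": 1979}]

print(project(rows, []))               # frozenset({()})
print(project([], []))                 # frozenset()
print(fd_holds(rows, [], []))          # True: the trivial dependency
print(fd_holds(rows, [], ["year"]))    # True: year is constant here
print(fd_holds(rows, [], ["title"]))   # False: two distinct titles
```

The case ∅ → year shows why restricting empty left-hand sides to ∅ → ∅ is a real choice: otherwise every constant column would give rise to a dependency.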
References
Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo A (1996) Fast discovery of association rules. In: Advances in knowledge discovery and data mining. AAAI-MIT Press, Cambridge, pp 309–328
Baixeries J (2004) A formal concept analysis framework to mine functional dependencies. In: Int. workshop on mathematical methods for learning, pp 1–9
Baixeries J (2008) A formal context for symmetric dependencies. In: Formal concept analysis, 6th international conference, ICFCA, vol 4933 of lecture notes in computer science, Springer, Berlin, pp 90–105
Baixeries J (2011) A new formal context for symmetric dependencies. In: Concept lattices and their applications, CLA, pp 333–348 (INRIA)
Bohannon P, Fan W, Geerts F, Jia X, Kementsietsidis A (2007) Conditional functional dependencies for data cleaning. In: ICDE, pp 746–755
Dehaspe L, Toivonen H (2001) Discovery of relational association rules. In: Džeroski S, Lavrač N (eds) Relational data mining. Springer, Berlin, pp 189–208
Dieng C, Jen T-Y, Laurent D (2010) An efficient computation of frequent queries in a star schema. In: DEXA 2010, vol 6262(II) of LNCS, Springer, Berlin, pp 225–239
Diop C, Giacometti A, Laurent D, Spyratos N (2002) Composition of mining contexts for efficient extraction of association rules. In: EDBT’02, vol 2287 of LNCS. Springer, Berlin, pp 106–123
Goethals B, Van den Bussche J (2002) Relational association rules: getting warmer. In: ESF exploratory workshop on pattern detection and discovery in data mining, vol 2447 of LNCS, Springer, Berlin, pp 125–139
Goethals B, Hoekx E, Van den Bussche J (2005) Mining tree queries in a graph. In: ACM KDD, pp 61–69
Goethals B, Laurent D, Le Page W (2010) Discovery and application of functional dependencies in conjunctive query mining. In: DAWAK 2010, vol 6263 of LNCS, Springer, Berlin, pp 142–156
Goethals B, Le Page W, Mannila H (2008) Mining association rules of simple conjunctive queries. In: SIAM-SDM, pp 96–107
Hoekx E, Van den Bussche J (2006) Mining for tree-query associations in a graph. In: IEEE ICDM, pp 254–264
IMDb (2008) http://imdb.com
Inokuchi A, Washio T, Motoda H (2000) An Apriori-based algorithm for mining frequent substructures from graph data. In: PKDD, vol 1910 of LNCS, Springer, Berlin, pp 13–23
Jen T-Y, Laurent D, Spyratos N (2008) Mining all frequent selection-projection queries from a relational table. In: EDBT’08, ACM Press, pp 368–379
Jen T-Y, Laurent D, Spyratos N (2009) Mining frequent conjunctive queries in star schemas. In: International database engineering and applications symposium (IDEAS), ACM Press, pp 97–108
Jen T-Y, Laurent D, Spyratos N, Sy O (2005) Towards mining frequent queries in star schemes. In: International workshop on knowledge discovery in databases (KDID), vol 3933 of LNCS, Springer, Berlin, pp 104–123
Jensen V, Soparkar N (2000) Frequent itemset counting across multiple tables. In: PAKDD, vol 1805 of lecture notes in computer science, Springer, Berlin, pp 49–61
Kamber M, Han J, Chiang J (1997) Metarule-guided mining of multi-dimensional association rules using data cubes. In: ACM KDD, pp 207–210
Knuth D (2006) The art of computer programming, vol. 4. Addison-Wesley, Reading
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: IEEE ICDM, pp 313–320
Lakhal L, Stumme G (2005) Efficient mining of association rules based on formal concept analysis. In: Formal concept analysis, vol 3626 of lecture notes in computer science. Springer, Berlin, pp 180–195
Le Page W (2009) Mining patterns in relational databases. PhD thesis, University of Antwerp, Antwerp
Lopes S, Petit J-M, Lakhal L (2002) Functional and approximative dependency mining: database and FCA points of view. J Exp Theor Artif Intell 14(2–3):93–114
Ng E, Fu A, Wang K (2002) Mining association rules from stars. In: IEEE-ICDM, pp 322–329
Nijssen S, Kok JN (2003) Efficient frequent query discovery in FARMER. In: PKDD 2003, vol 2838 of LNCS, Springer, Berlin, pp 350–362
Novelli N, Cicchetti R (2001) FUN: an efficient algorithm for mining functional and embedded dependencies. In: International conference on database theory (ICDT), vol 1973 of LNCS, Springer, Berlin, pp 189–203
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Efficient mining of association rules using closed itemset lattices. Inf Syst 24(1):25–46
Plotkin G (1970) A note on inductive generalization. Mach Intell 5:153–163
Ullman J (1988–1989) Principles of database and knowledge-base systems, vol 1–2. Computer Science Press, Rockville
Weisstein EW (2009) Restricted growth string. In: MathWorld, a Wolfram web resource (http://mathworld.wolfram.com/RestrictedGrowthString.html)
Wyss C-M, Giannella C, Robertson E-L (2001) FastFDs: a heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances. In: DAWAK, vol 2114 of LNCS, Springer, Berlin, pp 101–110
Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: IEEE ICDM, pp 721–724
Yao H, Hamilton HJ (2008) Mining functional dependencies from data. Data Min Knowl Discov 16(2):197–219
Zaki M (2002) Efficiently mining frequent trees in a forest. In: ACM KDD, pp 71–80
Zaki M, Hsiao C-J (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE-TKDE 17(4):462–478
Cite this article
Goethals, B., Laurent, D., Le Page, W. et al. Mining frequent conjunctive queries in relational databases through dependency discovery. Knowl Inf Syst 33, 655–684 (2012). https://doi.org/10.1007/s10115-012-0526-5