Unary and n-ary inclusion dependency discovery in relational databases

Marchi, Fabien De; Lopes, Stéphane; Petit, Jean-Marc

doi:10.1007/s10844-007-0048-x


Published: 26 January 2008

Volume 32, pages 53–73, (2009)

Journal of Intelligent Information Systems

Fabien De Marchi¹,
Stéphane Lopes² &
Jean-Marc Petit³


Abstract

Foreign keys are among the most fundamental constraints in relational databases. Because they are not always declared in existing databases, discovering them is an important and challenging task. The underlying problem is known as the inclusion dependency (IND) inference problem. In this paper, we devise data-mining algorithms for IND inference in a given database, following a two-step approach. In the first step, unary INDs are discovered using a new preprocessing stage, which leads to a new algorithm and an efficient implementation. In the second step, n-ary INDs are inferred; this step fits the framework of levelwise algorithms used in many data-mining tasks. Since real-world databases can suffer from data inconsistencies, we also consider approximate INDs, i.e. INDs that almost hold, and show how they can be safely integrated into our unary and n-ary discovery algorithms. These algorithms have been implemented and tested against both synthetic and real-life databases. To our knowledge, no other algorithm exists to solve this data-mining problem.
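To make the notions of unary and approximate INDs concrete, the following is a minimal sketch, not the authors' algorithm. It assumes each attribute is given as an in-memory list of values, builds an inverted index from values to the attributes containing them, and accepts A ⊆ B when the fraction of distinct values of A missing from B is at most a threshold. The function name unary_inds, the parameter max_error, and this particular error measure are illustrative assumptions; the abstract does not specify them.

```python
from collections import defaultdict


def unary_inds(columns, max_error=0.0):
    """Return the set of pairs (A, B), A != B, such that A is included in B.

    columns:   dict mapping an attribute name to an iterable of its values
               (hypothetical input format, chosen for illustration).
    max_error: tolerated fraction of distinct values of A that are absent
               from B; 0.0 yields exact INDs only (assumed error measure).
    """
    distinct = {attr: set(values) for attr, values in columns.items()}

    # Inverted index: each value points to the attributes that contain it.
    index = defaultdict(set)
    for attr, values in distinct.items():
        for value in values:
            index[value].add(attr)

    # shared[a][b] = number of distinct values of a that also occur in b.
    shared = {attr: defaultdict(int) for attr in distinct}
    for attrs in index.values():
        for a in attrs:
            for b in attrs:
                if a != b:
                    shared[a][b] += 1

    result = set()
    for a, values in distinct.items():
        if not values:
            continue  # skip empty columns rather than report trivial INDs
        for b in distinct:
            if a == b:
                continue
            error = 1 - shared[a][b] / len(values)
            if error <= max_error:
                result.add((a, b))
    return result


# Toy data (invented for this example).
columns = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "invoices.customer_id": [1, 2, 5],
}

print(unary_inds(columns))                  # exact INDs
print(unary_inds(columns, max_error=0.34))  # approximate INDs
```

With max_error=0.0 only orders.customer_id ⊆ customers.id is reported, which is the kind of candidate a foreign-key search would look for. Raising the threshold also admits invoices.customer_id ⊆ customers.id, where a single inconsistent value (5) breaks the exact inclusion.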



Author information

Authors and Affiliations

Laboratoire LIRIS, Université de LYON, Université LYON 1, CNRS UMR-5205, 69621, Villeurbanne, France
Fabien De Marchi
Laboratoire PRISM, Université de Versailles Saint-Quentin en Yvelines, CNRS UMR-8144, 78035, Versailles Cedex, France
Stéphane Lopes
Laboratoire LIRIS, Université de LYON, INSA de Lyon, CNRS UMR-5205, 69621, Villeurbanne, France
Jean-Marc Petit


Corresponding author

Correspondence to Fabien De Marchi.


About this article

Cite this article

De Marchi, F., Lopes, S. & Petit, J.-M. Unary and n-ary inclusion dependency discovery in relational databases. J Intell Inf Syst 32, 53–73 (2009). https://doi.org/10.1007/s10844-007-0048-x


Received: 04 September 2003
Revised: 30 August 2007
Accepted: 18 September 2007
Published: 26 January 2008
Issue Date: February 2009
DOI: https://doi.org/10.1007/s10844-007-0048-x
