
Unary and n-ary inclusion dependency discovery in relational databases

Journal of Intelligent Information Systems

Abstract

Foreign keys are among the most fundamental constraints in relational databases. Since they are not always declared in existing databases, discovering them is an important and challenging task. The underlying problem is known as the inclusion dependency (IND) inference problem. In this paper, we devise data-mining algorithms for inferring the INDs that hold in a given database. We propose a two-step approach. In the first step, unary INDs are discovered through a new preprocessing stage, which leads to a new algorithm and an efficient implementation. In the second step, n-ary INDs are inferred; this step fits into the framework of levelwise algorithms used in many data-mining tasks. Since real-world databases may contain data inconsistencies, we also consider approximate INDs, i.e. INDs that almost hold, and show how they can be safely integrated into our unary and n-ary discovery algorithms. These algorithms have been implemented and tested on both synthetic and real-life databases. To our knowledge, no other algorithm exists to solve this data-mining problem.
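The abstract describes the two-step approach only at a high level. The Python sketch below illustrates the general technique under our own assumptions; it is not the authors' implementation. The preprocessing step builds an index from each value to the attributes containing it; with ε = 0, A ⊆ B holds exactly when B belongs to the intersection of those attribute sets over all values of A. The n-ary step is an Apriori-style levelwise search. For approximate INDs we assume an error measure equal to the fraction of tuples of the left-hand relation that would have to be removed for the IND to hold. The names `unary_inds`, `ind_error`, `nary_inds` and the toy `db` layout are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical database layout: relation name -> (attribute names, list of tuples).


def unary_inds(db, eps=0.0):
    """Unary INDs A <= B with error <= eps, via a value -> attributes index."""
    attrs_of_value = defaultdict(set)  # value -> every (relation, attribute) containing it
    value_counts, sizes = {}, {}
    for rel, (attrs, rows) in db.items():
        for i, a in enumerate(attrs):
            key = (rel, a)
            value_counts[key] = Counter(row[i] for row in rows)
            sizes[key] = len(rows)
            for v in value_counts[key]:
                attrs_of_value[v].add(key)
    found = set()
    for a, counts in value_counts.items():
        # covered[b] = number of tuples of a's relation whose a-value also occurs in b
        covered = Counter()
        for v, n in counts.items():
            for b in attrs_of_value[v]:
                covered[b] += n
        for b in value_counts:
            error = 1 - covered[b] / sizes[a] if sizes[a] else 0.0
            if b != a and error <= eps:
                found.add((a, b))
    return found


def ind_error(db, r, xs, s, ys):
    """Assumed measure: fraction of tuples of r whose xs-projection is absent from s[ys]."""
    attrs_r, rows_r = db[r]
    attrs_s, rows_s = db[s]
    ix = [attrs_r.index(a) for a in xs]
    iy = [attrs_s.index(b) for b in ys]
    target = {tuple(t[j] for j in iy) for t in rows_s}
    if not rows_r:
        return 0.0
    missing = sum(1 for t in rows_r if tuple(t[j] for j in ix) not in target)
    return missing / len(rows_r)


def nary_inds(db, eps=0.0):
    """Levelwise search; an IND r[xs] <= s[ys] is stored as (r, xs, s, ys), xs sorted."""
    unary = {(a[0], (a[1],), b[0], (b[1],)) for a, b in unary_inds(db, eps)}
    level, found = unary, set(unary)
    while level:
        candidates = set()
        for r, xs, s, ys in level:
            for r2, (a,), s2, (b,) in unary:
                # Same relations, strictly larger LHS attribute (canonical order),
                # and no repeated attribute on the right-hand side.
                if (r2, s2) != (r, s) or a <= xs[-1] or b in ys:
                    continue
                cx, cy = xs + (a,), ys + (b,)
                # Prune unless every sub-IND obtained by dropping one position is valid.
                subs = ((r, cx[:i] + cx[i + 1:], s, cy[:i] + cy[i + 1:])
                        for i in range(len(cx)))
                if all(sub in level for sub in subs):
                    candidates.add((r, cx, s, cy))
        # Only candidates that survive pruning are tested against the data.
        level = {c for c in candidates if ind_error(db, *c) <= eps}
        found |= level
    return found


db = {
    "orders": (["cust", "city"], [(1, "Lyon"), (2, "Paris"), (1, "Lyon")]),
    "customers": (["id", "town"], [(1, "Lyon"), (2, "Paris"), (3, "Nice")]),
}
print(sorted(nary_inds(db)))
```

On this toy database, the exact search (ε = 0) reports orders[cust] ⊆ customers[id], orders[city] ⊆ customers[town], and the binary IND orders[city, cust] ⊆ customers[town, id]. With ε = 0.4, the reverse INDs such as customers[id] ⊆ orders[cust] (error 1/3) are also accepted. Pruning remains safe for ε > 0 under this measure: any tuple that satisfies r[XA] ⊆ s[YB] also satisfies r[X] ⊆ s[Y], so an IND's error never exceeds that of its extensions.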



Author information

Corresponding author

Correspondence to Fabien De Marchi.

About this article

Cite this article

De Marchi, F., Lopes, S. & Petit, J.-M. Unary and n-ary inclusion dependency discovery in relational databases. J Intell Inf Syst 32, 53–73 (2009). https://doi.org/10.1007/s10844-007-0048-x

  • DOI: https://doi.org/10.1007/s10844-007-0048-x
