Abstract
We study the combined class of possible keys and functional dependencies over Codd tables under NOT NULL constraints. These constraints can express significant application semantics of data that is compliant with the industry standard SQL. Three major contributions are made. Firstly, the PTIME-complete implication problem is characterized axiomatically by a finite set of Horn rules, algorithmically by a linear time decision procedure, and logically by goal and definite clauses under S-3 logic. Secondly, we establish structural and computational properties of Armstrong tables for the combined class. This includes an algorithm that is guaranteed to compute an Armstrong table of size at most quadratic in that of a minimum-sized table. Thirdly, we conduct an empirical evaluation of a GUI-based implementation of this algorithm. We conclude that Armstrong tables are an effective tool for recognizing domain semantics and should be exploited as early as possible during the acquisition of possible keys and functional dependencies.
Similar content being viewed by others
References
Alattar M, Sali A (2019) Keys in relational databases with nulls and bounded domains. In: Advances in databases and information systems—23rd European conference, ADBIS 2019, Bled, Slovenia, September 8–11, 2019, Proceedings. pp 33–50
Alattar M, Sali A (2020) Functional dependencies in incomplete databases with limited domains. In: Foundations of information and knowledge systems: 11th international symposium, FoIKS 2020, Dortmund, Germany, February 17–21, 2020, Proceedings. pp 1–21
Alattar M, Sali A (2020) Strongly possible keys for SQL. J Data Semant 9(2–3):85–99
Alattar M, Sali A (2021) Toward an axiomatization of strongly possible functional dependencies. Vietnam J Comput Sci 8(1):133–151
Armstrong WW (1974) Dependency structures of database relationships. Inf Process 74:580–583
Atzeni P, Morfuni N (1986) Functional dependencies and constraints on null values in database relations. Inf Control 70(1):1–31
Beeri C, Dowd M, Fagin R, Statman R (1984) On the structure of Armstrong relations for functional dependencies. J ACM 31(1):30–46
Börger E, Grädel E, Gurevich Y (1997) The classical decision problem. Perspectives in mathematical logic. Springer, Berlin
Codd EF (1970) A relational model of data for large shared data banks. Commun ACM 13(6):377–387
Codd EF (1979) Extending the database relational model to capture more meaning. ACM Trans Database Syst 4(4):397–434
Date CJ (1982) Null values in database management. In: Proceedings of the second British national conference on databases (BNCOD-2), University of Bristol, UK, 7–9 July 1982. pp 147–166
Date CJ, Darwen H (1997) A guide to SQL standard, 4th edn. Addison-Wesley, Boston
Demetrovics J, Füredi Z, Katona GOH (1985) Minimum matrix representation of closure operations. Discrete Appl Math 11(2):115–128
Demetrovics J, Katona GOH (1981) Extremal combinatorial problems in relational data base. In: Fundamentals of computation theory, FCT’81, proceedings of the 1981 international FCT-conference, Szeged, Hungary, August 24–28, 1981. pp 110–119
Demetrovics J, Katona GOH (1993) A survey of some combinatorial results concerning functional dependencies in database relations. Ann Math Artif Intell 7(1–4):63–82
Diederich J, Milton J (1988) New methods and fast algorithms for database normalization. ACM Trans Database Syst 13(3):339–365
Dowling WF, Gallier JH (1984) Linear-time algorithms for testing the satisfiability of propositional horn formulae. J Log Program 1(3):267–284
Fagin R (1977) Functional dependencies in a relational data base and propositional logic. IBM J Res Dev 21(6):543–544
Fagin R (1977) Multivalued dependencies and a new normal form for relational databases. ACM Trans Database Syst 2(3):262–278
Fagin R (1982) Armstrong databases. Tech. Rep. RJ3440(40926), IBM Research Laboratory, San Jose, California, USA
Ferrarotti F, Hartmann S, Le V, Link S (2011) Codd table representations under weak possible world semantics. In: Hameurlain A, Liddle SW, Schewe KD, Zhou X (eds) DEXA 2011, Part I, LNCS, vol 6860. Springer, Heidelberg, pp 125–139
Gottlob G, Pichler R, Wei F (2010) Tractable database design through bounded treewidth. Inf Syst 35(3):278–298
Grant J (2008) Null values in SQL. SIGMOD Rec 37(3):23–25
Guagliardo P, Libkin L (2017) Correctness of SQL queries on databases with nulls. SIGMOD Rec 46(3):5–16
Hall N, Köhler H, Link S, Prade H, Zhou X (2015) Cardinality constraints on qualitatively uncertain data. Data Knowl Eng 99:126–150
Hannula M, Link S (2018) Automated reasoning about key sets. In: Automated reasoning—9th international joint conference, IJCAR 2018, held as part of the federated logic conference, FloC 2018, Oxford, UK, July 14–17, 2018, Proceedings. pp 47–63
Hartmann S (2001) On the implication problem for cardinality constraints and functional dependencies. Ann Math Artif Intell 33(2–4):253–307
Hartmann S, Kirchberg M, Link S (2012) Design by example for SQL table definitions with functional dependencies. VLDB J 21(1):121–144
Hartmann S, Link S (2010) When data dependencies over SQL tables meet the Logics of Paradox and S-3. In: Proceedings to the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (PODS). pp 317–326
Hartmann S, Link S (2012) The implication problem of data dependencies over SQL table definitions. ACM Trans Database Syst 37(2):13
Imielinski T, Lipski Jr W (1983) Incomplete information and dependencies in relational databases. In: SIGMOD’83, proceedings of annual meeting, San Jose, California, USA, May 23–26, 1983. pp 178–184
Imielinski T, Lipski W Jr (1984) Incomplete information in relational databases. J ACM 31(4):761–791
Jones ND, Laaser WT (1976) Complete problems for deterministic polynomial time. Theor Comput Sci 3(1):105–117
Jr WL (1979) On semantic issues connected with incomplete information databases. ACM Trans Database Syst 4(3):262–296
Katona GOH (1992) Combinatorial and algebraic results for database relations. In: Database theory—ICDT’92, 4th international conference, Berlin, Germany, October 14–16, 1992, Proceedings. pp 1–20
Köhler H, Leck U, Link S, Zhou X (2016) Possible and certain keys for SQL. VLDB J 25(4):571–596
Köhler H, Link S (2010) Armstrong axioms and Boyce–Codd-heath normal form under bag semantics. Inf Process Lett 110(16):717–724
Köhler H, Link S (2016) SQL schema design: foundations, normal forms, and normalization. In: Proceedings of the 2016 international conference on management of data, SIGMOD conference 2016, San Francisco, CA, USA, June 26–July 01, 2016. pp 267–279
Köhler H, Link S (2018) SQL schema design: foundations, normal forms, and normalization. Inf Syst 76:88–113
Köhler H, Link S, Zhou X (2015) Possible and certain SQL keys. PVLDB 8(11):1118–1129
Langeveldt WD, Link S (2010) Empirical evidence for the usefulness of Armstrong relations in the acquisition of meaningful functional dependencies. Inf Syst 35(3):352–374
Le V, Link S, Ferrarotti F (2013) Effective recognition and visualization of semantic requirements by perfect SQL samples. In: Ng W, Storey VC, Trujillo J (eds) Conceptual modeling—32th international conference, ER 2013, Hong-Kong, China, November 11–13, 2013. Proceedings, lecture notes in computer science, vol 8217. Springer, pp 227–240
Le V, Link S, Ferrarotti F (2015) Empirical evidence for the usefulness of Armstrong tables in the acquisition of semantically meaningful SQL constraints. Data Knowl Eng 98:74–103
Levene M, Loizou G (1998) Axiomatisation of functional dependencies in incomplete relations. Theor Comput Sci 206(1–2):283–300
Liddle SW, Embley DW, Woodfield SN (1993) Cardinality constraints in semantic data models. Data Knowl Eng 11(3):235–270
Lien E (1982) On the equivalence of database models. J ACM 29(2):333–362
Link S (2008) Charting the completeness frontier of inference systems for multivalued dependencies. Acta Inf 45(7–8):565–591
Link S (2008) On the implication of multivalued dependencies in partial database relations. Int J Found Comput Sci 19(3):691–715
Link S (2012) Characterisations of multivalued dependency implication over undetermined universes. J Comput Syst Sci 78(4):1026–1044
Link S, Wei Z (2021) Logical schema design that quantifies update inefficiency and join efficiency. In: SIGMOD ’21: international conference on management of data, virtual event, China, June 20–25, 2021. pp 1169–1181
Maier D (1983) The theory of relational databases. Computer Science Press, Rockville
Mannila H, Räihä KJ (1986) Design by example: an application of Armstrong relations. J Comput Syst Sci 33(2):126–141
Nguyen C (1987) Ein algebraischer Zugang zu mehrwertigen Logiken und Fuzzy-Logiken und ihre Anwendung auf relationale Datenbanken (mit unvollständigen Informationen). Ph.D. thesis, Dresden University of Technology, Germany
Papenbrock T, Ehrlich J, Marten J, Neubert T, Rudolph J, Schönberg M, Zwiener J, Naumann F (2015) Functional dependency discovery: an experimental evaluation of seven algorithms. PVLDB 8(10):1082–1093
Roblot T, Hannula M, Link S (2018) Probabilistic cardinality constraints—validation, reasoning, and semantic summaries. VLDB J 27(6):771–795
Schaerf M, Cadoli M (1995) Tractable reasoning via approximation. Artif Intell 74:249–310
Thalheim B (1989) On semantic issues connected with keys in relational databases permitting null values. Elektron Informationsverarb Kybern 25(1–2):11–20
Thalheim B (1991) Dependencies in relational databases. Teubner, Stuttgart
Thalheim B (1992) Fundamentals of cardinality constraints. In: Pernul G, Min Tjoa A (eds) Entity-relationship approach—ER’92, 11th international conference on the entity-relationship approach, Karlsruhe, Germany, October 7–9, 1992, Proceedings, vol 645. Lecture notes in computer science. Springer, Heidelberg, pp 7–23
Thalheim B, Schewe K (2010) NULL ’value’ algebras and logics. In: Information modelling and knowledge bases XXII, 20th European–Japanese conference on information modelling and knowledge bases (EJC 2010), Jyväskylä, Finland, 31 May–4 June 2010. pp 354–367
Tichler K (2004) Extremal theorems for databases. Ann Math Artif Intell 40(1–2):165–182
Vincent M, Liu J, Liu C (2004) Strong functional dependencies and their application to normal forms in XML. ACM Trans Database Syst 29(3):445–462
Wei Z, Leck U, Link S (2019) Discovery and ranking of embedded uniqueness constraints. Proc VLDB Endow 12(13):2339–2352
Wei Z, Link S (2019) Embedded functional dependencies and data-completeness tailored database design. PVLDB 12(11):1458–1470
Wei Z, Link S (2021) Embedded functional dependencies and data-completeness tailored database design. ACM Trans Database Syst 46(2):7:1–7:6
Zaniolo C (1984) Database relations with null values. J Comput Syst Sci 28(1):142–166
Acknowledgements
This research is supported by the Marsden Fund Council from New Zealand Government funding, and by the Natural Science Foundation of China (Grant No. 61472263).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Koehler, H., Le, V. & Link, S. Possible Keys and Functional Dependencies. J Data Semant 10, 327–366 (2021). https://doi.org/10.1007/s13740-021-00135-w
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13740-021-00135-w