Abstract
Driven by the dominance of the relational model and the requirements of modern applications, we revisit the fundamental notion of a key in relational databases with NULL. In SQL, primary key columns are NOT NULL, and UNIQUE constraints guarantee uniqueness only for tuples without NULL. We investigate the notions of possible and certain keys, which are keys that hold in some or all possible worlds that originate from an SQL table, respectively. Possible keys coincide with UNIQUE, thus providing a semantics for their syntactic definition in the SQL standard. Certain keys extend primary keys to include NULL columns and can uniquely identify entities whenever feasible, while primary keys may not. In addition to basic characterization, axiomatization, discovery, and extremal combinatorics problems, we investigate the existence and construction of Armstrong tables, and describe an indexing scheme for enforcing certain keys. Our experiments show that certain keys with NULLs occur in real-world data, and related computational problems can be solved efficiently. Certain keys are therefore semantically well founded and able to meet Codd’s entity integrity rule while handling high volumes of incomplete data from different formats.
Similar content being viewed by others
Notes
The similar journal values of the first and second rows are not different by mistake: Our keys deal with the integrity dimension of data and not the accuracy dimension.
A partition of cardinality two.
w.r.t. the \(\wedge \)-support ordering \(\mathcal {W} \lesssim \mathcal {W}' :\Leftrightarrow \forall Y\in \mathcal {W} . \exists Y'\in \mathcal {W}' . Y\subseteq Y'\).
References
Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. VLDB J. 24(4), 557–581 (2015)
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Boston (1995)
Aleksic, S., Celikovic, M., Link, S., Lukovic, I., Moginm, P.: Faceoff: Surrogate vs. natural keys. In: ADBIS, pp. 543–546 (2010)
Beeri, C., Dowd, M., Fagin, R., Statman, R.: On the structure of Armstrong relations for functional dependencies. J. ACM 31(1), 30–46 (1984)
Biskup, J.: Security in Computing Systems - Challenges, Approaches and Solutions. Springer, New York (2009)
Brown, P., Link, S.: Probabilistic keys for data quality management. In: CAiSE, pp. 118–132 (2015)
Calì, A., Calvanese, D., Lenzerini, M.: Data integration under integrity constraints. In: Seminal Contributions to Information, Systems Engineering, pp. 335–352 (2013)
Calvanese, D., Fischl, W., Pichler, R., Sallinger, E., Simkus, M.: Capturing relational schemas and functional dependencies in RDFS. In: AAAI, pp. 1003–1011 (2014)
Codd, E.F.: Extending the database relational model to capture more meaning. ACM Trans. Database Syst. 4(4), 397–434 (1979)
Codd, E.F.: The Relational Model for Database Management, Version 2. Addison-Wesley, Boston (1990)
David, J., Lhote, L., Mary, A., Rioult, F.: An average study of hypergraphs and their minimal transversals. Theor. Comput. Sci. 596, 124–141 (2015)
Doherty. S.: The future of enterprise data. http://insights.wired.com/profiles/blogs/the-future-of-enterprise-data#axzz2owCB8FFn (2013)
Eiter, T., Gottlob, G.: Identifying the minimal transversals of a hypergraph and related problems. SIAM J. Comput. 24(6), 1278–1304 (1995)
Engel, K.: Sperner Theory. Cambridge University Press, Cambridge (1997)
Fagin, R.: A normal form for relational databases that is based on domains and keys. ACM Trans. Database Syst. 6(3), 387–415 (1981)
Fagin, R.: Horn clauses and database dependencies. J. ACM 29(4), 952–985 (1982)
Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)
Fan, W., Geerts, F., Jia, X.: A revival of constraints for data cleaning. PVLDB 1(2), 1522–1523 (2008)
Hannula, M., Kontinen, J., Link, S.: On independence atoms and keys. In: CIKM, pp. 1229–1238 (2014)
Hannula, M., Kontinen, J., Link, S.: On the finite and general implication problems of independence atoms and keys. J. Comput. Syst. Sci. 82(5), 856–877 (2016)
Hartmann, S., Kirchberg, M., Link, S.: Design by example for SQL table definitions with functional dependencies. VLDB J. 21(1), 121–144 (2012)
Hartmann, S., Leck, U., Link, S.: On Codd families of keys over incomplete relations. Comput. J. 54(7), 1166–1180 (2011)
Hartmann, S., Link, S.: Efficient reasoning about a robust XML key fragment. ACM Trans. Database Syst. 34(2), 10 (2009)
Hartmann, S., Link, S.: The implication problem of data dependencies over SQL table definitions. ACM Trans. Database Syst. 37(2), 13 (2012)
Heise, A., Quiane-Ruiz, J.A., Abedjan, Z., Jentzsch, A., Naumann, F.: Scalable discovery of unique column combinations. PVLDB 7(4), 301–312 (2013)
Ileana, I., Cautis, B., Deutsch, A., Katsis, Y.; Complete yet practical search for minimal query reformulations under constraints. In: SIGMOD, pp. 1015–1026 (2014)
Imielinski, T., Jr, W.L.: Incomplete information in relational databases. J. ACM 31(4), 761–791 (1984)
Jha, A.K., Rastogi, V., Suciu, D.: Query evaluation with soft keys. In: PODS, pp. 119–128 (2008)
Köhler, H., Leck, U., Link, S., Prade, H.: Logical foundations of possibilistic keys. In: JELIA, pp. 181–195 (2014)
Köhler, H., Link, S.: Inclusion dependencies reloaded. In: CIKM, pp. 1361–1370 (2015)
Köhler, H., Link, S.: SQL schema design. In: SIGMOD (2016). doi:10.1145/2882903.2915239
Köhler, H., Link, S., Zhou, X.: Possible and certain SQL keys. PVLDB 8(11), 1118–1129 (2015)
Koutris, P., Wijsen, J.: The data complexity of consistent query answering for self-join-free conjunctive queries under primary key constraints. In: PODS, pp. 17–29 (2015)
Levene, M., Loizou, G.: Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci. 206(1–2), 283–300 (1998)
Levene, M., Loizou, G.: A generalisation of entity and referential integrity. ITA 35(2), 113–127 (2001)
Lien, Y.E.: On the equivalence of database models. J. ACM 29(2), 333–362 (1982)
Liu, J., Li, J., Liu, C., Chen, Y.: Discover dependencies from data—a review. IEEE TKDE 24(2), 251–264 (2012)
Mannila, H., Räihä, K.-J.: Design of Relational Databases. Addison-Wesley, Boston (1992)
Marchi, F.D., Petit, J.: Semantic sampling of existing databases through informative armstrong databases. Inf. Syst. 32(3), 446–457 (2007)
Melton, J.: ISO/IEC 9075-2: 2003 (SQL/foundation). ISO standard (2003)
Memari, M., Link, S.: Index design for enforcing partial referential integrity efficiently. In: EDBT, pp. 217–228 (2015)
Memari, M., Link, S., Dobbie, G.: SQL data profiling of foreign keys. In: ER, pp. 229–243 (2015)
Naumann, F.: Data profiling revisited. SIGMOD Rec. 42(4), 40–49 (2013)
New Zealand Ministry of Health. National health index questions and answers. http://www.health.govt.nz/our-work/health-identity/national-health-index/nhi-information-health-consumers/national-health-index-questions-and-answers#duplicates. Accessed 1 Mar 2016
Pochampally, R., Sarma, A.D., Dong, X.L.: A. Meliou, and D. Srivastava. Fusing data with correlations. In: SIGMOD, pp. 433–444 (2014)
Ross, K. A., Srivastava, D., Sudarshan, S.: Materialized view maintenance and integrity constraint checking: Trading space for time. In: SIGMOD, pp. 447–458 (1996)
Saha, B., Srivastava, D.: Data quality: the other face of big data. In: ICDE, pp. 1294–1297 (2014)
Schaefer, T.J.: The complexity of satisfiability problems. In: STOC, pp. 216–226 (1978)
Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: Efficient and scalable discovery of composite keys. In: VLDB, pp. 691–702 (2006)
Sperner, E.: Ein Satz über Untermengen einer endlichen Menge. Math. Z. 27, 544–548 (1928)
Thalheim, B.: On semantic issues connected with keys in relational databases permitting null values. Elektr. Inform. Kybern. 25(1/2), 11–20 (1989)
Acknowledgments
We thank Mozhgan Memari for carrying out the experiments for key enforcement. This research was partially supported by the Marsden Fund Council from New Zealand Government funding, by the Natural Science Foundation of China (Grant No. 61472263) and the Australian Research Council (Grants No. DP140103171).
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Köhler, H., Leck, U., Link, S. et al. Possible and certain keys for SQL. The VLDB Journal 25, 571–596 (2016). https://doi.org/10.1007/s00778-016-0430-9
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-016-0430-9