Skip to main content
Log in

Possible and certain keys for SQL

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Driven by the dominance of the relational model and the requirements of modern applications, we revisit the fundamental notion of a key in relational databases with NULL. In SQL, primary key columns are NOT NULL, and UNIQUE constraints guarantee uniqueness only for tuples without NULL. We investigate the notions of possible and certain keys, which are keys that hold in some or all possible worlds that originate from an SQL table, respectively. Possible keys coincide with UNIQUE, thus providing a semantics for their syntactic definition in the SQL standard. Certain keys extend primary keys to include NULL columns and can uniquely identify entities whenever feasible, while primary keys may not. In addition to basic characterization, axiomatization, discovery, and extremal combinatorics problems, we investigate the existence and construction of Armstrong tables, and describe an indexing scheme for enforcing certain keys. Our experiments show that certain keys with NULLs occur in real-world data, and related computational problems can be solved efficiently. Certain keys are therefore semantically well founded and able to meet Codd’s entity integrity rule while handling high volumes of incomplete data from different formats.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

Notes

  1. The similar journal values of the first and second rows are not different by mistake: Our keys deal with the integrity dimension of data and not the accuracy dimension.

  2. A partition of cardinality two.

  3. w.r.t. the \(\wedge \)-support ordering \(\mathcal {W} \lesssim \mathcal {W}' :\Leftrightarrow \forall Y\in \mathcal {W} . \exists Y'\in \mathcal {W}' . Y\subseteq Y'\).

References

  1. Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. VLDB J. 24(4), 557–581 (2015)

    Article  Google Scholar 

  2. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Boston (1995)

    MATH  Google Scholar 

  3. Aleksic, S., Celikovic, M., Link, S., Lukovic, I., Moginm, P.: Faceoff: Surrogate vs. natural keys. In: ADBIS, pp. 543–546 (2010)

  4. Beeri, C., Dowd, M., Fagin, R., Statman, R.: On the structure of Armstrong relations for functional dependencies. J. ACM 31(1), 30–46 (1984)

    Article  MathSciNet  MATH  Google Scholar 

  5. Biskup, J.: Security in Computing Systems - Challenges, Approaches and Solutions. Springer, New York (2009)

    MATH  Google Scholar 

  6. Brown, P., Link, S.: Probabilistic keys for data quality management. In: CAiSE, pp. 118–132 (2015)

  7. Calì, A., Calvanese, D., Lenzerini, M.: Data integration under integrity constraints. In: Seminal Contributions to Information, Systems Engineering, pp. 335–352 (2013)

  8. Calvanese, D., Fischl, W., Pichler, R., Sallinger, E., Simkus, M.: Capturing relational schemas and functional dependencies in RDFS. In: AAAI, pp. 1003–1011 (2014)

  9. Codd, E.F.: Extending the database relational model to capture more meaning. ACM Trans. Database Syst. 4(4), 397–434 (1979)

    Article  Google Scholar 

  10. Codd, E.F.: The Relational Model for Database Management, Version 2. Addison-Wesley, Boston (1990)

    MATH  Google Scholar 

  11. David, J., Lhote, L., Mary, A., Rioult, F.: An average study of hypergraphs and their minimal transversals. Theor. Comput. Sci. 596, 124–141 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  12. Doherty. S.: The future of enterprise data. http://insights.wired.com/profiles/blogs/the-future-of-enterprise-data#axzz2owCB8FFn (2013)

  13. Eiter, T., Gottlob, G.: Identifying the minimal transversals of a hypergraph and related problems. SIAM J. Comput. 24(6), 1278–1304 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  14. Engel, K.: Sperner Theory. Cambridge University Press, Cambridge (1997)

    Book  MATH  Google Scholar 

  15. Fagin, R.: A normal form for relational databases that is based on domains and keys. ACM Trans. Database Syst. 6(3), 387–415 (1981)

    Article  MATH  Google Scholar 

  16. Fagin, R.: Horn clauses and database dependencies. J. ACM 29(4), 952–985 (1982)

    Article  MathSciNet  MATH  Google Scholar 

  17. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  18. Fan, W., Geerts, F., Jia, X.: A revival of constraints for data cleaning. PVLDB 1(2), 1522–1523 (2008)

    Google Scholar 

  19. Hannula, M., Kontinen, J., Link, S.: On independence atoms and keys. In: CIKM, pp. 1229–1238 (2014)

  20. Hannula, M., Kontinen, J., Link, S.: On the finite and general implication problems of independence atoms and keys. J. Comput. Syst. Sci. 82(5), 856–877 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  21. Hartmann, S., Kirchberg, M., Link, S.: Design by example for SQL table definitions with functional dependencies. VLDB J. 21(1), 121–144 (2012)

    Article  Google Scholar 

  22. Hartmann, S., Leck, U., Link, S.: On Codd families of keys over incomplete relations. Comput. J. 54(7), 1166–1180 (2011)

    Article  Google Scholar 

  23. Hartmann, S., Link, S.: Efficient reasoning about a robust XML key fragment. ACM Trans. Database Syst. 34(2), 10 (2009)

    Article  Google Scholar 

  24. Hartmann, S., Link, S.: The implication problem of data dependencies over SQL table definitions. ACM Trans. Database Syst. 37(2), 13 (2012)

    Article  Google Scholar 

  25. Heise, A., Quiane-Ruiz, J.A., Abedjan, Z., Jentzsch, A., Naumann, F.: Scalable discovery of unique column combinations. PVLDB 7(4), 301–312 (2013)

    Google Scholar 

  26. Ileana, I., Cautis, B., Deutsch, A., Katsis, Y.; Complete yet practical search for minimal query reformulations under constraints. In: SIGMOD, pp. 1015–1026 (2014)

  27. Imielinski, T., Jr, W.L.: Incomplete information in relational databases. J. ACM 31(4), 761–791 (1984)

    Article  MathSciNet  MATH  Google Scholar 

  28. Jha, A.K., Rastogi, V., Suciu, D.: Query evaluation with soft keys. In: PODS, pp. 119–128 (2008)

  29. Köhler, H., Leck, U., Link, S., Prade, H.: Logical foundations of possibilistic keys. In: JELIA, pp. 181–195 (2014)

  30. Köhler, H., Link, S.: Inclusion dependencies reloaded. In: CIKM, pp. 1361–1370 (2015)

  31. Köhler, H., Link, S.: SQL schema design. In: SIGMOD (2016). doi:10.1145/2882903.2915239

  32. Köhler, H., Link, S., Zhou, X.: Possible and certain SQL keys. PVLDB 8(11), 1118–1129 (2015)

    Google Scholar 

  33. Koutris, P., Wijsen, J.: The data complexity of consistent query answering for self-join-free conjunctive queries under primary key constraints. In: PODS, pp. 17–29 (2015)

  34. Levene, M., Loizou, G.: Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci. 206(1–2), 283–300 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  35. Levene, M., Loizou, G.: A generalisation of entity and referential integrity. ITA 35(2), 113–127 (2001)

    MathSciNet  MATH  Google Scholar 

  36. Lien, Y.E.: On the equivalence of database models. J. ACM 29(2), 333–362 (1982)

    Article  MathSciNet  MATH  Google Scholar 

  37. Liu, J., Li, J., Liu, C., Chen, Y.: Discover dependencies from data—a review. IEEE TKDE 24(2), 251–264 (2012)

    Google Scholar 

  38. Mannila, H., Räihä, K.-J.: Design of Relational Databases. Addison-Wesley, Boston (1992)

    MATH  Google Scholar 

  39. Marchi, F.D., Petit, J.: Semantic sampling of existing databases through informative armstrong databases. Inf. Syst. 32(3), 446–457 (2007)

    Article  Google Scholar 

  40. Melton, J.: ISO/IEC 9075-2: 2003 (SQL/foundation). ISO standard (2003)

  41. Memari, M., Link, S.: Index design for enforcing partial referential integrity efficiently. In: EDBT, pp. 217–228 (2015)

  42. Memari, M., Link, S., Dobbie, G.: SQL data profiling of foreign keys. In: ER, pp. 229–243 (2015)

  43. Naumann, F.: Data profiling revisited. SIGMOD Rec. 42(4), 40–49 (2013)

    Article  Google Scholar 

  44. New Zealand Ministry of Health. National health index questions and answers. http://www.health.govt.nz/our-work/health-identity/national-health-index/nhi-information-health-consumers/national-health-index-questions-and-answers#duplicates. Accessed 1 Mar 2016

  45. Pochampally, R., Sarma, A.D., Dong, X.L.: A. Meliou, and D. Srivastava. Fusing data with correlations. In: SIGMOD, pp. 433–444 (2014)

  46. Ross, K. A., Srivastava, D., Sudarshan, S.: Materialized view maintenance and integrity constraint checking: Trading space for time. In: SIGMOD, pp. 447–458 (1996)

  47. Saha, B., Srivastava, D.: Data quality: the other face of big data. In: ICDE, pp. 1294–1297 (2014)

  48. Schaefer, T.J.: The complexity of satisfiability problems. In: STOC, pp. 216–226 (1978)

  49. Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: Efficient and scalable discovery of composite keys. In: VLDB, pp. 691–702 (2006)

  50. Sperner, E.: Ein Satz über Untermengen einer endlichen Menge. Math. Z. 27, 544–548 (1928)

    Article  MathSciNet  MATH  Google Scholar 

  51. Thalheim, B.: On semantic issues connected with keys in relational databases permitting null values. Elektr. Inform. Kybern. 25(1/2), 11–20 (1989)

    MathSciNet  Google Scholar 

Download references

Acknowledgments

We thank Mozhgan Memari for carrying out the experiments for key enforcement. This research was partially supported by the Marsden Fund Council from New Zealand Government funding, by the Natural Science Foundation of China (Grant No. 61472263) and the Australian Research Council (Grants No. DP140103171).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sebastian Link.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 175 KB)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Köhler, H., Leck, U., Link, S. et al. Possible and certain keys for SQL. The VLDB Journal 25, 571–596 (2016). https://doi.org/10.1007/s00778-016-0430-9

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-016-0430-9

Keywords

Navigation