ABSTRACT
Normalization helps us find a database schema at design time that can process the most frequent updates efficiently at run time. Unfortunately, relational normalization only works for idealized database instances in which duplicates and null markers are not present. On one hand, these features occur frequently in real-world data compliant with the industry standard SQL, and especially in modern application domains. On the other hand, the features impose challenges that have made it impossible so far to extend the existing forty year old normalization framework to SQL. We introduce a new class of functional dependencies and show that they provide the right notion for SQL schema design. Axiomatic and linear-time algorithmic characterizations of the associated implication problem are established. These foundations enable us to propose a Boyce-Codd normal form for SQL. Indeed, we justify the normal form by showing that it permits precisely those SQL instances which are free from data redundancy. Unlike the relational case, there are SQL schemata that cannot be converted into Boyce-Codd normal form. Nevertheless, for an expressive sub-class of our functional dependencies we establish a normalization algorithm that always produces a schema in Value-Redundancy free normal form. This normal form permits precisely those instances which are free from any redundant data value occurrences other than the null marker. Experiments show that our functional dependencies occur frequently in real-world data and that they are effective in eliminating redundant values from these data sets without loss of information.
- Z. Abedjan, L. Golab, and F. Naumann. Profiling relational data: a survey. VLDB J., 24(4):557--581, 2015. Google ScholarDigital Library
- M. Arenas. Normalization theory for XML. SIGMOD Record, 35(4):57--64, 2006. Google ScholarDigital Library
- W. W. Armstrong. Dependency structures of data base relationships. In IFIP Congress, pages 580--583, 1974.Google Scholar
- P. Atzeni and N. M. Morfuni. Functional dependencies and constraints on null values in database relations. Information and Control, 70(1):1--31, 1986. Google ScholarDigital Library
- C. Beeri and P. A. Bernstein. Computational problems related to the design of normal form relational schemas. ACM Trans. Database Syst., 4(1):30--59, 1979. Google ScholarDigital Library
- P. A. Bernstein and N. Goodman. What does boyce-codd normal form do? In VLDB, pages 245--259, 1980. Google ScholarDigital Library
- J. Biskup, U. Dayal, and P. A. Bernstein. Synthesizing independent database schemas. In SIGMOD, pages 143--151, 1979. Google ScholarDigital Library
- J. Biskup and T. Polle. Adding inclusion dependencies to an object-oriented data model with uniqueness constraints. Acta Inf., 39(6--7):391--449, 2003.Google ScholarCross Ref
- E. F. Codd. Recent investigations in relational data base systems. In IFIP Congress, pages 1017--1021, 1974.Google Scholar
- E. F. Codd. The Relational Model for Database Management, Version 2. Addison-Wesley, 1990. Google ScholarDigital Library
- H. Darwen, C. Date, and R. Fagin. A normal form for preventing redundant tuples in relational databases. In ICDT, pages 114--126, 2012. Google ScholarDigital Library
- J. Diederich and J. Milton. New methods and fast algorithms for database normalization. ACM Trans. Database Syst., 13(3):339--365, 1988. Google ScholarDigital Library
- R. Fagin. Multivalued dependencies and a new normal form for relational databases. ACM Trans. Database Syst., 2(3):262--278, 1977. Google ScholarDigital Library
- R. Fagin. A normal form for relational databases that is based on domains and keys. ACM Trans. Database Syst., 6(3):387--415, 1981. Google ScholarDigital Library
- F. Ferrarotti, S. Hartmann, H. Köhler, S. Link, and M. W. Vincent. The Boyce-Codd-Heath normal form for SQL. In WoLLIC, pages 110--122, 2011. Google ScholarDigital Library
- S. Hartmann and S. Link. When data dependencies over SQL tables meet the logics of paradox and S-3. In PODS, pages 317--326, 2010. Google ScholarDigital Library
- S. Hartmann and S. Link. The implication problem of data dependencies over SQL table definitions. ACM Trans. Database Syst., 37(2):13, 2012. Google ScholarDigital Library
- I. J. Heath. Unacceptable file operations in a relational data base. In SIGFIDET Workshop, pages 19--33, 1971. Google ScholarDigital Library
- R. T. Herrera, J. Tekli, R. Chbeir, S. Laborie, I. Dongo, and R. Guzman. Toward RDF normalization. In ER, pages 261--275, 2015.Google Scholar
- H. Köhler and S. Link. SQL Schema Design. Technical Report TR-496, Centre for Discrete Mathematics and Theoretical Computer Science, The University of Auckland, 2016.Google Scholar
- H. Köhler, S. Link, and X. Zhou. Possible and certain SQL key. PVLDB, 8(11):1118--1129, 2015. Google ScholarDigital Library
- G. Lausen. Relational databases in RDF: Keys and foreign keys. In SWDB-ODBIS, pages 43--56, 2007. Google ScholarDigital Library
- M. Levene. The Nested Universal Relation Database Model, volume 595 of LNCS. Springer, 1992.Google ScholarCross Ref
- M. Levene and G. Loizou. Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci., 206(1--2):283--300, 1998. Google ScholarDigital Library
- M. Levene and G. Loizou. Database design for incomplete relations. ACM Trans. Database Syst., 24(1):80--125, 1999. Google ScholarDigital Library
- M. Levene and G. Loizou. A guided tour of relational databases and beyond. Springer, 1999. Google ScholarCross Ref
- M. Levene and M. W. Vincent. Justification for inclusion dependency normal form. IEEE Trans. Knowl. Data Eng., 12(2):281--291, 2000. Google ScholarDigital Library
- Y. E. Lien. On the equivalence of database models. J. ACM, 29(2):333--362, 1982. Google ScholarDigital Library
- S. Link. Reasoning about saturated conditional independence under uncertainty. In AAAI, 2013. Google ScholarDigital Library
- J. A. Makowsky and E. V. Ravve. Dependency preserving refinements and the fundamental problem of database design. Data Knowl. Eng., 24(3):277--312, 1998. Google ScholarDigital Library
- H. Mannila and K.-J. Raiha. Design of Relational Databases. Addison-Wesley, 1992. Google ScholarDigital Library
- M. Niepert, M. Gyssens, B. Sayrafi, and D. V. Gucht. On the conditional independence implication problem: A lattice-theoretic approach. Artif. Intell., 202:29--51, 2013.Google ScholarDigital Library
- T. Papenbrock, J. Ehrlich, J. Marten, T. Neubert, J. Rudolph, M. Schönberg, J. Zwiener, and F. Naumann. Functional dependency discovery. PVLDB, 8(10):1082--1093, 2015. Google ScholarDigital Library
- J. Paredaens, P. D. Bra, M. Gyssens, and D. V. Gucht. The Structure of the Relational Database Model, volume 17 of EATCS Monographs on Theoretical Computer Science. Springer, 1989. Google ScholarDigital Library
- J. Rissanen. Independent components of relations. ACM Trans. Database Syst., 2(4):317--325, 1977. Google ScholarDigital Library
- Z. Tari, J. Stokes, and S. Spaccapietra. Object normal forms and dependency constraints for object-oriented schemata. ACM Trans. Database Syst., 22(4):513--569, 1997. Google ScholarDigital Library
- B. Thalheim. Entity-relationship modeling - foundations of database technology. Springer, 2000. Google ScholarDigital Library
- D.-M. Tsou and P. C. Fischer. Decomposition of a relation scheme into Boyce-Codd normal form. SIGACT News, 14(3):23--29, 1982. Google ScholarDigital Library
- Y. Vassiliou. Functional dependencies and incomplete information. In VLDB, pages 260--269, 1980. Google ScholarDigital Library
- M. W. Vincent. Semantic foundations of 4NF in relational database design. Acta Inf., 36(3):173--213, 1999.Google ScholarCross Ref
- M. W. Vincent, J. Liu, and C. Liu. Strong functional dependencies and their application to normal forms in XML. ACM Trans. Database Syst., 29(3):445--462, 2004. Google ScholarDigital Library
- C. Zaniolo. Database relations with null values. J. Comput. System Sci., 28(1):142--166, 1984.Google ScholarCross Ref
Index Terms
- SQL Schema Design: Foundations, Normal Forms, and Normalization
Recommendations
SQL: From Traditional Databases to Big Data
SIGCSE '16: Proceedings of the 47th ACM Technical Symposium on Computing Science EducationThe Structured Query Language (SQL) is the main programing language designed to manage data stored in database systems. While SQL was initially used only with relational database management systems (RDBMS), its use has been significantly extended with ...
Relational Database Schema Design for Uncertain Data
CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge ManagementWe investigate the impact of uncertainty on relational data\-base schema design. Uncertainty is modeled qualitatively by assigning to tuples a degree of possibility with which they occur, and assigning to functional dependencies a degree of certainty ...
Possible and certain keys for SQL
Driven by the dominance of the relational model and the requirements of modern applications, we revisit the fundamental notion of a key in relational databases with NULL. In SQL, primary key columns are NOT NULL, and UNIQUE constraints guarantee ...
Comments