skip to main content
10.1145/2882903.2915239acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

SQL Schema Design: Foundations, Normal Forms, and Normalization

Published:14 June 2016Publication History

ABSTRACT

Normalization helps us find a database schema at design time that can process the most frequent updates efficiently at run time. Unfortunately, relational normalization only works for idealized database instances in which duplicates and null markers are not present. On one hand, these features occur frequently in real-world data compliant with the industry standard SQL, and especially in modern application domains. On the other hand, the features impose challenges that have made it impossible so far to extend the existing forty year old normalization framework to SQL. We introduce a new class of functional dependencies and show that they provide the right notion for SQL schema design. Axiomatic and linear-time algorithmic characterizations of the associated implication problem are established. These foundations enable us to propose a Boyce-Codd normal form for SQL. Indeed, we justify the normal form by showing that it permits precisely those SQL instances which are free from data redundancy. Unlike the relational case, there are SQL schemata that cannot be converted into Boyce-Codd normal form. Nevertheless, for an expressive sub-class of our functional dependencies we establish a normalization algorithm that always produces a schema in Value-Redundancy free normal form. This normal form permits precisely those instances which are free from any redundant data value occurrences other than the null marker. Experiments show that our functional dependencies occur frequently in real-world data and that they are effective in eliminating redundant values from these data sets without loss of information.

References

  1. Z. Abedjan, L. Golab, and F. Naumann. Profiling relational data: a survey. VLDB J., 24(4):557--581, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. Arenas. Normalization theory for XML. SIGMOD Record, 35(4):57--64, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. W. W. Armstrong. Dependency structures of data base relationships. In IFIP Congress, pages 580--583, 1974.Google ScholarGoogle Scholar
  4. P. Atzeni and N. M. Morfuni. Functional dependencies and constraints on null values in database relations. Information and Control, 70(1):1--31, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Beeri and P. A. Bernstein. Computational problems related to the design of normal form relational schemas. ACM Trans. Database Syst., 4(1):30--59, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. A. Bernstein and N. Goodman. What does boyce-codd normal form do? In VLDB, pages 245--259, 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Biskup, U. Dayal, and P. A. Bernstein. Synthesizing independent database schemas. In SIGMOD, pages 143--151, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Biskup and T. Polle. Adding inclusion dependencies to an object-oriented data model with uniqueness constraints. Acta Inf., 39(6--7):391--449, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  9. E. F. Codd. Recent investigations in relational data base systems. In IFIP Congress, pages 1017--1021, 1974.Google ScholarGoogle Scholar
  10. E. F. Codd. The Relational Model for Database Management, Version 2. Addison-Wesley, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. H. Darwen, C. Date, and R. Fagin. A normal form for preventing redundant tuples in relational databases. In ICDT, pages 114--126, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Diederich and J. Milton. New methods and fast algorithms for database normalization. ACM Trans. Database Syst., 13(3):339--365, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. Fagin. Multivalued dependencies and a new normal form for relational databases. ACM Trans. Database Syst., 2(3):262--278, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. R. Fagin. A normal form for relational databases that is based on domains and keys. ACM Trans. Database Syst., 6(3):387--415, 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. F. Ferrarotti, S. Hartmann, H. Köhler, S. Link, and M. W. Vincent. The Boyce-Codd-Heath normal form for SQL. In WoLLIC, pages 110--122, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Hartmann and S. Link. When data dependencies over SQL tables meet the logics of paradox and S-3. In PODS, pages 317--326, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Hartmann and S. Link. The implication problem of data dependencies over SQL table definitions. ACM Trans. Database Syst., 37(2):13, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. I. J. Heath. Unacceptable file operations in a relational data base. In SIGFIDET Workshop, pages 19--33, 1971. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. R. T. Herrera, J. Tekli, R. Chbeir, S. Laborie, I. Dongo, and R. Guzman. Toward RDF normalization. In ER, pages 261--275, 2015.Google ScholarGoogle Scholar
  20. H. Köhler and S. Link. SQL Schema Design. Technical Report TR-496, Centre for Discrete Mathematics and Theoretical Computer Science, The University of Auckland, 2016.Google ScholarGoogle Scholar
  21. H. Köhler, S. Link, and X. Zhou. Possible and certain SQL key. PVLDB, 8(11):1118--1129, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. G. Lausen. Relational databases in RDF: Keys and foreign keys. In SWDB-ODBIS, pages 43--56, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. Levene. The Nested Universal Relation Database Model, volume 595 of LNCS. Springer, 1992.Google ScholarGoogle ScholarCross RefCross Ref
  24. M. Levene and G. Loizou. Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci., 206(1--2):283--300, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. M. Levene and G. Loizou. Database design for incomplete relations. ACM Trans. Database Syst., 24(1):80--125, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. Levene and G. Loizou. A guided tour of relational databases and beyond. Springer, 1999. Google ScholarGoogle ScholarCross RefCross Ref
  27. M. Levene and M. W. Vincent. Justification for inclusion dependency normal form. IEEE Trans. Knowl. Data Eng., 12(2):281--291, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Y. E. Lien. On the equivalence of database models. J. ACM, 29(2):333--362, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Link. Reasoning about saturated conditional independence under uncertainty. In AAAI, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. A. Makowsky and E. V. Ravve. Dependency preserving refinements and the fundamental problem of database design. Data Knowl. Eng., 24(3):277--312, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. H. Mannila and K.-J. Raiha. Design of Relational Databases. Addison-Wesley, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. M. Niepert, M. Gyssens, B. Sayrafi, and D. V. Gucht. On the conditional independence implication problem: A lattice-theoretic approach. Artif. Intell., 202:29--51, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. T. Papenbrock, J. Ehrlich, J. Marten, T. Neubert, J. Rudolph, M. Schönberg, J. Zwiener, and F. Naumann. Functional dependency discovery. PVLDB, 8(10):1082--1093, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Paredaens, P. D. Bra, M. Gyssens, and D. V. Gucht. The Structure of the Relational Database Model, volume 17 of EATCS Monographs on Theoretical Computer Science. Springer, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. Rissanen. Independent components of relations. ACM Trans. Database Syst., 2(4):317--325, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Z. Tari, J. Stokes, and S. Spaccapietra. Object normal forms and dependency constraints for object-oriented schemata. ACM Trans. Database Syst., 22(4):513--569, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. B. Thalheim. Entity-relationship modeling - foundations of database technology. Springer, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. D.-M. Tsou and P. C. Fischer. Decomposition of a relation scheme into Boyce-Codd normal form. SIGACT News, 14(3):23--29, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Y. Vassiliou. Functional dependencies and incomplete information. In VLDB, pages 260--269, 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. M. W. Vincent. Semantic foundations of 4NF in relational database design. Acta Inf., 36(3):173--213, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  41. M. W. Vincent, J. Liu, and C. Liu. Strong functional dependencies and their application to normal forms in XML. ACM Trans. Database Syst., 29(3):445--462, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. C. Zaniolo. Database relations with null values. J. Comput. System Sci., 28(1):142--166, 1984.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. SQL Schema Design: Foundations, Normal Forms, and Normalization

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
        June 2016
        2300 pages
        ISBN:9781450335317
        DOI:10.1145/2882903

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 14 June 2016

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader