skip to main content
10.1145/3448016.3459238acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Logical Schema Design that Quantifies Update Inefficiency and Join Efficiency

Published: 18 June 2021 Publication History

Abstract

The goal of classical normalization is to maintain data consistency under updates, with a minimum level of effort. Given functional dependencies (FDs) alone, this goal is only achievable in the special case an FD-preserving Boyce-Codd Normal Form (BCNF) decomposition exists. As we show, in all other cases the level of effort can be neither controlled nor quantified. In response, we establish the l-Bounded Cardinality Normal Form, parameterized by a positive integer l. For every l, the normal form condition requires from every instance that every value combination over the left-hand side of every non-trivial FD does not occur in more than l tuples. BCNF is captured when l=1. We demonstrate that schemata in this normal form characterize the instances that are i) free from level l data redundancy and update inefficiency, and ii) permit level l join efficiency. We establish algorithms that compute schemata in l-Bounded Cardinality Normal Form for the smallest level l attainable across all FD-preserving decompositions. Additional algorithms i) attain even smaller levels of effort based on the loss of some FDs, and ii) decompose schemata based on prioritized FDs that cause high levels of effort. Our framework informs de-normalization already during logical design. In particular, level l quantifies both the incremental maintenance and join support of materialized views. Experiments with synthetic and real-world data illustrate which properties the schemata have that result from our algorithms, and how these properties predict the performance of update and query operations on instances over the schemata, without and with materialized views.

Supplementary Material

MP4 File (3448016.3459238.mp4)
An open challenge for logical database design is to compute schemata that enjoy semantically justified characteristics in terms of their update and join efficiency. We introduce the level l of data redundancy as the number of tuples in which a data value can be redundant in every instance of the schema. Each level is a positive integer shown to quantify join efficiency and update inefficiency. Previous work has only addressed the case l=1, where updates are most efficient and joins least efficient. With functional dependencies (FDs) alone, classical normalization cannot provide any a priori upper bound on the join efficiency, Boyce-Codd Normal Form (BCNF) falls short of measuring the level of data redundancy by FDs lost during decomposition, and Third Normal Form falls short of bounding the level of data redundancy for non-key FDs. In response, our new framework positions schema design as the problem of finding a level of data redundancy that balances update and join efficiency. We show that Chen's cardinality constraints (CCs) capture the level of data redundancy at schema design time. We establish the infinite strict hierarchy of l-Bounded Cardinality Normal Forms that characterize schemata where every instance will be i) free from level l data redundancy, ii) free from level l update inefficiency and iii) permit a level l of join efficiency. The normal form is natural as it requires every value combination over the left-hand side of every non-trivial FD to not occur in more than l tuples of every instance. The interaction of CCs and FDs facilitates efficient testing whether given schemata are in l-Bounded Cardinality Normal Form. BCNF is captured as the special case where l=1. Experiments with real-world schemata and data show the various trade-offs that our normal forms achieve, including update and join efficiency. We demonstrate how CCs guide schema design in achieving different performance levels for updates and joins.

References

[1]
Marcelo Arenas. 2006. Normalization theory for XML . SIGMOD Record, Vol. 35, 4 (2006), 57--64.
[2]
William Ward Armstrong. 1974. Dependency Structures of Data Base Relationships. In IFIP congress, Vol. 74. Geneva, Switzerland, 580--583.
[3]
Catriel Beeri and Philip A Bernstein. 1979. Computational problems related to the design of normal form relational schemas. ACM Trans. Database Syst., Vol. 4, 1 (1979), 30--59.
[4]
Catriel Beeri, Philip A Bernstein, and Nathan Goodman. 1978. A sophisticate's introduction to database normalization theory. In VLDB. 113--124.
[5]
Philip A. Bernstein and Nathan Goodman. 1980. What does Boyce-Codd Normal Form Do?. In VLDB. 245--259.
[6]
Joachim Biskup, Umeshwar Dayal, and Philip A. Bernstein. 1979. Synthesizing Independent Database Schemas. In SIGMOD. 143--151.
[7]
Douglas B. Bock and John F. Schrage. 2002. Denormalization guidelines for base and transaction tables. ACM SIGCSE Bull., Vol. 34, 4 (2002), 129--133.
[8]
Nicolas Bruno, Surajit Chaudhuri, and Dilys Thomas. 2006. Generating Queries with Cardinality Constraints for DBMS Testing. IEEE Trans. Knowl. Data Eng., Vol. 18, 12 (2006), 1721--1725.
[9]
Peter Chen. 1976. The entity-relationship model-toward a unified view of data. ACM Trans. Database Syst., Vol. 1, 1 (1976), 9--36.
[10]
Wenguang Chen, Wenfei Fan, and Shuai Ma. 2009. Incorporating cardinality constraints and synonym rules into conditional functional dependencies. Inf. Process. Lett., Vol. 109, 14 (2009), 783--789.
[11]
Rada Chirkova and Jun Yang. 2012. Materialized Views. Found. Trends Databases, Vol. 4, 4 (2012), 295--405.
[12]
Graham Cormode, Divesh Srivastava, Entong Shen, and Ting Yu. 2012. Aggregate Query Answering on Possibilistic Data with Cardinality Constraints. In ICDE. 258--269.
[13]
Faiz Currim and Sudha Ram. 2008. Conceptually modeling windows and bounds for space and time in database constraints. Commun. ACM, Vol. 51, 11 (2008), 125--129.
[14]
Ronald Fagin. 1977. Multivalued Dependencies and a New Normal Form for Relational Databases. ACM Trans. Database Syst., Vol. 2, 3 (1977), 262--278.
[15]
Ronald Fagin. 1979. Normal Forms and Relational Database Operators. In SIGMOD. 153--160.
[16]
Wenfei Fan, Yinghui Wu, and Jingbo Xu. 2016. Functional Dependencies for Graphs. In SIGMOD. 1843--1857.
[17]
Flavio Ferrarotti, Sven Hartmann, and Sebastian Link. 2013. Efficiency frontiers of XML cardinality constraints. Data Knowl. Eng., Vol. 87 (2013), 297--319.
[18]
John Grant and Jack Minker. 1985 a. Inferences for Numerical Dependencies. Theor. Comput. Sci., Vol. 41 (1985), 271--287.
[19]
John Grant and Jack Minker. 1985 b. Normalization and Axiomatization for Numerical Dependencies. Inf. Control., Vol. 65, 1 (1985), 1--17.
[20]
Neil Hall, Henning Kö hler, Sebastian Link, Henri Prade, and Xiaofang Zhou. 2015. Cardinality constraints on qualitatively uncertain data. Data Knowl. Eng., Vol. 99 (2015), 126--150.
[21]
Sven Hartmann. 2003. Reasoning about participation constraints and Chen's constraints. In ADC. 105--113.
[22]
Yk"a Huhtala, Juha K"a rkk"a inen, Pasi Porkka, and Hannu Toivonen. 1999. TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies. Comput. J., Vol. 42, 2 (1999), 100--111.
[23]
Christian S. Jensen, Richard T. Snodgrass, and Michael D. Soo. 1996. Extending Existing Dependency Theory to Temporal Databases. IEEE Trans. Knowl. Data Eng., Vol. 8, 4 (1996), 563--582.
[24]
Vitaliy L. Khizder and Grant E. Weddell. 2003. Reasoning about Uniqueness Constraints in Object Relational Databases. IEEE Trans. Knowl. Data Eng., Vol. 15, 5 (2003), 1295--1306.
[25]
Henning Kö hler. 2006. Finding Faithful Boyce-Codd Normal Form Decompositions. In AAIM. 102--113.
[26]
Henning Kö hler and Sebastian Link. 2016. SQL Schema Design: Foundations, Normal Forms, and Normalization. In SIGMOD. 267--279.
[27]
Nemanja Kojic and Dragan Milicev. 2020. Equilibrium of Redundancy in Relational Model for Optimized Data Retrieval. IEEE Trans. Knowl. Data Eng., Vol. 32, 9 (2020), 1707--1721.
[28]
Solmaz Kolahi and Leonid Libkin. 2010. An information-theoretic analysis of worst-case redundancy in database design. ACM Trans. Database Syst., Vol. 35, 1 (2010), 5:1--5:32.
[29]
Sebastian Kruse and Felix Naumann. 2018. Efficient Discovery of Approximate Dependencies. Proc. VLDB Endow., Vol. 11, 7 (2018), 759--772.
[30]
Jens Lechtenbö rger and Gottfried Vossen. 2003. Multidimensional normal forms for warehouse design. Inf. Syst., Vol. 28, 5 (2003), 415--434.
[31]
Maurizio Lenzerini and Gaetano Santucci. 1983. Cardinality Constraints in the Entity-Relationship Model. In ER. 529--549.
[32]
Mark Levene and George Loizou. 2003. Why is the snowflake schema a good data warehouse design? Inf. Syst., Vol. 28, 3 (2003), 225--240.
[33]
Mark Levene and Millist W. Vincent. 2000. Justification for Inclusion Dependency Normal Form. IEEE Trans. Knowl. Data Eng., Vol. 12, 2 (2000), 281--291.
[34]
Stephen W. Liddle, David W. Embley, and Scott N. Woodfield. 1993. Cardinality Constraints in Semantic Data Models. Data Knowl. Eng., Vol. 11, 3 (1993), 235--270.
[35]
Sebastian Link and Henri Prade. 2019. Relational database schema design for uncertain data. Inf. Syst., Vol. 84 (2019), 88--110.
[36]
S. Link and Z. Wei. 2021. Logical Schema Design that Quantify Update Inefficiency and Join Efficiency . Report CDMTCS-552. The University of Auckland. http://www.cs.auckland.ac.nz/research/groups/CDMTCS/
[37]
Zezhou Liu and Stratos Idreos. 2016. Main Memory Adaptive Denormalization. In SIGMOD. 2253--2254.
[38]
David Maier. 1983. The Theory of Relational Databases .Computer Science Press.
[39]
Wai Yin Mok, Yiu-Kai Ng, and David W. Embley. 1996. A Normal Form for Precisely Characterizing Redundancy in Nested Relations. ACM Trans. Database Syst., Vol. 21, 1 (1996), 77--106.
[40]
Antoni Olivé. 2007. Cardinality Constraints. In Conceptual Modeling of Information Systems . Springer, 83--102.
[41]
Thorsten Papenbrock, Jens Ehrlich, Jannik Marten, Tommy Neubert, Jan-Peer Rudolph, Martin Schö nberg, Jakob Zwiener, and Felix Naumann. 2015. Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms. PVLDB, Vol. 8, 10 (2015), 1082--1093.
[42]
Thorsten Papenbrock and Felix Naumann. 2016. A Hybrid Approach to Functional Dependency Discovery. In SIGMOD. 821--833.
[43]
Jean-Marc Petit, Farouk Toumani, Jean-Francc ois Boulicaut, and Jacques Kouloumdjian. 1996. Towards the Reverse Engineering of Denormalized Relational Databases. In ICDE. 218--227.
[44]
Tania Roblot, Miika Hannula, and Sebastian Link. 2018. Probabilistic Cardinality Constraints. VLDB J., Vol. 27, 6 (2018), 771--795.
[45]
Stefanie Scherzinger and Sebastian Sidortschuck. 2020. An Empirical Study on the Design and Evolution of NoSQL Database Schemas. In ER. 441--455.
[46]
Christian Soutou. 1998. Relational Database Reverse Engineering: Algorithms to Extract Cardinality Constraints. Data Knowl. Eng., Vol. 28, 2 (1998), 161--207.
[47]
Millist W. Vincent. 1999. Semantic Foundations of 4NF in Relational Database Design. Acta Inf., Vol. 36, 3 (1999), 173--213.
[48]
Ziheng Wei and Sebastian Link. 2019. Discovery and Ranking of Functional Dependencies. In ICDE. 1526--1537.
[49]
Jaejun Yoo, Ki-Hoon Lee, and Young-Ho Jeon. 2018. Migration from RDBMS to NoSQL Using Column-level Denormalization and Atomic Aggregates. J. Inf. Sci. Eng., Vol. 34, 1 (2018), 243--259.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. cardinality constraint
  2. data redundancy
  3. functional dependency
  4. join
  5. normal form
  6. normalization
  7. update

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '21
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)31
  • Downloads (Last 6 weeks)0
Reflects downloads up to 17 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Third and Boyce–Codd normal form for property graphsThe VLDB Journal10.1007/s00778-025-00902-234:2Online publication date: 7-Feb-2025
  • (2024)Compositional Dependencies and High-Order Relations Formulas at Relational Data ModelsAutomatic Documentation and Mathematical Linguistics10.3103/S000510552470031658:5(320-332)Online publication date: 1-Oct-2024
  • (2024)Mixed Covers of Keys and Functional Dependencies for Maintaining the Integrity of Data under UpdatesProceedings of the VLDB Endowment10.14778/3654621.365462617:7(1578-1590)Online publication date: 30-May-2024
  • (2024)Performance of Data Model over Computational Power in CDP Platforms2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825108(3729-3734)Online publication date: 15-Dec-2024
  • (2023)Normalizing Property GraphsProceedings of the VLDB Endowment10.14778/3611479.361150616:11(3031-3043)Online publication date: 1-Jul-2023
  • (2023)Composite Object Normal Forms: Parameterizing Boyce-Codd Normal Form by the Number of Minimal KeysProceedings of the ACM on Management of Data10.1145/35886931:1(1-25)Online publication date: 30-May-2023
  • (2023)Uniqueness Constraints for Object StoresJournal of Data and Information Quality10.1145/358175815:2(1-29)Online publication date: 22-Jun-2023
  • (2023)Cardinality constraints and functional dependencies in SQLInformation Systems10.1016/j.is.2023.102208115:COnline publication date: 1-May-2023
  • (2023)Entity integrity management under data volume, variety and veracityKnowledge and Information Systems10.1007/s10115-022-01814-165:7(2895-2934)Online publication date: 25-Jan-2023
  • (2021)Possible Keys and Functional DependenciesJournal on Data Semantics10.1007/s13740-021-00135-w10:3-4(327-366)Online publication date: 14-Aug-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media