DOI: 10.1145/3318464.3389786
short-paper

Discovery Algorithms for Embedded Functional Dependencies

Published: 31 May 2020

Abstract

Embedded functional dependencies (eFDs) advance data management applications by combining data completeness and integrity requirements. We show that the discovery problem of eFDs is NP-complete, W[2]-complete in the output, and has a minimum solution space that is larger than the maximum solution space for functional dependencies. Nevertheless, we use novel data structures and search strategies to develop row-efficient, column-efficient, and hybrid algorithms for eFD discovery. Our experiments demonstrate that the algorithms scale well in terms of their design targets, and that ranking the eFDs by the number of redundant data values they cause can provide useful guidance in identifying meaningful eFDs for applications. Finally, we demonstrate the benefits of introducing completeness requirements and of ranking by the number of redundant data values for approximate and genuine functional dependencies.
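
To make the notions in the abstract concrete, the following Python sketch assumes the usual reading of an eFD as an attribute set E together with an FD X → Y over attributes of E, where the FD only has to hold on the rows that have no missing value on E. The helper names (`scope`, `holds`, `redundant_values`), the use of `None` for missing values, and the sample table are all hypothetical. The redundancy count is one plausible interpretation (every right-hand-side value in a group of at least two scope rows that agree on X). None of this reproduces the paper's discovery algorithms.

```python
from collections import defaultdict

def scope(rows, embedding):
    """Rows with no missing (None) value on any attribute in `embedding`."""
    return [r for r in rows if all(r.get(a) is not None for a in embedding)]

def holds(rows, embedding, lhs, rhs):
    """True if the FD lhs -> rhs holds on the scope of `embedding`."""
    seen = {}
    for r in scope(rows, embedding):
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        # The first row with this key fixes the expected rhs values.
        if seen.setdefault(key, val) != val:
            return False
    return True

def redundant_values(rows, embedding, lhs, rhs):
    """Count rhs value occurrences in groups of >= 2 scope rows agreeing on lhs."""
    group_sizes = defaultdict(int)
    for r in scope(rows, embedding):
        group_sizes[tuple(r[a] for a in lhs)] += 1
    return sum(n * len(rhs) for n in group_sizes.values() if n > 1)

# Hypothetical sample data; None marks a missing value.
rows = [
    {"name": "Ann", "zip": "1010", "city": "Auckland",   "phone": "554"},
    {"name": "Bob", "zip": "1010", "city": "Auckland",   "phone": "555"},
    {"name": "Cat", "zip": "1010", "city": None,         "phone": "556"},
    {"name": "Dan", "zip": "6011", "city": "Wellington", "phone": None},
    {"name": "Eve", "zip": "6011", "city": "Porirua",    "phone": None},
]

# zip -> city fails when only zip and city must be complete (Dan vs. Eve) ...
print(holds(rows, {"zip", "city"}, ["zip"], ["city"]))           # False
# ... but holds once phone must also be complete (only Ann and Bob remain).
print(holds(rows, {"zip", "city", "phone"}, ["zip"], ["city"]))  # True
print(redundant_values(rows, {"zip", "city", "phone"}, ["zip"], ["city"]))  # 2
```

In this toy table the same FD is violated on the larger scope and satisfied on the more complete one, which is the kind of completeness requirement the abstract refers to. The redundancy count gives a simple score by which valid eFDs could be ordered.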



Published In

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN:9781450367356
DOI:10.1145/3318464

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. discovery
  2. embedded functional dependency
  3. missing data

Qualifiers

  • Short-paper

Conference

SIGMOD/PODS '20

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2025) Third and Boyce–Codd normal form for property graphs. The VLDB Journal 34:2. DOI: 10.1007/s00778-025-00902-2. Online publication date: 7-Feb-2025
  • (2023) Normalizing Property Graphs. Proceedings of the VLDB Endowment 16:11 (3031-3043). DOI: 10.14778/3611479.3611506. Online publication date: 24-Aug-2023
  • (2023) BCNF* - From Normalized- to Star-Schemas and Back Again. Companion of the 2023 International Conference on Management of Data (103-106). DOI: 10.1145/3555041.3589712. Online publication date: 4-Jun-2023
  • (2023) EulerFD: An Efficient Double-Cycle Approximation of Functional Dependencies. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (2878-2891). DOI: 10.1109/ICDE55515.2023.00220. Online publication date: Apr-2023
  • (2023) Towards the efficient discovery of meaningful functional dependencies. Information Systems 116 (102224). DOI: 10.1016/j.is.2023.102224. Online publication date: Jun-2023
  • (2023) Dependency-Aware Core Column Discovery for Table Understanding. The Semantic Web – ISWC 2023 (159-178). DOI: 10.1007/978-3-031-47240-4_9. Online publication date: 27-Oct-2023
  • (2022) Possibilistic Data Cleaning. IEEE Transactions on Knowledge and Data Engineering 34:12 (5939-5950). DOI: 10.1109/TKDE.2021.3062318. Online publication date: 1-Dec-2022
  • (2022) Dynamic Functional Dependency Discovery with Dynamic Hitting Set Enumeration. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (286-298). DOI: 10.1109/ICDE53745.2022.00026. Online publication date: May-2022
  • (2021) Embedded Functional Dependencies and Data-completeness Tailored Database Design. ACM Transactions on Database Systems 46:2 (1-46). DOI: 10.1145/3450518. Online publication date: 29-May-2021
  • (2021) Algorithms for the discovery of embedded functional dependencies. The VLDB Journal: The International Journal on Very Large Data Bases 30:6 (1069-1093). DOI: 10.1007/s00778-021-00684-3. Online publication date: 28-Jul-2021
