ABSTRACT
The discovery of unknown functional dependencies in a dataset is of great importance for database redesign, anomaly detection, and data cleansing applications. However, because the problem is exponential in the number of attributes, none of the existing approaches can be applied to large datasets. We present a new algorithm, DFD, for discovering all functional dependencies in a dataset. DFD follows a depth-first traversal strategy of the attribute lattice that combines aggressive pruning and efficient result verification. Our approach scales far beyond existing algorithms, to datasets of up to 7.5 million tuples, and is up to three orders of magnitude faster than existing approaches on smaller datasets.
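To make the problem statement concrete, the following minimal Python sketch checks whether a single functional dependency holds and then enumerates minimal dependencies by brute force. It is not the DFD algorithm: it walks the attribute lattice level by level rather than depth-first, and its only pruning is skipping supersets of left-hand sides already found. The function names `holds` and `minimal_fds` and the sample rows are assumptions made for illustration, and empty left-hand sides (constant columns) are ignored.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    The dependency holds when any two rows that agree on every attribute
    in lhs also agree on rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # Remember the first rhs value seen for this key; a different
        # value later on means the dependency is violated.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def minimal_fds(rows, attributes):
    """Naive level-wise search for minimal FDs (illustration only)."""
    found = []
    for rhs in attributes:
        others = [a for a in attributes if a != rhs]
        minimal = []
        # The candidate space has 2^(n-1) left-hand sides per rhs, which
        # is the exponential blow-up the abstract refers to.
        for size in range(1, len(others) + 1):
            for lhs in combinations(others, size):
                # Skip supersets of a left-hand side that already works:
                # they would hold as well, but would not be minimal.
                if any(set(m) <= set(lhs) for m in minimal):
                    continue
                if holds(rows, lhs, rhs):
                    minimal.append(lhs)
        found.extend((lhs, rhs) for lhs in minimal)
    return found


# Hypothetical sample data, not taken from the paper.
rows = [
    {"zip": "10115", "city": "Berlin", "name": "Ada"},
    {"zip": "10115", "city": "Berlin", "name": "Bob"},
    {"zip": "14482", "city": "Potsdam", "name": "Ada"},
]
fds = minimal_fds(rows, ["zip", "city", "name"])
print(fds)  # [(('city',), 'zip'), (('zip',), 'city')]
```

On this toy table, `zip` and `city` determine each other, while no combination of the other attributes determines `name`. Each call to `holds` scans every row, so a brute-force search of this kind becomes impractical as the number of attributes and tuples grows.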
Index Terms
- DFD: Efficient Functional Dependency Discovery