DOI: 10.1145/2661829.2661884
research-article

DFD: Efficient Functional Dependency Discovery

Published: 03 November 2014

ABSTRACT

The discovery of unknown functional dependencies in a dataset is of great importance for database redesign, anomaly detection, and data cleansing applications. However, because the problem is exponential in the number of attributes, none of the existing approaches can be applied to large datasets. We present DFD, a new algorithm for discovering all functional dependencies in a dataset. It follows a depth-first traversal strategy of the attribute lattice that combines aggressive pruning and efficient result verification. Our approach scales far beyond existing algorithms, to datasets of up to 7.5 million tuples, and is up to three orders of magnitude faster than existing approaches on smaller datasets.
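To make the problem statement concrete, the following is a minimal sketch of functional dependency discovery on a toy relation. It is not the DFD algorithm: it enumerates candidate left-hand sides level by level rather than depth-first, and its only pruning rule is to skip supersets of an already-found minimal left-hand side. The names `holds`, `minimal_fds`, and the sample `rows` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # The first value seen for a key must match every later value.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def minimal_fds(rows, attributes):
    """Brute-force search for minimal, non-trivial FDs (illustrative only)."""
    result = []
    for rhs in attributes:
        candidates = [a for a in attributes if a != rhs]
        minimal = []  # minimal left-hand sides found so far for this rhs
        # Visit smaller left-hand sides first so that minimality is preserved.
        for size in range(len(candidates) + 1):
            for lhs in combinations(candidates, size):
                # Prune: a superset of a known minimal LHS cannot be minimal.
                if any(set(m) <= set(lhs) for m in minimal):
                    continue
                if holds(rows, lhs, rhs):
                    minimal.append(lhs)
        result.extend((lhs, rhs) for lhs in minimal)
    return result


# Toy relation (assumed data, not taken from the paper).
rows = [
    {"id": 1, "zip": "10115", "city": "Berlin"},
    {"id": 2, "zip": "10115", "city": "Berlin"},
    {"id": 3, "zip": "14482", "city": "Potsdam"},
]
fds = minimal_fds(rows, ["id", "zip", "city"])
for lhs, rhs in fds:
    print(", ".join(lhs), "->", rhs)
```

The number of candidate left-hand sides grows exponentially with the number of attributes, which is why the abstract stresses pruning and efficient verification; this sketch checks each candidate with a full scan of the rows and is only practical for very small inputs.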

    • Published in

      CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
      November 2014
      2152 pages
      ISBN: 9781450325981
      DOI: 10.1145/2661829

      Copyright © 2014 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      CIKM '14 Paper Acceptance Rate: 175 of 838 submissions, 21%
      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
