Discovering context-aware conditional functional dependencies

  • Research Article
Frontiers of Computer Science

Abstract

Conditional functional dependencies (CFDs) are important techniques for data consistency. However, CFDs are limited in their ability to 1) provide reasonable values for consistency repairing and 2) detect potential errors. This paper presents context-aware conditional functional dependencies (CCFDs), which help provide reasonable values and detect potential errors. In particular, we focus on automatically discovering minimal CCFDs. We introduce context relativity to measure the relationship between CFDs. The overlap of related CFDs can provide reasonable values, which results in more accurate consistency repairing, and some related CFDs are combined into CCFDs. Moreover, we prove that discovering minimal CCFDs is NP-complete, and we design a precise method and a heuristic method. We also introduce the dominating value to facilitate the process in both methods. Additionally, the context relativity of the CFDs affects the cleaning results, so we suggest an approximate threshold of context relativity according to the data distribution. Our empirical evaluation shows that the repairing results are more accurate.
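
For readers unfamiliar with plain CFDs, the following is a minimal sketch of how a single CFD can be checked against a relation. It illustrates only the standard notion of a CFD that the abstract builds on. It is not the CCFD discovery method, and it does not implement context relativity or the dominating value, none of which are defined in the abstract. The function names, the `"_"` wildcard convention, and the sample attributes (`CC`, `ZIP`, `STREET`, `CITY`) are assumptions made for illustration.

```python
from collections import defaultdict

WILDCARD = "_"  # hypothetical marker for an unnamed variable in a pattern


def matches(row, pattern):
    """True if the row agrees with every constant in the pattern."""
    return all(v == WILDCARD or row[a] == v for a, v in pattern.items())


def cfd_violations(rows, lhs, rhs):
    """Return sorted indices of rows that violate the CFD (lhs -> rhs).

    lhs: dict mapping each left-hand attribute to a constant or WILDCARD.
    rhs: (attribute, constant or WILDCARD).
    """
    rhs_attr, rhs_val = rhs
    violations = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if not matches(row, lhs):
            continue  # the CFD only constrains rows matching its pattern
        if rhs_val != WILDCARD:
            # Constant right-hand side: each matching row is checked alone.
            if row[rhs_attr] != rhs_val:
                violations.add(i)
        else:
            # Variable right-hand side: group rows by their left-hand values.
            groups[tuple(row[a] for a in lhs)].append(i)
    for idxs in groups.values():
        # Rows that agree on the left-hand side must agree on the right.
        if len({rows[i][rhs_attr] for i in idxs}) > 1:
            violations.update(idxs)
    return sorted(violations)


# Hypothetical sample relation.
rows = [
    {"CC": "44", "ZIP": "EH4 1DT", "STREET": "Mayfield", "CITY": "EDI"},
    {"CC": "44", "ZIP": "EH4 1DT", "STREET": "Crichton", "CITY": "EDI"},
    {"CC": "01", "ZIP": "07974", "STREET": "Mtn Ave", "CITY": "MH"},
    {"CC": "01", "ZIP": "07974", "STREET": "Tree Ave", "CITY": "NYC"},
]

# "Within CC = 44, ZIP determines STREET": rows 0 and 1 conflict.
print(cfd_violations(rows, {"CC": "44", "ZIP": WILDCARD}, ("STREET", WILDCARD)))
# "If CC = 01 and ZIP = 07974, then CITY = MH": row 3 violates the constant.
print(cfd_violations(rows, {"CC": "01", "ZIP": "07974"}, ("CITY", "MH")))
```

The sketch shows the two limitations the abstract mentions. The first check reports that rows 0 and 1 conflict but gives no indication of which `STREET` value is the reasonable repair. The second check flags only rows that contradict a stated constant, so errors outside the pattern go undetected.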

Acknowledgements

This research was supported by the National Basic Research Program of China (973 Program) (2012CB316201) and the National Natural Science Foundation of China (Grant No. 61033007).

Author information

Corresponding author

Correspondence to Yuefeng Du.

Additional information

Yuefeng Du is currently a PhD candidate in the College of Information Science & Engineering, Northeastern University, China, where he received his MS in 2012. His interests include data quality and data integration.

Derong Shen is a full professor and a PhD supervisor in the College of Information Science & Engineering, Northeastern University, China, where she received her PhD in 2004. She received her BS and MS from Jilin University, China in 1987 and 1990, respectively. Her interests include entity search and distributed computing.

Tiezheng Nie is an associate professor in the College of Information Science & Engineering, Northeastern University, China, where he received his BS, MS, and PhD in 2002, 2005, and 2009, respectively. His interests include data quality and data integration.

Yue Kou is an associate professor in the College of Information Science & Engineering, Northeastern University, China, where she also received her BS, MS, and PhD in 2002, 2005, and 2009, respectively. Her interests include entity resolution and web data management.

Ge Yu is a full professor and a PhD supervisor in the College of Information Science & Engineering, Northeastern University, China, where he received his BS and MS in 1982 and 1985, respectively. He received his PhD from Kyushu University, Japan in 1996. He is a senior member of the CCF and a member of ACM and IEEE. His interests include databases and big-data management.

Cite this article

Du, Y., Shen, D., Nie, T. et al. Discovering context-aware conditional functional dependencies. Front. Comput. Sci. 11, 688–701 (2017). https://doi.org/10.1007/s11704-016-5265-4

  • DOI: https://doi.org/10.1007/s11704-016-5265-4
