
Matching dependencies: semantics and query answering

  • Research Article
  • Frontiers of Computer Science

Abstract

Matching dependencies (MDs) are used to declaratively specify the identification (or matching) of certain attribute values in pairs of database tuples when some similarity conditions on other values are satisfied. Their enforcement can be seen as a natural generalization of entity resolution. In what we call the pure case of MD enforcement, an arbitrary value from the underlying data domain can be chosen as the common value to which the matched attribute values are set. However, the overall number of changes of attribute values is expected to be kept to a minimum. We investigate this case in terms of semantics and of the properties of data cleaning through the enforcement of MDs. We characterize the intended clean instances, and also the clean answers to queries, as those that are invariant under the cleaning process. The complexity of computing clean instances and of clean query answering is investigated. Tractable and intractable cases, depending on the MDs, are identified and characterized.
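To make the pure case concrete, here is a minimal sketch, not the authors' algorithm, under a strong simplifying assumption: there is a single MD of the form R[A] ≈ R[A] → R[B] ⇌ R[B] with A different from B, so enforcing it never changes the values being compared. Under that assumption, any stable instance must give every chain of pairwise-similar tuples one common B value, and because a fresh domain value would change every tuple in the group, the change-minimizing choice is always one of the group's most frequent B values. Ties therefore produce several minimal clean instances. The sketch reads "clean answers" as answers returned in every minimal clean instance, which is an assumption of this example. All function names and the surname-based similarity predicate are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import product
from typing import Callable


def similarity_groups(rows: list[dict[str, str]], attr: str,
                      sim: Callable[[str, str], bool]) -> list[list[int]]:
    """Connected components of the 'similar on attr' graph, via union-find."""
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if sim(rows[i][attr], rows[j][attr]):
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(rows)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def minimal_clean_instances(rows, cond_attr, target_attr, sim):
    """Enforce the single MD  R[cond_attr] ~ R[cond_attr] -> R[target_attr] <=> R[target_attr].

    Every tuple in a similarity group must end with the same target value;
    any most frequent value in the group minimizes the number of changed
    cells, so each tie yields a different minimal clean instance.
    """
    groups = similarity_groups(rows, cond_attr, sim)
    options = []
    for members in groups:
        counts = Counter(rows[i][target_attr] for i in members)
        best = max(counts.values())
        options.append([v for v, c in counts.items() if c == best])

    instances = []
    for choice in product(*options):
        clean = [dict(r) for r in rows]  # copy, so the input stays untouched
        for members, value in zip(groups, choice):
            for i in members:
                clean[i][target_attr] = value
        instances.append(clean)
    return instances


def clean_answers(instances, query):
    """Answers the query returns in every minimal clean instance."""
    return set.intersection(*(set(query(inst)) for inst in instances))


def same_surname(x: str, y: str) -> bool:
    # Hypothetical similarity predicate: same last token, ignoring case.
    return x.split()[-1].lower() == y.split()[-1].lower()


rows = [
    {"name": "John Smith", "phone": "555-1234"},
    {"name": "J. Smith", "phone": "555-1234"},
    {"name": "Jon Smith", "phone": "555-9999"},
    {"name": "Mary Jones", "phone": "555-0000"},
    {"name": "M. Jones", "phone": "555-7777"},
]

instances = minimal_clean_instances(rows, "name", "phone", same_surname)
phones = clean_answers(instances, lambda inst: [r["phone"] for r in inst])
print(len(instances), phones)  # 2 {'555-1234'}
```

In this toy data the Smith group has a unique majority phone, so it is cleaned the same way everywhere, while the two Jones tuples tie and give two minimal clean instances. Only "555-1234" survives in both, so it is the sole clean answer to the phone projection. With several interacting MDs, or MDs whose matched attributes feed other similarity conditions, this per-group shortcut no longer applies.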



Author information

Correspondence to Leopoldo Bertossi.

Additional information

Jaffer Gardezi is a PhD student in computer science at the University of Ottawa. He received a BSc in computer science from the University of Victoria and an MSc in computer science from the University of Waterloo. His research is in duplicate resolution and matching dependencies.

Leopoldo Bertossi is a Full Professor at the School of Computer Science, Carleton University, Ottawa, Canada. He is a Faculty Fellow of the IBM Center for Advanced Studies. He obtained a PhD in Mathematics from the Pontifical Catholic University of Chile (PUC) in 1988.

Until 2001, he was a professor at the Department of Computer Science, PUC, and he served as President of the Chilean Computer Science Society (SCCC) in 1996 and 1999–2000. He has been a visiting professor and researcher at the universities of Toronto, Wisconsin-Milwaukee, and Marseille-Luminy, the Technical University Berlin, the Free University of Bolzano-Bozen, and, as a Pauli Fellow, the Technical University of Vienna.

Prof. Bertossi’s research interests include database theory, data integration, intelligent information systems, data quality, knowledge representation, and logic programming.

Iluju Kiringa received his MS and PhD degrees in Computer Science from the University of Bonn, Germany, in 1996 and the University of Toronto, Canada, in 2003, respectively. His research interests are in databases, business intelligence, peer-to-peer data management, and knowledge representation. He focuses on run-time data sharing systems, advanced database transactions, active databases, data warehousing and related data cleanliness issues, conceptual models, and logic-based knowledge representation. Dr. Kiringa is an Associate Professor of Computer Science in the School of Electrical Engineering and Computer Science, Faculty of Engineering, University of Ottawa, which he joined in 2002.

About this article

Cite this article

Gardezi, J., Bertossi, L. & Kiringa, I. Matching dependencies: semantics and query answering. Front. Comput. Sci. 6, 278–292 (2012). https://doi.org/10.1007/s11704-012-2007-0

  • DOI: https://doi.org/10.1007/s11704-012-2007-0