ABSTRACT
In the traditional sense, a subset repair of an inconsistent database is a consistent subset of facts (tuples) that is maximal under set containment. Preferences between pairs of facts make it possible to distinguish a set of preferred repairs based on the relative reliability (source credibility, extraction quality, recency, etc.) of data items. Previous studies explored the problem of categoricity, where one aims to determine whether the preferences suffice to repair the database unambiguously, or in other words, whether there is precisely one preferred repair. In this paper we study the ability to quantify ambiguity by investigating two classes of problems. The first is that of counting the number of subset repairs, both preferred (under various common semantics) and traditional. We establish dichotomies in data complexity for the entire space of (sets of) functional dependencies. The second class of problems is that of enumerating (i.e., generating) the preferred repairs. We devise enumeration algorithms with efficiency guarantees on the delay between generated repairs, even for constraints represented as general conflict graphs or hypergraphs.
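To make the notion of a traditional subset repair concrete, the following is a minimal brute-force sketch in Python. It is not the paper's algorithm: it runs in exponential time and ignores preferences entirely. The toy relation, the single functional dependency (attribute 0 determines attribute 1), and all function names are assumptions chosen for illustration.

```python
from itertools import combinations

def conflicts(f, g, lhs, rhs):
    """Two facts violate the FD lhs -> rhs if they agree on lhs but differ on rhs."""
    return all(f[i] == g[i] for i in lhs) and any(f[i] != g[i] for i in rhs)

def is_consistent(facts, lhs, rhs):
    """A set of facts is consistent if no pair of facts is in conflict."""
    return not any(conflicts(f, g, lhs, rhs) for f, g in combinations(facts, 2))

def subset_repairs(db, lhs, rhs):
    """Return all consistent subsets of db that are maximal under set containment.

    Brute force over every subset: fine for a toy example, hopeless at scale.
    """
    db = list(db)
    consistent = [
        frozenset(s)
        for r in range(len(db) + 1)
        for s in combinations(db, r)
        if is_consistent(s, lhs, rhs)
    ]
    # Keep only subsets not strictly contained in another consistent subset.
    return [s for s in consistent if not any(s < t for t in consistent)]

# Hypothetical relation LivesIn(person, city) with the FD person -> city.
db = [("alice", "paris"), ("alice", "rome"), ("bob", "oslo")]
repairs = subset_repairs(db, lhs=[0], rhs=[1])

for repair in repairs:
    print(sorted(repair))
print("number of subset repairs:", len(repairs))
```

In this toy instance the two facts about alice conflict, so every repair keeps exactly one of them together with the fact about bob. Counting asks for the size of the resulting list, and enumeration asks for the list itself; categoricity would hold only if the preferences left a single repair.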
Index Terms
- Counting and Enumerating (Preferred) Database Repairs