DOI: 10.1145/3034786.3056107
research-article

Counting and Enumerating (Preferred) Database Repairs

Published: 09 May 2017

Abstract

In the traditional sense, a subset repair of an inconsistent database is a consistent subset of facts (tuples) that is maximal under set containment. Preferences between pairs of facts make it possible to distinguish a set of preferred repairs based on the relative reliability (source credibility, extraction quality, recency, etc.) of data items. Previous studies explored the problem of categoricity, where one aims to determine whether the preferences suffice to repair the database unambiguously or, in other words, whether there is precisely one preferred repair. In this paper, we study the ability to quantify ambiguity by investigating two classes of problems. The first is that of counting the number of subset repairs, both preferred (under various common semantics) and traditional. We establish dichotomies in data complexity for the entire space of (sets of) functional dependencies. The second class of problems is that of enumerating (i.e., generating) the preferred repairs. We devise enumeration algorithms with efficiency guarantees on the delay between generated repairs, even for constraints represented as general conflict graphs or hypergraphs.
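
To make the notion of a traditional subset repair concrete, the following is a minimal brute-force sketch in Python. It is an illustration only and is not the paper's algorithm: it runs in exponential time and ignores preferences entirely. The toy relation, the functional dependency `person -> city`, and the helper names `conflicts` and `subset_repairs` are all assumptions introduced here. Two facts conflict when they agree on the left-hand side of the dependency and disagree on the right-hand side; a subset repair is then a maximal set of facts that contains no conflicting pair.

```python
from itertools import combinations


def conflicts(facts, lhs, rhs):
    """Return index pairs of facts that jointly violate the FD lhs -> rhs."""
    pairs = set()
    for i, j in combinations(range(len(facts)), 2):
        f, g = facts[i], facts[j]
        agree_lhs = all(f[a] == g[a] for a in lhs)
        differ_rhs = any(f[b] != g[b] for b in rhs)
        if agree_lhs and differ_rhs:
            pairs.add((i, j))
    return pairs


def subset_repairs(facts, lhs, rhs):
    """Enumerate all subset repairs by brute force (exponential time)."""
    edges = conflicts(facts, lhs, rhs)
    n = len(facts)

    def consistent(s):
        return not any(i in s and j in s for i, j in edges)

    repairs = []
    # Visit larger subsets first: a consistent set is maximal exactly when
    # it is not strictly contained in a repair that was found earlier.
    for size in range(n, -1, -1):
        for combo in combinations(range(n), size):
            s = frozenset(combo)
            if consistent(s) and not any(s < r for r in repairs):
                repairs.append(s)
    return [[facts[i] for i in sorted(r)] for r in repairs]


# Hypothetical toy instance: two sources disagree about where alice lives.
facts = [
    {"person": "alice", "city": "Paris"},
    {"person": "alice", "city": "Rome"},
    {"person": "bob", "city": "Oslo"},
]
repairs = subset_repairs(facts, lhs=["person"], rhs=["city"])
print(len(repairs))  # number of subset repairs of the toy instance
for r in repairs:
    print(r)
```

On this toy instance the two `alice` facts conflict, so every repair keeps exactly one of them together with the `bob` fact. Counting repairs corresponds to taking the length of the returned list, and a preference between the two `alice` facts would be one way to single out a preferred repair.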

Published In

PODS '17: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
May 2017
458 pages
ISBN:9781450341981
DOI:10.1145/3034786
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. conflict hypergraph
  2. enumeration
  3. functional dependencies
  4. inconsistent databases
  5. preferred repairs
  6. repair counting
  7. repairs

Qualifiers

  • Research-article

Funding Sources

  • The Israeli Science Foundation

Conference

SIGMOD/PODS'17
Acceptance Rates

PODS '17 Paper Acceptance Rate 29 of 101 submissions, 29%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 16
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 08 Feb 2025

Citations

Cited By

  • (2023) The Shapley Value in Database Management. ACM SIGMOD Record 52:2 (6-17). DOI: 10.1145/3615952.3615954. Online publication date: 11-Aug-2023
  • (2022) Representing Vietnamese Traditional Dances and Handling Inconsistent Information. Information Processing and Management of Uncertainty in Knowledge-Based Systems (379-393). DOI: 10.1007/978-3-031-08974-9_30. Online publication date: 4-Jul-2022
  • (2021) Properties of Inconsistency Measures for Databases. Proceedings of the 2021 International Conference on Management of Data (1182-1194). DOI: 10.1145/3448016.3457310. Online publication date: 18-Jun-2021
  • (2021) General information spaces: measuring inconsistency, rationality postulates, and complexity. Annals of Mathematics and Artificial Intelligence 90:2-3 (235-269). DOI: 10.1007/s10472-021-09740-8. Online publication date: 27-Apr-2021
  • (2020) Evaluating top-k queries with inconsistency degrees. Proceedings of the VLDB Endowment 13:12 (2146-2158). DOI: 10.14778/3407790.3407815. Online publication date: 1-Jul-2020
  • (2020) Computing Optimal Repairs for Functional Dependencies. ACM Transactions on Database Systems 45:1 (1-46). DOI: 10.1145/3360904. Online publication date: 17-Feb-2020
  • (2019) Foundations of Query Answering on Inconsistent Databases. ACM SIGMOD Record 48:3 (6-16). DOI: 10.1145/3377391.3377393. Online publication date: 20-Dec-2019
  • (2019) Database Repairs and Consistent Query Answering. Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (48-58). DOI: 10.1145/3294052.3322190. Online publication date: 25-Jun-2019
  • (2018) Computing Optimal Repairs for Functional Dependencies. Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (225-237). DOI: 10.1145/3196959.3196980. Online publication date: 27-May-2018