Research article, DOI: 10.1145/2063576.2063826

Cost-efficient repair in inconsistent probabilistic databases

Published: 24 October 2011

ABSTRACT

Data uncertainty is ubiquitous in many emerging real-world applications, so efficient management of probabilistic databases has become an increasingly important yet challenging problem. One fundamental data management task is to identify unreliable data in a probabilistic database that violate integrity constraints (e.g., functional dependencies), and then quickly resolve the resulting inconsistencies. In this paper, we formulate and tackle the problem of efficiently repairing inconsistent probabilistic databases by value modification. Specifically, we propose a repair semantics, namely possible-world-oriented repair (PW-repair), which partitions possible worlds into several disjoint groups and repairs each group individually with minimum repair cost. Because finding such a PW-repair strategy is NP-complete, we design a heuristic-based greedy approach for PW-repair that can efficiently obtain an effective repair of the inconsistent probabilistic database. Extensive experiments show that our approach is both efficient and effective at repairing inconsistent probabilistic data.
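
To make the setting concrete, the following is a minimal illustrative sketch and not the PW-repair algorithm from the paper. It assumes a toy model in which each uncertain tuple (here called an x-tuple) has mutually exclusive alternatives and x-tuples are independent, a single functional dependency zip → city, and a unit cost per modified cell. The data, the function names, and the majority-value repair rule are all hypothetical choices made for illustration. The sketch enumerates possible worlds, measures the probability mass of worlds that violate the dependency, and computes the expected cost of a simple per-world repair by value modification.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy data: each x-tuple lists mutually exclusive alternatives
# as ((zip_code, city), probability); x-tuples are assumed independent.
X_TUPLES = [
    [(("10001", "New York"), 0.7), (("10001", "Newark"), 0.3)],
    [(("10001", "New York"), 1.0)],
    [(("60601", "Chicago"), 0.5), (("60601", "Chicgo"), 0.5)],
    [(("60601", "Chicago"), 1.0)],
]


def possible_worlds(x_tuples):
    """Yield (rows, probability), choosing one alternative per x-tuple."""
    for combo in product(*x_tuples):
        rows = [row for row, _ in combo]
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield rows, prob


def repair_cost(rows):
    """Count cells changed when each zip group adopts its most common city.

    Assumption: unit cost per modified cell. The cost is positive exactly
    when the world violates the dependency zip -> city.
    """
    by_zip = defaultdict(Counter)
    for zip_code, city in rows:
        by_zip[zip_code][city] += 1
    cost = 0
    for cities in by_zip.values():
        cost += sum(cities.values()) - max(cities.values())
    return cost


def summarize(x_tuples):
    """Return (probability of inconsistent worlds, expected repair cost)."""
    p_bad = 0.0
    expected = 0.0
    for rows, prob in possible_worlds(x_tuples):
        cost = repair_cost(rows)
        if cost > 0:
            p_bad += prob
        expected += prob * cost
    return p_bad, expected


p_bad, expected = summarize(X_TUPLES)
print(f"P(inconsistent) = {p_bad:.2f}, expected repair cost = {expected:.2f}")
```

Enumerating every world is exponential in the number of uncertain tuples, which is one reason a grouping of possible worlds combined with a greedy heuristic, as the abstract describes, is attractive in practice.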


Published in: CIKM '11: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, October 2011, 2712 pages. ISBN: 9781450307178. DOI: 10.1145/2063576. Copyright © 2011 ACM.


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 1,861 of 8,427 submissions, 22%
