ABSTRACT
Data uncertainty is ubiquitous in many emerging real-world applications, so the efficient management of probabilistic databases has become an increasingly important yet challenging problem. One fundamental data management task is to identify unreliable data in a probabilistic database that violate integrity constraints (e.g., functional dependencies) and then quickly resolve the inconsistencies. In this paper, we formulate and tackle the problem of efficiently repairing inconsistent probabilistic databases by value modification. Specifically, we propose a repair semantics, namely possible-world-oriented repair (PW-repair), which partitions the possible worlds into several disjoint groups and repairs each group individually with minimum repair cost. Because finding such a PW-repair strategy is NP-complete, we design a heuristic-based greedy approach for PW-repair that efficiently obtains an effective repair of the inconsistent probabilistic database. Through extensive experiments, we demonstrate the efficiency and effectiveness of our approach on inconsistent probabilistic data.
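To make the setting concrete, the following is a minimal toy sketch of the ideas named in the abstract: possible worlds, a functional dependency, grouping of worlds, and repair by value modification. It is not the paper's PW-repair algorithm. The data, the grouping criterion (the set of violated `zip_code` values), the unit cost per modified cell, and all function names are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy data: each uncertain tuple lists mutually exclusive
# alternatives (zip_code, city, probability). Tuples are assumed independent.
UNCERTAIN_TUPLES = [
    [("10001", "NYC", 0.7), ("10001", "Boston", 0.3)],
    [("10001", "NYC", 1.0)],
    [("02108", "Boston", 0.6), ("02108", "NYC", 0.4)],
]


def possible_worlds(tuples):
    """Yield (world, probability) for every combination of alternatives."""
    for choice in product(*tuples):
        prob = 1.0
        for _, _, p in choice:
            prob *= p
        yield tuple((z, c) for z, c, _ in choice), prob


def group_worlds(tuples):
    """Partition worlds by the set of zip codes that violate zip_code -> city.

    This grouping criterion is an assumption chosen for the sketch.
    """
    groups = defaultdict(list)
    for world, prob in possible_worlds(tuples):
        cities = defaultdict(set)
        for z, c in world:
            cities[z].add(c)
        key = frozenset(z for z, cs in cities.items() if len(cs) > 1)
        groups[key].append((world, prob))
    return groups


def repair_world(world):
    """Repair one world by overwriting city with the most common city per zip.

    Returns (repaired_world, cost), where cost counts modified cells
    (unit cost per cell is an assumption).
    """
    by_zip = defaultdict(Counter)
    for z, c in world:
        by_zip[z][c] += 1
    target = {z: counts.most_common(1)[0][0] for z, counts in by_zip.items()}
    repaired = tuple((z, target[z]) for z, _ in world)
    cost = sum(1 for z, c in world if c != target[z])
    return repaired, cost


def expected_repair_cost(tuples):
    """Probability-weighted number of modified cells over all worlds."""
    return sum(
        prob * repair_world(world)[1]
        for members in group_worlds(tuples).values()
        for world, prob in members
    )
```

Enumerating all worlds is exponential in the number of uncertain tuples, so this sketch only works on tiny inputs; that blow-up is consistent with the abstract's motivation for a heuristic approach.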
Index Terms
- Cost-efficient repair in inconsistent probabilistic databases