Abstract
Inconsistent data contains conflicting information, which can be formalized as violations of given semantic constraints. To improve data quality, repairing makes the data consistent by modifying the original data. Using user feedback to direct the repair operations is a popular solution. In the big data setting, however, it is unrealistic to ask users for feedback on the whole data set. In this paper, the repair position selection problem (RPS for short) is formally defined and studied. Intuitively, the RPS problem seeks an optimal set of repair positions, subject to a limit on the repair cost, such that as much consistent data as possible is obtained. First, the RPS problem is formalized. Then, for three different repair strategies, the complexity and approximability of the corresponding RPS problems are studied.
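To make the notion of a constraint violation concrete, the following minimal sketch assumes the semantic constraint is a functional dependency (here zip → city) and reports the pairs of tuples that violate it. The function name `fd_violations`, the attribute names, and the sample rows are hypothetical illustrations, not the formalization used in the paper.

```python
from collections import defaultdict
from itertools import combinations


def fd_violations(rows, lhs, rhs):
    """Return index pairs (i, j) of rows that agree on `lhs` but differ on `rhs`.

    Illustrative only: assumes the constraint is a functional dependency
    lhs -> rhs over rows represented as dictionaries.
    """
    # Group row indices by their values on the left-hand-side attributes.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in lhs)].append(i)

    # Within a group, any two rows that differ on the right-hand side conflict.
    violations = []
    for idxs in groups.values():
        for i, j in combinations(idxs, 2):
            if any(rows[i][a] != rows[j][a] for a in rhs):
                violations.append((i, j))
    return violations


# Hypothetical sample data: rows 0 and 1 share a zip code but disagree on city.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},
    {"zip": "60601", "city": "Chicago"},
]
conflicts = fd_violations(rows, lhs=["zip"], rhs=["city"])
```

In this toy instance, modifying the city value of either conflicting row so that both agree removes the violation; choosing which such positions to repair under a cost limit is the kind of decision the RPS problem addresses.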
Notes
In fact, most of the results in this paper can also apply to other kinds of dependency rules such as conditional functional dependency (Cong et al. 2007) and so on.
References
Abiteboul S, Hull R, Vianu V (eds) (1995) Foundations of databases: the logical level, 1st edn. Addison-Wesley Longman Publishing Co., Inc, Boston
Arenas M, Bertossi L, Chomicki J (1999) Consistent query answers in inconsistent databases. In: Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS ’99, New York, NY, USA, ACM, pp 68–79
Beskales G, Ilyas IF, Golab L, Galiullin A (2014) Sampling from repairs of conditional functional dependency violations. VLDB J 23:103–128
Bohannon P, Fan W, Flaster M, Rastogi R (2005) A cost-based model and effective heuristic for repairing constraints by value modification. In: Proceedings of the 2005 ACM SIGMOD international conference on management of data, SIGMOD ’05, New York, NY, USA, ACM, pp 143–154
Bohannon P, Fan W, Geerts F, Jia X, Kementsietsidis A (2007) Conditional functional dependencies for data cleaning. In: 2007 IEEE 23rd international conference on data engineering, pp 746–755
Cai Z, Heydari M, Lin G (2006) Iterated local least squares microarray missing value imputation. J Bioinform Comput Biol 4:935–958
Chiang F, Miller RJ (2011) A unified model for data and constraint repair. In: Proceedings of the 2011 IEEE 27th international conference on data engineering, ICDE ’11. IEEE Computer Society, Washington, DC, USA, pp 446–457
Chomicki J, Marcinkowski J (2005) Minimal-change integrity maintenance using tuple deletions. Inf Comput 197:90–121
Cong G, Fan W, Geerts F, Jia X, Ma S (2007) Improving data quality: consistency and accuracy. In: Proceedings of the 33rd international conference on very large data bases, VLDB ’07, VLDB Endowment, pp 315–326
Cong G, Fan W, Geerts F, Li J, Luo J (2012) On the complexity of view update analysis and its application to annotation propagation. IEEE Trans Knowl Data Eng 24:506–519
Cosmadakis SS, Papadimitriou CH (1984) Updates of relational views. J ACM 31:742–760
Decker H, Martinenghi D (2011) Inconsistency-tolerant integrity checking. IEEE Trans Knowl Data Eng 23:218–234
Eiter T, Fink M, Greco G, Lembo D (2008) Repair localization for query answering from inconsistent databases. ACM Trans Database Syst 33:10:1–10:51
Feige U (1998) A threshold of ln n for approximating set cover. J ACM 45:634–652
Feige U, Seltser M (1997) On the densest k-subgraph problems. Technical report. The Weizmann Institute. Jerusalem, Israel
Feige U, Peleg D, Kortsarz G (2001) The dense k-subgraph problem. Algorithmica 29:410–421
Fuxman A, Miller RJ (2007) First-order query rewriting for inconsistent databases. J Comput Syst Sci 73:610–635
Greco S, Sirangelo C, Trubitsyna I, Zumpano E (2003) Preferred repairs for inconsistent databases. In: Proceedings of the seventh international database engineering and applications symposium. July 2003, pp 202–211
Kimelfeld B, Vondrák J, Woodruff DP (2013) Multi-tuple deletion propagation: approximations and complexity. Proc VLDB Endow 6:1558–1569
Kuhn H (1955) The Hungarian method for the assignment problem. Nav Res Logist Q 2:83–97
Lechtenbörger J, Vossen G (2003) On the computation of relational view complements. ACM Trans Database Syst 28:175–208
Li J, Liu X (2013) An important aspect of big data: data usability. J Comput Res Dev 50:1147–1162
Lopatenko A, Bravo L (2007) Efficient approximation algorithms for repairing inconsistent databases. In: 2007 IEEE 23rd international conference on data engineering, pp 216–225
Lopatenko A, Bertossi L (2006) Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In: Proceedings of the 11th international conference on database theory, ICDT ’07. Springer, Berlin, pp 179–193
Miao D, Liu X, Li J (2016) On the complexity of sampling query feedback restricted database repair of functional dependency violations. Theor Comput Sci 609:594–605
Staworko S, Chomicki J (2010) Consistent query answers in the presence of universal constraints. Inf Syst 35:1–22
Wang Y, Cai Z, Stothard P, Moore S, Goebel R, Wang L, Lin G (2012) Fast accurate missing SNP genotype local imputation. BMC Res Notes 5:404
West DB (2001) Introduction to graph theory. Prentice Hall, New York
Wijsen J (2002) Condensed representation of database repairs for consistent query answering. In: Proceedings of the 9th international conference on database theory, ICDT ’03, London, Springer, London, UK, pp 378–393
Zhao B, Zhou H, Li G, Huang Y (2018) ZenLDA: large-scale topic model training on distributed data-parallel platform. Big Data Min Anal 1:57–74
Zhou B, Li J, Wang X, Gu Y, Xu L, Hu Y, Zhu L (2018) Online internet traffic monitoring system using Spark Streaming. Big Data Min Anal 1:47–56
Additional information
This work was supported in part by the National Natural Science Foundation of China under Grants 61832003, 61502121, 61772157, and 61872106, the China Postdoctoral Science Foundation under Grant 2016M590284, the Fundamental Research Funds for the Central Universities (Grant No. HIT.NSRIF.201649), and the Heilongjiang Postdoctoral Foundation (Grant No. LBH-Z15094).
Cite this article
Liu, X., Li, Y., Li, J. et al. On the complexity and approximability of repair position selection problem. J Comb Optim 42, 354–372 (2021). https://doi.org/10.1007/s10878-018-0362-y
DOI: https://doi.org/10.1007/s10878-018-0362-y