On the complexity and approximability of repair position selection problem

Journal of Combinatorial Optimization

Abstract

Inconsistent data contain conflicting information, which can be formalized as violations of given semantic constraints. To improve data quality, a repair makes the data consistent by modifying the original data. Using user feedback to direct the repair operations is a popular solution. In the big-data setting, however, it is unrealistic to ask users to give feedback on the whole data set. In this paper, the repair position selection problem (RPS for short) is formally defined and studied. Intuitively, the RPS problem seeks an optimal set of repair positions, subject to a limit on the repair cost, such that as much of the data as possible becomes consistent. First, the RPS problem is formalized. Then, for three different repair strategies, the complexity and approximability of the corresponding RPS problems are studied.

Notes

  1. In fact, most of the results in this paper also apply to other kinds of dependency rules, such as conditional functional dependencies (Cong et al. 2007).

References

  • Abiteboul S, Hull R, Vianu V (eds) (1995) Foundations of databases: the logical level, 1st edn. Addison-Wesley Longman Publishing Co., Inc, Boston

  • Arenas M, Bertossi L, Chomicki J (1999) Consistent query answers in inconsistent databases. In: Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS ’99, New York, NY, USA, ACM, pp 68–79

  • Beskales G, Ilyas IF, Golab L, Galiullin A (2014) Sampling from repairs of conditional functional dependency violations. VLDB J 23:103–128

  • Bohannon P, Fan W, Flaster M, Rastogi R (2005) A cost-based model and effective heuristic for repairing constraints by value modification. In: Proceedings of the 2005 ACM SIGMOD international conference on management of data, SIGMOD ’05, New York, NY, USA, ACM, pp 143–154

  • Bohannon P, Fan W, Geerts F, Jia X, Kementsietsidis A (2007) Conditional functional dependencies for data cleaning. In: 2007 IEEE 23rd international conference on data engineering, pp 746–755

  • Cai Z, Heydari M, Lin G (2006) Iterated local least squares microarray missing value imputation. J Bioinform Comput Biol 4:935–958

  • Chiang F, Miller RJ (2011) A unified model for data and constraint repair. In: Proceedings of the 2011 IEEE 27th international conference on data engineering, ICDE ’11. IEEE Computer Society, Washington, DC, USA, pp 446–457

  • Chomicki J, Marcinkowski J (2005) Minimal-change integrity maintenance using tuple deletions. Inf Comput 197:90–121

  • Cong G, Fan W, Geerts F, Jia X, Ma S (2007) Improving data quality: consistency and accuracy. In: Proceedings of the 33rd international conference on very large data bases, VLDB ’07, VLDB Endowment, pp 315–326

  • Cong G, Fan W, Geerts F, Li J, Luo J (2012) On the complexity of view update analysis and its application to annotation propagation. IEEE Trans Knowl Data Eng 24:506–519

  • Cosmadakis SS, Papadimitriou CH (1984) Updates of relational views. J ACM 31:742–760

  • Decker H, Martinenghi D (2011) Inconsistency-tolerant integrity checking. IEEE Trans Knowl Data Eng 23:218–234

  • Eiter T, Fink M, Greco G, Lembo D (2008) Repair localization for query answering from inconsistent databases. ACM Trans Database Syst 33:10:1–10:51

  • Feige U (1998) A threshold of ln n for approximating set cover. J ACM 45:634–652

  • Feige U, Seltser M (1997) On the densest k-subgraph problems. Technical report. The Weizmann Institute. Jerusalem, Israel

  • Feige U, Peleg D, Kortsarz G (2001) The dense k-subgraph problem. Algorithmica 29:410–421

  • Fuxman A, Miller RJ (2007) First-order query rewriting for inconsistent databases. J Comput Syst Sci 73:610–635

  • Greco S, Sirangelo C, Trubitsyna I, Zumpano E (2003) Preferred repairs for inconsistent databases. In: Proceedings of the seventh international database engineering and applications symposium. July 2003, pp 202–211

  • Kimelfeld B, Vondrák J, Woodruff DP (2013) Multi-tuple deletion propagation: approximations and complexity. Proc VLDB Endow 6:1558–1569

  • Kuhn H (1955) The Hungarian method for the assignment problem. Nav Res Logist Q 2:83–97

  • Lechtenbörger J, Vossen G (2003) On the computation of relational view complements. ACM Trans Database Syst 28:175–208

  • Li J, Liu X (2013) An important aspect of big data: data usability. J Comput Res Dev 50:1147–1162

  • Lopatenko A, Bravo L (2007) Efficient approximation algorithms for repairing inconsistent databases. In: 2007 IEEE 23rd international conference on data engineering, pp 216–225

  • Lopatenko A, Bertossi L (2006) Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In: Proceedings of the 11th international conference on database theory, ICDT ’07. Springer, Berlin, pp 179–193

  • Miao D, Liu X, Li J (2016) On the complexity of sampling query feedback restricted database repair of functional dependency violations. Theor Comput Sci 609:594–605

  • Staworko SS, Chomicki J (2010) Consistent query answers in the presence of universal constraints. Inf Syst 35:1–22

  • Wang Y, Cai Z, Stothard P, Moore S, Goebel R, Wang L, Lin G (2012) Fast accurate missing SNP genotype local imputation. BMC Res Notes 5:404

  • West DB (2001) Introduction to graph theory. Prentice Hall, New York

  • Wijsen J (2002) Condensed representation of database repairs for consistent query answering. In: Proceedings of the 9th international conference on database theory, ICDT ’03. Springer, London, UK, pp 378–393

  • Zhao B, Zhou H, Li G, Huang Y (2018) ZenLDA: large-scale topic model training on distributed data-parallel platform. Big Data Min Anal 1:57–74

  • Zhou B, Li J, Wang X, Gu Y, Xu L, Hu Y, Zhu L (2018) Online internet traffic monitoring system using Spark Streaming. Big Data Min Anal 1:47–56

Author information

Corresponding author

Correspondence to Xianmin Liu.

Additional information

This work was supported in part by the National Natural Science Foundation of China under Grants 61832003, 61502121, 61772157, 61872106, the China Postdoctoral Science Foundation under Grant 2016M590284, the Fundamental Research Funds for the Central Universities (Grant No. HIT.NSRIF.201649), and Heilongjiang Postdoctoral Foundation (Grant No. LBH-Z15094).

About this article

Cite this article

Liu, X., Li, Y., Li, J. et al. On the complexity and approximability of repair position selection problem. J Comb Optim 42, 354–372 (2021). https://doi.org/10.1007/s10878-018-0362-y

  • DOI: https://doi.org/10.1007/s10878-018-0362-y
