Handling Dirty Databases: From User Warning to Data Cleaning — Towards an Interactive Approach

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6379)

Abstract

There are many reasonable ways to characterize how dirty a database is with respect to a set of integrity constraints (e.g., functional dependencies). Yet dirtiness measures, however well designed, are difficult for an end user to interpret and give the database administrator little guidance on how to clean the database. This paper discusses these aspects and proposes methods that help either the user or the administrator overcome the limitations of dirtiness measures when handling dirty databases.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pivert, O., Prade, H. (2010). Handling Dirty Databases: From User Warning to Data Cleaning — Towards an Interactive Approach. In: Deshpande, A., Hunter, A. (eds) Scalable Uncertainty Management. SUM 2010. Lecture Notes in Computer Science, vol 6379. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15951-0_28

  • DOI: https://doi.org/10.1007/978-3-642-15951-0_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15950-3

  • Online ISBN: 978-3-642-15951-0

  • eBook Packages: Computer Science, Computer Science (R0)
