Repairing Functional Dependency Violations in Distributed Data

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9049)

  • 2030 Accesses

Abstract

Data repairing is one of the central problems in data consistency. Given a database \(D\) that violates a set \(\Sigma\) of data dependencies used as data quality rules, the goal is to modify \(D\) into a new relation \(D'\) that satisfies \(\Sigma\). When \(D\) is a centralized database, a host of methods have been proposed to address this problem. In practice, however, a database may be fragmented and distributed across multiple sites, a design that distributed systems advocate for better scalability and that commercial systems readily support. This paper makes a first effort to develop techniques for repairing functional dependency violations in a horizontally partitioned database. (1) Based on a message-passing distributed computing model and two complexity measures for distributed algorithms (parallel time and data shipment), we study data repairing with equivalence classes in the distributed setting. We show that it is NP-complete to build equivalence classes when the data is horizontally partitioned and the aim is to minimize either data shipment or parallel computation time. (2) Despite the intractability, we propose efficient distributed algorithms and optimization techniques for data repairing based on equivalence classes. (3) We experimentally verify the effectiveness and efficiency of our algorithms, using both real-life and synthetic data.
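To make the setting concrete, the following is a minimal Python sketch of equivalence-class repairing for a single functional dependency over horizontal fragments. It is not the paper's algorithm: the function names (`local_summary`, `choose_targets`, `repair`), the coordinator-style exchange, the majority-value choice, and the way shipment is counted are all assumptions made for illustration. With one dependency, each equivalence class is simply the set of tuples that agree on the left-hand side.

```python
from collections import Counter, defaultdict


def local_summary(fragment, lhs, rhs):
    """Site-local step: for each LHS value, count the RHS values seen in this fragment."""
    summary = defaultdict(Counter)
    for row in fragment:
        key = tuple(row[a] for a in lhs)
        summary[key][row[rhs]] += 1
    return summary


def choose_targets(summaries):
    """Coordinator step (hypothetical): merge site summaries, pick one RHS value per class."""
    merged = defaultdict(Counter)
    for summary in summaries:
        for key, counts in summary.items():
            merged[key].update(counts)
    # Most frequent value wins; ties are broken by string order so the result is deterministic.
    return {
        key: min(counts, key=lambda v: (-counts[v], str(v)))
        for key, counts in merged.items()
    }


def repair(fragments, lhs, rhs):
    """Repair violations of the FD lhs -> rhs across all fragments.

    Returns the repaired fragments and an illustrative shipment figure:
    the number of (LHS value, RHS value, count) entries the sites would send.
    """
    summaries = [local_summary(f, lhs, rhs) for f in fragments]
    shipped = sum(len(counts) for s in summaries for counts in s.values())
    targets = choose_targets(summaries)
    repaired = [
        [{**row, rhs: targets[tuple(row[a] for a in lhs)]} for row in f]
        for f in fragments
    ]
    return repaired, shipped
```

Calling `repair([site1, site2], ("zip",), "city")` on two lists of row dictionaries rewrites every `city` value so that rows sharing a `zip` agree, whichever site holds them. Shipping per-site counts instead of raw tuples is one simple way to reduce data shipment; the trade-offs the paper analyses are more involved than this sketch suggests.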

Author information

Corresponding author

Correspondence to Zijing Tan.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, Q., Tan, Z., He, C., Sha, C., Wang, W. (2015). Repairing Functional Dependency Violations in Distributed Data. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9049. Springer, Cham. https://doi.org/10.1007/978-3-319-18120-2_26

  • DOI: https://doi.org/10.1007/978-3-319-18120-2_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18119-6

  • Online ISBN: 978-3-319-18120-2

  • eBook Packages: Computer Science, Computer Science (R0)
