Abstract
Data repairing is a central problem in data consistency. Given a database \(D\) that violates a set \(\Sigma \) of data dependencies serving as data quality rules, the goal is to modify \(D\) into a new database \(D'\) that satisfies \(\Sigma \). Many methods address this problem when \(D\) is a centralized database. In practice, however, a database may be fragmented and distributed across multiple sites, a design that distributed systems advocate for better scalability and that commercial systems readily support. This paper makes a first effort to develop techniques for repairing functional dependency violations in a horizontally partitioned database. (1) Based on a message-passing distributed computing model and two complexity measures for distributed algorithms (parallel time and data shipment), we study data repairing with equivalence classes in the distributed setting. We show that it is NP-complete to build equivalence classes when the data is horizontally partitioned and the aim is to minimize either data shipment or parallel computation time. (2) Despite this intractability, we propose efficient distributed algorithms and optimization techniques for data repairing based on equivalence classes. (3) We experimentally verify the effectiveness and efficiency of our algorithms, using both real-life and synthetic data.
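To make the setting concrete, the following is a minimal Python sketch of equivalence-class repair for a single functional dependency over horizontally partitioned data. It is not the algorithm of this paper. It assumes one dependency (here the hypothetical `zip -> city`), a single coordinator site, and a majority-vote rule for choosing the repaired value. All function names, the sample fragments, and the shipment count (one unit per distinct key/value pair a site sends) are illustrative assumptions.

```python
from collections import Counter, defaultdict


def local_summary(fragment, lhs, rhs):
    """Per-site step: count the RHS values seen for each LHS key in one fragment."""
    summary = defaultdict(Counter)
    for row in fragment:
        key = tuple(row[a] for a in lhs)
        summary[key][row[rhs]] += 1
    return summary


def choose_targets(summaries):
    """Coordinator step: merge the summaries and pick one RHS value per class."""
    merged = defaultdict(Counter)
    for summary in summaries:
        for key, counts in summary.items():
            merged[key].update(counts)
    # Majority vote (an assumed cost model); ties go to the smallest value
    # so that the result is deterministic.
    return {
        key: min(counts, key=lambda v: (-counts[v], v))
        for key, counts in merged.items()
    }


def repair(fragments, lhs, rhs):
    """Repair every fragment in place of its site and report the data shipped."""
    summaries = [local_summary(f, lhs, rhs) for f in fragments]
    targets = choose_targets(summaries)
    repaired = [
        [{**row, rhs: targets[tuple(row[a] for a in lhs)]} for row in f]
        for f in fragments
    ]
    # Assumed shipment measure: distinct (key, value) pairs sent to the coordinator.
    shipped = sum(len(counts) for s in summaries for counts in s.values())
    return repaired, shipped


# Hypothetical data: two sites, each holding a horizontal fragment.
fragments = [
    [{"zip": "10001", "city": "NYC"}, {"zip": "10001", "city": "NYC"}],
    [{"zip": "10001", "city": "New York"}, {"zip": "94105", "city": "SF"}],
]
repaired, shipped = repair(fragments, ["zip"], "city")
```

In this sketch each site ships only value counts rather than whole tuples, which illustrates why data shipment is a natural cost measure. With several interacting dependencies, equivalence classes must be merged across dependencies and sites, and that is where the choice of shipping strategy becomes hard.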
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, Q., Tan, Z., He, C., Sha, C., Wang, W. (2015). Repairing Functional Dependency Violations in Distributed Data. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9049. Springer, Cham. https://doi.org/10.1007/978-3-319-18120-2_26
DOI: https://doi.org/10.1007/978-3-319-18120-2_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18119-6
Online ISBN: 978-3-319-18120-2
eBook Packages: Computer Science, Computer Science (R0)