Abstract
Data quality has become a pervasive challenge for organizations as they wrangle large, heterogeneous datasets to extract value. Existing data cleaning solutions have focused on scalable techniques that resolve inconsistencies quickly. However, despite the proliferation of sensitive and confidential user information, data privacy concerns have remained largely unexplored in data cleaning. In this work, we present a new privacy-aware data cleaning framework that resolves data inconsistencies while minimizing the amount of information disclosed. We introduce a set of data disclosure operations that facilitate the cleaning process, and propose two information-theoretic measures, one for privacy loss and one for data utility, which guide how inconsistencies in the data are corrected.
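The two measures are not defined in this abstract. As a minimal sketch only, assuming a relation stored as a list of Python dicts, the code below shows one common information-theoretic way to quantify privacy loss (the mutual information between a disclosed attribute and a sensitive one), together with a check for violations of a functional dependency such as age → disease. The functions `entropy`, `conditional_entropy`, `fd_violations` and `privacy_loss`, and the toy data, are hypothetical illustrations, not the measures or disclosure operations proposed in this paper.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy(rows, given, target):
    """H(target | given): entropy of `target` within each `given` group,
    weighted by the fraction of rows in that group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[given]].append(row[target])
    n = len(rows)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def fd_violations(rows, lhs, rhs):
    """Return each `lhs` value that co-occurs with more than one `rhs` value,
    i.e. each violation of the functional dependency lhs -> rhs."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return {k: v for k, v in seen.items() if len(v) > 1}


def privacy_loss(rows, disclosed, sensitive):
    """Hypothetical privacy-loss proxy (an assumption, not the paper's measure):
    I(sensitive; disclosed) = H(sensitive) - H(sensitive | disclosed), in bits."""
    h_sensitive = entropy([row[sensitive] for row in rows])
    return h_sensitive - conditional_entropy(rows, disclosed, sensitive)


if __name__ == "__main__":
    # Hypothetical toy relation: age 45 maps to two diseases,
    # so the dependency age -> disease is violated.
    rows = [
        {"age": 30, "disease": "flu"},
        {"age": 30, "disease": "flu"},
        {"age": 45, "disease": "diabetes"},
        {"age": 45, "disease": "asthma"},
    ]
    print(fd_violations(rows, "age", "disease"))
    print(privacy_loss(rows, "age", "disease"))  # 1.5 - 0.5 = 1.0 bits
```

Under this proxy, if the relation satisfied age → disease exactly, H(disease | age) would be 0 and disclosing age would reveal all of H(disease); the tension between repairing such violations and limiting what is disclosed is the trade-off the abstract describes.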
Notes
1. For illustration purposes, we assume in this example that age functionally determines disease, i.e., persons of the same age have the same disease.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Huang, Y., Chiang, F. (2015). Towards a Unified Framework for Data Cleaning and Data Privacy. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. Lecture Notes in Computer Science, vol 9419. Springer, Cham. https://doi.org/10.1007/978-3-319-26187-4_34
DOI: https://doi.org/10.1007/978-3-319-26187-4_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26186-7
Online ISBN: 978-3-319-26187-4
eBook Packages: Computer Science, Computer Science (R0)