Abstract
In this paper, we present our design of a data cleaning framework that combines interaction of data quality rules (CFDS, CINDS and MDs) with user feedback through an interactive process. First, to generate candidate repairs for each potentially dirty attribute, we propose an optimization model based on genetic algorithm. We then create a Bayesian machine learning model with several committees to predict the correctness of the repair and rank these repairs by uncertainly score to improve the learned model. User feedback is used to decide whether the model is accurate while inspecting the suggestions. Finally, our experiments on real-world datasets show significant improvement in data quality.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Herzog, T.N., Scheuren, F.J., Winkler, W.E.: Data Quality and Record Linkage Techniques. Springer, Heidelberg (2007)
Bohannon, P., Fan, W., Flaster, M., Rastogi, R.: A cost-based model and effective heuristic for repairing constraints by value modication. In: ACM SIGMOD, pp. 143–154 (2005)
Cong, G., Fan, W., Geerts, F., Jia, X., Ma, S.: Improving data quality:consistency and accuracy. In: VLDB, pp. 315–326 (2007)
Lopatenko, A., Bravo, L.: Efficient approximation algorithms for repairing inconsistent databases. In: ICDE, pp. 216–225 (2007)
Fan, W., Geerts, F.: Foundations of Data Quality Management. In: Synthesis Lectures on Data Management (2012)
Batini, C., Scannapieco, M.: Data Quality: Concepts, Methodologies and Techniques. Springer (2006)
Greco, G., Greco, S., Zumpano, E.: A logical framework for querying and repairing inconsistent databases. IEEE Trans. Knowl. Data Eng. 15(6), 1389–1408 (2003)
Galhardas, H., Florescu, D., Shasha, D., Simon, E., Saita, C.: Declarative Data Cleaning: Language, Model and Algorithms. In: VLDB (2001)
Raman, V., Hellerstein, J.M.: Potter’s Wheel: An Interactive Data Cleaning System. In: VLDB (2001)
Jeery, S.R., Franklin, M.J., Halevy, A.Y.: Pay-as-you-go user feedback for dataspace systems. In: ACM SIGMOD, pp. 847–860 (2008)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xie, H., Wang, H., Li, J., Gao, H. (2013). A Data Cleaning Framework Based on User Feedback. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_52
Download citation
DOI: https://doi.org/10.1007/978-3-642-38562-9_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38561-2
Online ISBN: 978-3-642-38562-9
eBook Packages: Computer ScienceComputer Science (R0)