Abstract
Real-world data sets usually contain four kinds of mistaken values: mistakes in units, misplaced radix (decimal) points, scribal errors, and computational mistakes. In this paper, we propose two algorithms for finding these four kinds of mistaken data. Experiments on SARS and coronary heart disease data sets show that the two algorithms are effective: they find mistakes in both data sets, and the results agree with those found manually by medical experts.
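The abstract does not describe the two algorithms, so the following is only a minimal sketch of how two of the four mistake kinds (a misplaced radix point and a unit mistake) might be flagged for a single numeric attribute. It is not the authors' method. The attribute (body temperature), the plausible range `TEMP_RANGE_C`, the Fahrenheit-to-Celsius unit check, and the function names are all illustrative assumptions.

```python
from typing import Optional, Tuple

# Hypothetical plausible range for body temperature in degrees Celsius.
# This is an illustrative assumption, not a value taken from the paper.
TEMP_RANGE_C: Tuple[float, float] = (34.0, 43.0)


def in_range(value: float, bounds: Tuple[float, float]) -> bool:
    """Check whether a value lies inside the closed interval `bounds`."""
    low, high = bounds
    return low <= value <= high


def suggest_correction(
    value: float, bounds: Tuple[float, float] = TEMP_RANGE_C
) -> Optional[Tuple[str, float]]:
    """Return (mistake kind, candidate value), or None if the value is plausible."""
    if in_range(value, bounds):
        return None
    # Misplaced radix point: shift the point one or two places either way.
    for shift in (-2, -1, 1, 2):
        candidate = value * 10 ** shift
        if in_range(candidate, bounds):
            return ("radix point", round(candidate, 6))
    # Unit mistake: assume the value was recorded in Fahrenheit.
    candidate = (value - 32.0) * 5.0 / 9.0
    if in_range(candidate, bounds):
        return ("unit", round(candidate, 2))
    # Out of range, but neither simple explanation fits.
    return ("unexplained", value)
```

For instance, under these assumptions `suggest_correction(368.0)` proposes 36.8 as a radix-point correction, while `suggest_correction(98.6)` proposes 37.0 as a unit correction. Scribal and computational mistakes generally need cross-attribute checks and are not covered by this sketch.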
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Honghai, F., Hao, X., Baoyan, L., LiYun, H., Bingru, Y., Yueli, L. (2006). Algorithms for Finding and Correcting Four Kinds of Data Mistakes in Information Table. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11892960_86
DOI: https://doi.org/10.1007/11892960_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46535-5
Online ISBN: 978-3-540-46536-2
eBook Packages: Computer Science, Computer Science (R0)