Algorithms for Finding and Correcting Four Kinds of Data Mistakes in Information Table

Honghai, Feng; Hao, Xu; Baoyan, Liu; LiYun, He; Bingru, Yang; Yueli, Li

doi:10.1007/11892960_86

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4251)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

Real-world data sets usually contain four kinds of mistaken values: unit mistakes, misplaced radix (decimal) points, scribal errors, and computational mistakes. In this paper, we propose two algorithms for finding these four kinds of mistaken data. Experimental results on SARS and coronary heart disease data sets show that the two algorithms are effective: using them, we find some mistakes in both data sets, and the results agree with those found manually by medical experts.
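
The abstract names the four kinds of mistakes but does not describe the two algorithms, so the following is only a minimal illustrative sketch and not the authors' method. It assumes a numeric column with a plausible range estimated from the median and the median absolute deviation, and it tests whether a simple transformation brings an out-of-range value back into that range. All names (`plausible_range`, `suggest_correction`, `computed_mismatch`), the inch-to-centimetre factor, and the sample heights are hypothetical.

```python
from statistics import median


def plausible_range(values, width=3.0):
    """Bounds of median +/- width * MAD (median absolute deviation)."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1e-9  # avoid a zero-width range
    return med - width * mad, med + width * mad


def adjacent_swaps(value):
    """Yield numbers formed by swapping two adjacent digits of a whole number."""
    if value < 0 or value != int(value):
        return
    digits = str(int(value))
    for i in range(len(digits) - 1):
        swapped = digits[:i] + digits[i + 1] + digits[i] + digits[i + 2:]
        yield float(swapped)


def suggest_correction(value, low, high, unit_factors=(2.54,)):
    """Return None for a plausible value, else (suspected kind, candidate fix)."""
    def ok(x):
        return low <= x <= high

    if ok(value):
        return None
    for k in (1, -1, 2, -2, 3, -3):          # misplaced decimal point
        if ok(value * 10.0 ** k):
            return "decimal point", value * 10.0 ** k
    for factor in unit_factors:              # wrong unit, e.g. inches for cm
        if ok(value * factor):
            return "unit", value * factor
    for candidate in adjacent_swaps(value):  # transposed digits
        if ok(candidate):
            return "scribal", candidate
    return "unknown", None


def computed_mismatch(recorded, expected, rel_tol=0.01):
    """True when a derived field disagrees with its recomputed value."""
    return abs(recorded - expected) > rel_tol * abs(expected)


# Hypothetical heights in centimetres, followed by some suspect entries.
heights_cm = [170, 165, 180, 175, 160, 172, 168]
low, high = plausible_range(heights_cm)
for v in (171, 17.2, 68, 712, 999):
    print(v, suggest_correction(v, low, high))
```

The checks run in a fixed order, so a value that could be explained in more than one way (for example, metres recorded instead of centimetres is both a unit mistake and a two-place decimal shift) is reported under the first matching kind. A computational mistake is handled separately by recomputing the derived field, such as a body mass index from weight and height, and comparing it with the recorded value via `computed_mismatch`.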

Author information

Authors and Affiliations

Hebei Agricultural University, 071001, Baoding, Hebei, China
Feng Honghai & Li Yueli
University of Science and Technology Beijing, 100083, Beijing, China
Feng Honghai & Yang Bingru
China-Japan Friendship Hospital, 100029, Beijing, China
Xu Hao
Xiyuan Hospital, China Academy of Traditional Chinese Medicine
Xu Hao
China Academy of Traditional Chinese Medicine, 100700, Beijing, China
Liu Baoyan & He LiYun

Editor information

Editors and Affiliations

School of Design, Engineering and Computing, Bournemouth University, UK
Bogdan Gabrys
Centre for SMART Systems, School of Environment and Technology, University of Brighton, BN2 4GJ, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes, 5095, SA, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Honghai, F., Hao, X., Baoyan, L., LiYun, H., Bingru, Y., Yueli, L. (2006). Algorithms for Finding and Correcting Four Kinds of Data Mistakes in Information Table. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11892960_86

DOI: https://doi.org/10.1007/11892960_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46535-5
Online ISBN: 978-3-540-46536-2
eBook Packages: Computer Science, Computer Science (R0)