Abstract
For numerical record fields such as date and age, many types of error are likely to yield small numerical differences between the observed and true values. If, for example, two sources provide separate case reports on the same incident, the dates of onset may not match exactly, but they are more likely to differ by a few days than by several years. A few methods are available for handling such variations in numbers. This paper proposes a new normalization technique for numerical record fields. A comparison of the resulting distance with the Smith-Waterman distance shows a significant increase in weight for the proposed technique.
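The intuition in the abstract, that near-matching numbers should score higher than distant ones, can be illustrated with a minimal sketch. This is not the normalization technique proposed in the paper, which the abstract does not specify; it is a generic linear-decay similarity, and the function names and the `max_days` tolerance are hypothetical choices made for illustration only.

```python
from datetime import date


def numeric_similarity(a: float, b: float, max_diff: float) -> float:
    """Linear-decay similarity in [0, 1].

    Returns 1.0 for equal values and 0.0 once the absolute difference
    reaches max_diff (a hypothetical tolerance, not taken from the paper).
    """
    if max_diff <= 0:
        raise ValueError("max_diff must be positive")
    return max(0.0, 1.0 - abs(a - b) / max_diff)


def date_similarity(d1: date, d2: date, max_days: int = 30) -> float:
    """Apply the same idea to dates by comparing their day ordinals."""
    return numeric_similarity(d1.toordinal(), d2.toordinal(), max_days)


# Two reports of the same incident (assumed example data).
onset_a = date(2009, 3, 14)
onset_b = date(2009, 3, 17)  # three days apart
onset_c = date(2003, 3, 14)  # six years apart

print(date_similarity(onset_a, onset_b))  # close to 1: plausible match
print(date_similarity(onset_a, onset_c))  # 0.0: outside the tolerance
```

With a 30-day tolerance, the pair three days apart keeps most of its similarity, while the pair six years apart scores zero. A character-based comparison of the same date strings would not necessarily preserve that ordering, which is the kind of gap a numeric treatment is meant to address.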
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Deshmukh, S.N., Mehrotra, S.C., Singh, H. (2012). Using the Normalization for Typographic Errors in Numerals. In: Kannan, R., Andres, F. (eds) Data Engineering and Management. ICDEM 2010. Lecture Notes in Computer Science, vol 6411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27872-3_14
DOI: https://doi.org/10.1007/978-3-642-27872-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27871-6
Online ISBN: 978-3-642-27872-3
eBook Packages: Computer Science, Computer Science (R0)