Abstract
Edit distance matching has been used in the literature for word spotting, with characters taken as primitives. The recognition rate, however, is limited by inconsistent character segmentation (broken or merged characters) caused by noisy images or distorted characters. In this paper, we propose a Merge-Split Edit distance that overcomes these segmentation problems by incorporating a multi-purpose merge cost function. The system extracts the words and characters in the text and then attributes a set of features to each character. Characters are matched by comparing their extracted feature sets using Dynamic Time Warping (DTW), while words are matched by comparing their strings of characters using the proposed Merge-Split Edit distance algorithm. Evaluation of the method on 19th-century historical document images shows promising results.
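The two matching levels described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature set or the merge cost function, so the sketch assumes each character is a one-dimensional sequence of feature values, models a merge as the concatenation of two adjacent characters' sequences, and uses a fixed insertion/deletion cost. The names `dtw`, `merge_split_edit_distance` and `indel` are hypothetical.

```python
from typing import List, Sequence

Char = Sequence[float]  # assumption: one character = a 1-D feature sequence

INF = float("inf")


def dtw(a: Char, b: Char) -> float:
    """Classic DTW distance between two 1-D sequences (absolute difference cost)."""
    n, m = len(a), len(b)
    d: List[List[float]] = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def merge_split_edit_distance(
    query: Sequence[Char], cand: Sequence[Char], indel: float = 5.0
) -> float:
    """Edit distance over character strings with extra merge/split moves.

    Besides insertion, deletion and substitution, two adjacent characters
    on one side may be matched against a single character on the other
    side, which absorbs merged or broken characters. The substitution
    cost is the DTW distance between the feature sequences.
    """
    n, m = len(query), len(cand)
    D: List[List[float]] = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = INF
            if i >= 1:  # delete a query character
                best = min(best, D[i - 1][j] + indel)
            if j >= 1:  # insert a candidate character
                best = min(best, D[i][j - 1] + indel)
            if i >= 1 and j >= 1:  # one-to-one match
                best = min(best, D[i - 1][j - 1] + dtw(query[i - 1], cand[j - 1]))
            if i >= 2 and j >= 1:  # two query characters vs one merged candidate
                merged = list(query[i - 2]) + list(query[i - 1])
                best = min(best, D[i - 2][j - 1] + dtw(merged, cand[j - 1]))
            if i >= 1 and j >= 2:  # one query character vs two broken pieces
                merged = list(cand[j - 2]) + list(cand[j - 1])
                best = min(best, D[i - 1][j - 2] + dtw(query[i - 1], merged))
            D[i][j] = best
    return D[n][m]


# Usage: a two-character query against a candidate whose characters were merged.
query_word = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
merged_word = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
distance = merge_split_edit_distance(query_word, merged_word)
```

In this toy case the merge move lets the two query characters align with the single merged candidate character at zero cost, whereas a plain edit distance would have to pay for a deletion or a poor substitution.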
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Khurshid, K., Faure, C., Vincent, N. (2009). A Novel Approach for Word Spotting Using Merge-Split Edit Distance. In: Jiang, X., Petkov, N. (eds) Computer Analysis of Images and Patterns. CAIP 2009. Lecture Notes in Computer Science, vol 5702. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03767-2_26
DOI: https://doi.org/10.1007/978-3-642-03767-2_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03766-5
Online ISBN: 978-3-642-03767-2
eBook Packages: Computer Science, Computer Science (R0)