Abstract
This paper proposes a framework based on word spotting technology for indexing and retrieving historical Mongolian document images. In the framework, scanned document images are segmented into word images through preprocessing steps such as binarization and connected component analysis. Each word image is then processed by removing inflectional suffixes, extracting features, and converting the result into a fixed-length representation. Each word image is thus represented by a fixed-length feature vector and treated as an indexing term. At the retrieval stage, the query keyword image is obtained by synthesizing a sequence of glyphs according to the spelling rules of the Mongolian language. For word matching, the query keyword image is converted into a fixed-length feature vector through the same procedure, and a ranked list of candidate word images is returned in descending order of similarity to the query keyword image. Experimental results on the data set demonstrate the feasibility and effectiveness of the proposed framework.
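The retrieval step described above, in which a query vector is compared with every indexed word vector and the candidates are returned in descending order of similarity, can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the feature dimensionality, so cosine similarity, the three-dimensional toy vectors, and the names `cosine_similarity`, `rank_candidates`, and `index` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros).

    Assumption: the abstract does not state which similarity measure is used.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    query: Sequence[float], index: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (word_id, similarity) pairs sorted by descending similarity."""
    scored = [(word_id, cosine_similarity(query, vec)) for word_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Hypothetical index: word-image identifiers mapped to fixed-length feature vectors.
index = {
    "word_a": [1.0, 0.0, 0.0],
    "word_b": [0.0, 1.0, 0.0],
    "word_c": [1.0, 1.0, 0.0],
}

# Hypothetical query vector, standing in for the synthesized keyword image
# after it has gone through the same feature-extraction procedure.
query = [1.0, 0.0, 0.0]

ranking = rank_candidates(query, index)
for word_id, score in ranking:
    print(f"{word_id}: {score:.4f}")
```

Because every word image and the query share one fixed-length representation, matching reduces to a single vector comparison per candidate, which is what makes a precomputed index practical.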
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wei, H., Gao, G. (2013). Word Spotting Application in Historical Mongolian Document Images. In: Huang, DS., Bevilacqua, V., Figueroa, J.C., Premaratne, P. (eds) Intelligent Computing Theories. ICIC 2013. Lecture Notes in Computer Science, vol 7995. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39479-9_32
DOI: https://doi.org/10.1007/978-3-642-39479-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39478-2
Online ISBN: 978-3-642-39479-9
eBook Packages: Computer Science, Computer Science (R0)