Abstract
Most researchers around the world focus on developing monolingual Optical Character Recognition (OCR) systems. In a multilingual country like India, however, a single document page commonly contains words written in more than one script. OCR of such documents therefore needs a script identification module as a prerequisite. This paper reports a complete script recognition system for handwritten mixed-script documents. The document pages are first segmented into their constituent text lines and words. Script recognition is then performed at the word level using texture-based features. The technique is applied to 100 mixed-script document pages written in Bangla or Devanagari text mixed with English words. The encouraging outcomes should motivate more researchers to work in the multilingual handwriting recognition domain.
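The page-to-line-to-word flow described in the abstract can be illustrated with a small sketch. This is not the authors' method: projection-profile segmentation is swapped in here as a simple stand-in, the two features (ink density and mean horizontal transitions per row) are toy texture descriptors chosen only for illustration, and all function names and the `min_gap` parameter are hypothetical. The classification step is omitted.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return [start, end) ranges where profile > 0, merging gaps narrower than min_gap."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too small: same unit
        else:
            merged.append((s, e))
    return merged


def segment_lines(page: Image) -> List[Image]:
    """Split a page into text lines at empty rows (horizontal projection profile)."""
    row_profile = [sum(row) for row in page]
    return [page[top:bottom] for top, bottom in runs(row_profile)]


def segment_words(line: Image, min_gap: int = 2) -> List[Image]:
    """Split a line into words at column gaps at least min_gap wide (assumed threshold)."""
    col_profile = [sum(col) for col in zip(*line)]
    return [[row[left:right] for row in line]
            for left, right in runs(col_profile, min_gap)]


def texture_features(word: Image) -> Tuple[float, float]:
    """Toy descriptors: ink density and mean 0/1 transitions per row."""
    height, width = len(word), len(word[0])
    density = sum(map(sum, word)) / (height * width)
    transitions = sum(
        sum(1 for a, b in zip(row, row[1:]) if a != b) for row in word
    ) / height
    return density, transitions


# Tiny synthetic "page": two text lines; the first holds two words.
page: Image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
]

lines = segment_lines(page)
features = [[texture_features(w) for w in segment_words(ln)] for ln in lines]
```

In a real system, each word's feature vector would be passed to a trained classifier that assigns a script label; handwritten pages with skewed or touching lines generally need a more robust segmentation step than plain projection profiles.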
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Singh, P.K., Das, S., Sarkar, R., Nasipuri, M. (2017). Handwritten Mixed-Script Recognition System: A Comprehensive Approach. In: Satapathy, S., Bhateja, V., Udgata, S., Pattnaik, P. (eds) Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications. Advances in Intelligent Systems and Computing, vol 515. Springer, Singapore. https://doi.org/10.1007/978-981-10-3153-3_78
DOI: https://doi.org/10.1007/978-981-10-3153-3_78
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3152-6
Online ISBN: 978-981-10-3153-3
eBook Packages: Engineering, Engineering (R0)