Separating Indic Scripts with ‘matra’—A Precursor to Script Identification in Multi-script Documents

Obaidullah, Sk.Md.; Goswami, Chitrita; Santosh, K. C.; Halder, Chayan; Das, Nibaran; Roy, Kaushik

doi:10.1007/978-981-10-2104-6_19

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 459)

Abstract

Here, we present a new technique for separating Indic scripts based on matra (or shirorekha), where an optimized fractal geometry analysis (FGA) is used as the sole pertinent feature. Separating scripts that have a matra from those that do not can serve as a precursor that eases the subsequent script identification process. In our work, we consider two matra-based scripts, namely Bangla and Devanagari, as positive samples; the counter samples are drawn from two scripts without matra, namely Roman and Urdu. Altogether, we took 1204 document images, distributed as 525 matra-based (325 Bangla and 200 Devanagari) and 679 non-matra-based (370 Roman and 309 Urdu). For experimentation, we used three different classifiers, multilayer perceptron (MLP), random forest (RF), and BayesNet (BN), with the aim of selecting the best performer. From a series of tests, we achieved an average accuracy of 96.44% with the MLP classifier.
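The abstract does not say how the optimized FGA feature is computed. As a rough illustration only, the sketch below estimates a fractal dimension by plain box counting on a binarized image. Box counting is an assumed, generic method swapped in here for illustration, not the authors' optimized procedure, and the function names `box_count` and `box_counting_dimension` and the box sizes are hypothetical.

```python
import math


def box_count(image, size):
    """Count the size x size grid boxes containing at least one foreground pixel.

    `image` is a list of rows, each a list of 0/1 values (1 = ink).
    """
    occupied = set()
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if pixel:
                occupied.add((r // size, c // size))
    return len(occupied)


def box_counting_dimension(image, sizes=(1, 2, 4, 8)):
    """Estimate the fractal dimension as the least-squares slope of
    log N(s) against log(1/s), where N(s) is the occupied-box count.

    The default box sizes are an illustrative assumption.
    """
    xs, ys = [], []
    for s in sizes:
        n = box_count(image, s)
        if n > 0:
            xs.append(math.log(1.0 / s))
            ys.append(math.log(n))
    if len(xs) < 2:
        raise ValueError("need foreground pixels and at least two box sizes")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # A single horizontal stroke, loosely resembling a headline (matra).
    stroke = [[1] * 16] + [[0] * 16 for _ in range(15)]
    # A fully inked square for comparison.
    filled = [[1] * 16 for _ in range(16)]
    print(box_counting_dimension(stroke))
    print(box_counting_dimension(filled))
```

On these toy inputs the estimate comes out close to 1 for the straight stroke and close to 2 for the filled square. A scalar of this kind, computed per document image, could then be passed to a classifier such as the MLP, RF, or BN models named in the abstract.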

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Aliah University, Kolkata, West Bengal, India
Sk.Md. Obaidullah & Chitrita Goswami
Department of Computer Science, The University of South Dakota, Vermillion, SD, USA
K. C. Santosh
Department of Computer Science, West Bengal State University, Kolkata, West Bengal, India
Chayan Halder & Kaushik Roy
Department of Computer Science & Engineering, Jadavpur University, Kolkata, West Bengal, India
Nibaran Das

Corresponding author

Correspondence to Sk.Md. Obaidullah.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Balasubramanian Raman
Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Sanjeev Kumar
Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Partha Pratim Roy
Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Debashis Sen

About this paper

Cite this paper

Obaidullah, S., Goswami, C., Santosh, K.C., Halder, C., Das, N., Roy, K. (2017). Separating Indic Scripts with ‘matra’—A Precursor to Script Identification in Multi-script Documents. In: Raman, B., Kumar, S., Roy, P., Sen, D. (eds) Proceedings of International Conference on Computer Vision and Image Processing. Advances in Intelligent Systems and Computing, vol 459. Springer, Singapore. https://doi.org/10.1007/978-981-10-2104-6_19

DOI: https://doi.org/10.1007/978-981-10-2104-6_19
Published: 23 December 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2103-9
Online ISBN: 978-981-10-2104-6
eBook Packages: Engineering, Engineering (R0)
