Separating Indic Scripts with ‘matra’—A Precursor to Script Identification in Multi-script Documents

  • Conference paper
Proceedings of International Conference on Computer Vision and Image Processing

Abstract

We present a new technique for separating Indic scripts based on the presence of a matra (or shirorekha), using an optimized fractal geometry analysis (FGA) as the sole feature. Separating scripts that have a matra from those that do not can serve as a precursor that eases subsequent script identification. We consider two matra-based scripts, Bangla and Devanagari, as positive samples, and draw counter samples from two scripts without a matra, Roman and Urdu. In total, we used 1204 document images: 525 matra-based (325 Bangla and 200 Devanagari) and 679 non-matra (370 Roman and 309 Urdu). For experimentation, we used three classifiers, multilayer perceptron (MLP), random forest (RF), and BayesNet (BN), with the aim of selecting the best performer. Over a series of tests, the MLP classifier achieved an average accuracy of 96.44%.
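
The abstract does not spell out how the optimized FGA feature is computed. As a rough illustration of the general idea only, the sketch below estimates a box-counting fractal dimension for a binarized image, which is one common form of fractal analysis and not necessarily the authors' method. The function names `box_count` and `box_counting_dimension`, the box sizes, and the toy images are all assumptions made for this example.

```python
import math


def box_count(image, size):
    """Count the size x size boxes that contain at least one foreground pixel.

    `image` is a list of rows, each a list of 0/1 values (1 = ink).
    """
    rows, cols = len(image), len(image[0])
    count = 0
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            if any(
                image[i][j]
                for i in range(r, min(r + size, rows))
                for j in range(c, min(c + size, cols))
            ):
                count += 1
    return count


def box_counting_dimension(image, sizes=(1, 2, 4, 8)):
    """Estimate the fractal dimension as the least-squares slope of
    log N(s) against log(1/s), where N(s) is the box count at box size s.

    The default box sizes are an illustrative assumption.
    """
    counts = [box_count(image, s) for s in sizes]
    if any(n == 0 for n in counts):
        raise ValueError("image has no foreground pixels")
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(n) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Toy inputs (hypothetical): a single horizontal stroke, loosely resembling
# a headline such as a matra, and a completely filled square.
stroke = [[1] * 16 if r == 0 else [0] * 16 for r in range(16)]
filled = [[1] * 16 for _ in range(16)]

print(box_counting_dimension(stroke))
print(box_counting_dimension(filled))
```

On these toy inputs the straight stroke behaves like a one-dimensional set and the filled square like a two-dimensional one. In a pipeline of the kind the abstract describes, a scalar such as this would be passed to a classifier (MLP, RF, or BN); how the paper optimizes or combines such measurements is not stated here.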

Corresponding author

Correspondence to Sk.Md. Obaidullah.

Copyright information

© 2017 Springer Science+Business Media Singapore

About this paper

Cite this paper

Obaidullah, S., Goswami, C., Santosh, K.C., Halder, C., Das, N., Roy, K. (2017). Separating Indic Scripts with ‘matra’—A Precursor to Script Identification in Multi-script Documents. In: Raman, B., Kumar, S., Roy, P., Sen, D. (eds) Proceedings of International Conference on Computer Vision and Image Processing. Advances in Intelligent Systems and Computing, vol 459. Springer, Singapore. https://doi.org/10.1007/978-981-10-2104-6_19

  • DOI: https://doi.org/10.1007/978-981-10-2104-6_19

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2103-9

  • Online ISBN: 978-981-10-2104-6

  • eBook Packages: Engineering, Engineering (R0)
