A Benchmark Dataset of Online Handwritten Gurmukhi Script Words and Numerals

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1148)

Abstract

This paper presents OHWR-Gurmukhi, a benchmark dataset of online handwriting in Gurmukhi script. TIET, Patiala has released two unconstrained online handwriting databases, OHWR-GNumerals and OHWR-GScript, which contain isolated stroke samples produced by 190 writers. OHWR-GNumerals covers 10 stroke classes, and OHWR-GScript covers 95 stroke classes that represent the Gurmukhi character set. For data collection, two sets of Gurmukhi words were finalized in consultation with language experts so that the collected stroke samples would be balanced. The datasets were prepared with the following preprocessing methods: size normalization, removal of duplicate points, interpolation of missing points, and re-sampling. The purpose of this benchmark is to provide a common platform by making the dataset publicly available for research in online handwriting recognition. The dataset is available as a supplement at https://sites.google.com/view/ohwr-gurmukhi-script/.
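The four preprocessing steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how each step is implemented, so everything below is an assumption made for illustration: a stroke is taken to be a non-empty list of (x, y) pen coordinates, size normalization scales into a unit box while preserving aspect ratio, missing points are filled by linear interpolation, and re-sampling spaces points equally along arc length. The function names and the `max_gap` and `n_points` parameters are hypothetical and are not taken from the paper or the dataset.

```python
import math


def remove_duplicates(stroke):
    """Drop points that repeat the immediately preceding point."""
    out = []
    for p in stroke:
        if not out or p != out[-1]:
            out.append(p)
    return out


def normalize_size(stroke, size=1.0):
    """Translate to the origin and scale into a size x size box (aspect ratio kept)."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    # Fall back to 1.0 for a single-point stroke to avoid dividing by zero.
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    scale = size / extent
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in stroke]


def interpolate_gaps(stroke, max_gap):
    """Insert linearly interpolated points where neighbours are more than max_gap apart."""
    out = [stroke[0]]
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        n = math.ceil(d / max_gap) if d > max_gap else 1
        for k in range(1, n + 1):
            t = k / n
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def resample(stroke, n_points):
    """Return n_points spaced equally along the stroke's arc length."""
    cum = [0.0]  # cumulative arc length at each input point
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0.0 or n_points < 2:
        return [stroke[0]] * n_points
    out = []
    seg = 0
    for i in range(n_points):
        target = total * i / (n_points - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(cum) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = (target - cum[seg]) / span if span > 0 else 0.0
        (x0, y0), (x1, y1) = stroke[seg], stroke[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def preprocess(stroke, max_gap=0.5, n_points=5):
    """Apply the four steps in the order the abstract lists them (an assumed order)."""
    stroke = normalize_size(stroke)
    stroke = remove_duplicates(stroke)
    stroke = interpolate_gaps(stroke, max_gap)
    return resample(stroke, n_points)
```

Calling `preprocess` on a raw stroke yields a fixed-length point sequence in a unit box, which is a common input shape for stroke classifiers; the order of the steps and the default parameter values here are illustrative choices only.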

Acknowledgment

The authors thank the Technology Development for Indian Languages (TDIL) Programme, Department of Information Technology, Government of India, for funding this work.

Author information

Corresponding author

Correspondence to Harjeet Singh.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Singh, H., Sharma, R.K., Kumar, R., Verma, K., Kumar, R., Kumar, M. (2020). A Benchmark Dataset of Online Handwritten Gurmukhi Script Words and Numerals. In: Nain, N., Vipparthi, S., Raman, B. (eds) Computer Vision and Image Processing. CVIP 2019. Communications in Computer and Information Science, vol 1148. Springer, Singapore. https://doi.org/10.1007/978-981-15-4018-9_41

  • DOI: https://doi.org/10.1007/978-981-15-4018-9_41

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-4017-2

  • Online ISBN: 978-981-15-4018-9

  • eBook Packages: Computer Science (R0)