A Benchmark Dataset of Online Handwritten Gurmukhi Script Words and Numerals

Singh, Harjeet; Sharma, R. K.; Kumar, Rajesh; Verma, Karun; Kumar, Ravinder; Kumar, Munish

doi:10.1007/978-981-15-4018-9_41

Conference paper
First Online: 29 March 2020

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1148)

Abstract

This paper presents OHWR-Gurmukhi, a benchmark dataset of online handwriting in Gurmukhi script. TIET, Patiala has released two unconstrained online handwriting databases, OHWR-GNumerals and OHWR-GScript, which contain isolated stroke samples produced by 190 writers. OHWR-GNumerals covers 10 stroke classes, and OHWR-GScript covers 95 stroke classes that represent the Gurmukhi character set. For data collection, two sets of Gurmukhi words were finalized in consultation with language experts so that the collected stroke samples would be balanced. The datasets were preprocessed by size normalization, removal of duplicate points, interpolation of missing points, and re-sampling. The purpose of this benchmark is to provide a common platform and to make the dataset publicly available for research in online handwriting recognition. The dataset is available as a supplement at https://sites.google.com/view/ohwr-gurmukhi-script/.
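
The preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the data format, the target size, or the number of re-sampled points, so the representation of a stroke as a list of (x, y) tuples, the function names, and the parameters `size` and `n` are all assumptions made for illustration. Interpolation of missing points and re-sampling are combined here into one step that places points at equal arc-length intervals by linear interpolation, which is one common choice among several.

```python
import math

def remove_duplicates(stroke):
    """Drop points that repeat the immediately preceding point."""
    out = []
    for p in stroke:
        if not out or p != out[-1]:
            out.append(p)
    return out

def normalize_size(stroke, size=1.0):
    """Translate to the origin and scale into a size x size box, keeping aspect ratio."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    # Fall back to 1.0 when the stroke is a single dot (zero extent).
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((x - min_x) * size / span, (y - min_y) * size / span) for x, y in stroke]

def resample(stroke, n):
    """Return n points equally spaced along the stroke's arc length.

    Linear interpolation between recorded points also fills gaps left by
    missing pen samples.
    """
    if n < 2 or len(stroke) < 2:
        return [stroke[0]] * n
    # Cumulative arc length at each recorded point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0.0:
        return [stroke[0]] * n
    out = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(stroke) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = (target - cum[seg]) / seg_len if seg_len else 0.0
        (x0, y0), (x1, y1) = stroke[seg], stroke[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def preprocess(stroke, size=1.0, n=64):
    """Apply the three steps in sequence (the order is an assumption)."""
    return resample(normalize_size(remove_duplicates(stroke), size), n)

if __name__ == "__main__":
    raw = [(0, 0), (0, 0), (10, 0), (10, 0), (10, 5)]  # hypothetical pen trace
    print(preprocess(raw, size=1.0, n=4))
```

Removing duplicates before re-sampling keeps zero-length segments out of the arc-length computation, which is why the steps are chained in that order in this sketch.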

Acknowledgment

The authors take this opportunity to thank Technology Development for Indian Languages (TDIL) Programme, Department of Information Technology, Government of India for funding this work.

Author information

Authors and Affiliations

Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
Harjeet Singh
Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, 147004, India
R. K. Sharma, Rajesh Kumar, Karun Verma & Ravinder Kumar
Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, Bathinda, 151001, India
Munish Kumar

Corresponding author

Correspondence to Harjeet Singh.

Editor information

Editors and Affiliations

Malaviya National Institute of Technology, Jaipur, Rajasthan, India
Neeta Nain
Malaviya National Institute of Technology, Jaipur, Rajasthan, India
Santosh Kumar Vipparthi
Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Balasubramanian Raman

About this paper

Cite this paper

Singh, H., Sharma, R.K., Kumar, R., Verma, K., Kumar, R., Kumar, M. (2020). A Benchmark Dataset of Online Handwritten Gurmukhi Script Words and Numerals. In: Nain, N., Vipparthi, S., Raman, B. (eds) Computer Vision and Image Processing. CVIP 2019. Communications in Computer and Information Science, vol 1148. Springer, Singapore. https://doi.org/10.1007/978-981-15-4018-9_41

DOI: https://doi.org/10.1007/978-981-15-4018-9_41
Published: 29 March 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-4017-2
Online ISBN: 978-981-15-4018-9
eBook Packages: Computer Science, Computer Science (R0)
