Elsevier

Pattern Recognition

Volume 92, August 2019, Pages 203-218

RNN based online handwritten word recognition in Devanagari and Bengali scripts using horizontal zoning

https://doi.org/10.1016/j.patcog.2019.03.030

Highlights

  • This article proposes a novel approach for online handwritten cursive and non-cursive word recognition in two of the most popular Indian scripts, Devanagari and Bengali, based on two variants of the Recurrent Neural Network (RNN): Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BLSTM).

  • The proposed approach divides each word horizontally into three zones (upper, middle, and lower) before training on basic strokes with the LSTM and BLSTM versions of RNN. This zone division reduces the variation in the temporal order of basic strokes within a word.

  • The major strength of the proposed system is that, unlike most existing word recognition systems in these two scripts, it can also recognize words that are not present in the training dataset, because it trains the classifier with a basic-stroke-based class labelling scheme. The proposed system also overcomes several drawbacks of HMMs that are common to existing HMM-based word recognition systems.

  • Experiments were also carried out on an HMM-based platform to compare the performance of the present system on HMM- and RNN-based platforms.

  • Experimental results show that the proposed zone segmentation technique, combined with LSTM/BLSTM-based learning, outperforms existing word recognition systems in these two Indian scripts, including HMM-based ones.

Abstract

Devanagari and Bengali are two of the most popular scripts in India. Most existing word recognition studies in these two scripts have relied on the widely used Hidden Markov Model (HMM), despite its well-known shortcomings. These works performed well on their chosen evaluation metrics, but none of the existing word recognition systems in these two scripts exceeded 90% recognition accuracy. This article proposes a novel approach for online handwritten cursive and non-cursive word recognition in Devanagari and Bengali scripts based on two Recurrent Neural Network (RNN) models: Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BLSTM). The proposed approach divides each word horizontally into three zones (upper, middle, and lower) to reduce the variation in basic stroke order within a word. The middle-zone portion of the word is then re-segmented into its basic strokes. Various structural and directional features are extracted from each basic stroke, separately for each zone, and these zone-wise basic stroke features are learned using both the LSTM and BLSTM versions of RNN. Whereas most existing word recognition systems in these two scripts follow a word-based class labelling approach, the proposed system follows a basic-stroke-based class labelling approach. Exhaustive experiments on large datasets were performed to evaluate the proposed approach with both RNN and HMM for a comparative performance analysis. Experimental results show that the proposed RNN-based system is superior to HMM, achieving 99.50% and 95.24% accuracy in Devanagari and Bengali scripts respectively, and also outperforms existing HMM-based systems in the literature.

Introduction

Recognition of handwritten text can be either offline or online. In online handwriting recognition, text is written with a pen-like stylus on a digitizing tablet [1] or a smart device, and a time series of coordinates indicating the movement of the pen tip is collected. In offline mode, by contrast, text written on paper is fed to the computer through optical scanning, so only an image of the text is available. The online handwritten text recognition problem has long been well known in the pattern recognition and machine learning community. In online handwriting recognition, basic strokes are the building blocks of any handwritten text. A stroke is the sequence of points from one pen-down to the next pen-up position. A basic stroke is a stroke that represents either an entire character/modifier or a portion of one, but never more than one character/modifier. A character/modifier can be written with a single stroke or with several strokes; in the latter case, suitable basic strokes are combined to form the required character/modifier. Several techniques are available for online handwritten text recognition in non-Indian scripts such as Latin [1], [2], [4], Mongolian [5], Nastaliq [6], and Chinese-Japanese-Korean scripts [7], [8], [9], [10]. India is a multilingual country, with languages belonging to several different language families. The major families are the Indo-Aryan languages, spoken by 75% of Indians, and the Dravidian languages, spoken by 20% of Indians. Other languages spoken in India belong to the Austroasiatic and Sino-Tibetan families, a few other minor language families, and isolates. In India, a single script is frequently used to write several languages. 
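The stroke definition above (the points from one pen-down to the next pen-up) maps directly onto code. The sketch below assumes the tablet delivers a flat stream of `(x, y, pen_down)` samples; this sample layout and the function name are hypothetical, since real devices expose their own formats. Combining strokes into basic strokes or characters is a separate step and is not shown.

```python
from typing import List, Tuple

# One tablet sample: (x, y, pen_down). This layout is a hypothetical choice.
Sample = Tuple[float, float, bool]
Stroke = List[Tuple[float, float]]


def split_into_strokes(samples: List[Sample]) -> List[Stroke]:
    """Group consecutive pen-down samples into strokes, closing one at each pen-up."""
    strokes: List[Stroke] = []
    current: Stroke = []
    for x, y, pen_down in samples:
        if pen_down:
            current.append((x, y))
        elif current:
            # Pen lifted: everything since the last pen-down is one stroke.
            strokes.append(current)
            current = []
    if current:
        # Close a stroke that was still open when recording stopped.
        strokes.append(current)
    return strokes


stream = [
    (0, 0, True), (1, 0, True), (2, 0, True),  # first stroke
    (2, 0, False),                             # pen-up
    (1, 1, True), (1, 2, True),                # second stroke, never lifted
]
print(split_into_strokes(stream))  # [[(0, 0), (1, 0), (2, 0)], [(1, 1), (1, 2)]]
```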
Devanagari (or simply Nagari) script is used to write many Indo-Aryan languages, including Sanskrit, Hindi, Nepali, Marathi, Konkani, Bodo, Sindhi, and Maithili, among others [16]. Bengali script, on the other hand, is used to write Indo-Aryan languages such as Bengali and Assamese (with minor variations), as well as Manipuri. The distribution of popularity of Indian languages is shown in Fig. 1. Relating this distribution to the scripts used to write these languages shows that Devanagari and Bengali are the most and second most popular Indian scripts, respectively. Some research works on online handwritten word recognition are also available for Indian scripts, such as Devanagari [15], [16], Bengali [12], [13], [14], [15], and Tamil [16], [17], [18]. Online recognition of handwritten text in Indian scripts faces some challenges not found in Latin script: a large number of symbols, and variation in symbol order, especially in the temporal order of vowel modifiers across different samples of the same word. Such modifiers can appear above, below, to the left or right of, or even on both sides of a symbol. In Indian scripts, two or more simple characters can also be combined to form another type of character, known as a compound character. While Latin script has primarily 52 characters (upper- and lowercase letters), most Indian scripts have more than 100 symbols counting only simple and compound characters, without the numerals and punctuation symbols that commonly occur in writing. The structure of the symbols also causes significant challenges during recognition. Most symbols in Indian scripts are written in a much more complex way than those in Latin or other European scripts, and many of them look very similar, differing only in minor details. 
A few pairs of similarly shaped characters in two Indian scripts (Devanagari and Bengali) are shown in Fig. 29.

One of the main reasons for the low recognition rates of even the best existing online handwritten word recognition systems in Devanagari and Bengali scripts is the difficulty of accurately segmenting cursive handwritten text into basic strokes. In the literature, most existing studies on online recognition of isolated words in Devanagari [15], [16] and Bengali [12], [13], [14], [15] scripts have relied on the same widely used Hidden Markov Model (HMM) classifier, despite not achieving high recognition rates. The main reason for using HMMs in these two scripts is that they can segment and recognize at the same time, which removes the need to segment a cursive handwritten word into basic strokes beforehand. However, HMMs have several shortcomings. First, they cannot model contextual effects well, because they assume that each observation depends only on the current hidden state (and each state only on the previous one); as a result, they cannot directly remember past context and produce their output from the current input only. Second, the internal state of an HMM is a single discrete (univariate) variable, so with N states it can carry at most log2 N bits of information about the past observation sequence. Third, standard HMMs model emissions with Gaussian mixtures whose components use diagonal covariance matrices; these treat the features as uncorrelated within each component and are therefore limited in modelling correlated features.
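The diagonal-covariance limitation can be illustrated with a toy two-feature example. The sketch below uses a single Gaussian rather than a full mixture, and the mean, variances, and correlation `rho` are made-up values. It compares a diagonal-covariance log-density with a full 2-D log-density: for strongly correlated features, the diagonal model gives identical scores to a point that follows the correlation and one that contradicts it, whereas the full-covariance model separates them.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density with a diagonal covariance: a sum of independent 1-D terms."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def log_gauss_full_2d(x, mean, var, rho):
    """Log-density of a 2-D Gaussian with variances `var` and correlation `rho`."""
    s1, s2 = math.sqrt(var[0]), math.sqrt(var[1])
    z1, z2 = (x[0] - mean[0]) / s1, (x[1] - mean[1]) / s2
    q = (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (1 - rho * rho)
    return -math.log(2 * math.pi * s1 * s2 * math.sqrt(1 - rho * rho)) - 0.5 * q


mean, var, rho = (0.0, 0.0), (1.0, 1.0), 0.9  # toy values, strongly correlated features
along, against = (1.0, 1.0), (1.0, -1.0)      # follows vs. contradicts the correlation
print(log_gauss_diag(along, mean, var), log_gauss_diag(against, mean, var))  # equal
print(log_gauss_full_2d(along, mean, var, rho), log_gauss_full_2d(against, mean, var, rho))
```

In practice, a mixture of many diagonal components can partly approximate a correlated density, at the cost of more parameters.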

This article proposes a novel approach for online handwritten cursive and non-cursive word recognition in both Devanagari and Bengali scripts based on two variants of the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BLSTM), applied to these scripts for the first time. BLSTMs can access context in both directions along the input sequence because they contain two separate hidden layers, one processing the input sequence forwards and the other backwards. Both hidden layers are connected to the same output layer, which therefore has access to the past and future context of every point in the sequence. In Bengali script, most existing word recognition systems [12], [13] follow a word-based class labelling scheme, in which a limited number of classes (100 classes in [12], 110 classes in [13]) is used to train the classifier. Under this scheme, a word that was not used to train the classifier cannot be recognized. The proposed system instead trains the RNN classifier with a basic-stroke-based class labelling scheme. Because basic strokes are the basic units of online handwriting, this labelling scheme also enables recognition of words that are not present in the training dataset, which increases the word recognition accuracy of the system. Experimental results show that the proposed system outperforms all existing word recognition systems for Devanagari and Bengali scripts available in the literature, including HMM-based systems.
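The bidirectional structure described above can be sketched in NumPy. This is an illustrative forward pass only, not the authors' Theano implementation: it assumes a standard LSTM cell without peephole connections, randomly initialised weights, and a gate ordering chosen for convenience, and all names and sizes are hypothetical. The backward layer reads the sequence in reverse and its states are flipped back into time order, so each position sees both past and future context; `output_layer` plays the role of the shared output layer.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_pass(xs, W, U, b):
    """Run one LSTM layer over xs (T x D) and return its hidden states (T x H).

    Shapes: W (4H x D), U (4H x H), b (4H,); gate blocks ordered [i, f, g, o].
    """
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    states = []
    for x in xs:
        z = W @ x + U @ h + b
        i = sigmoid(z[:H])            # input gate
        f = sigmoid(z[H:2 * H])       # forget gate
        g = np.tanh(z[2 * H:3 * H])   # candidate cell update
        o = sigmoid(z[3 * H:])        # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)


def blstm(xs, fwd, bwd):
    """Concatenate a forward pass with a backward pass re-aligned to time order."""
    h_fwd = lstm_pass(xs, *fwd)
    h_bwd = lstm_pass(xs[::-1], *bwd)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)  # (T x 2H)


def output_layer(h, V, c):
    """Shared softmax output layer applied at every time step."""
    logits = h @ V.T + c
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def init_lstm(rng, D, H):
    return (rng.normal(scale=0.5, size=(4 * H, D)),
            rng.normal(scale=0.5, size=(4 * H, H)),
            np.zeros(4 * H))


rng = np.random.default_rng(0)
D, H, T, K = 3, 4, 5, 6  # feature size, hidden size, sequence length, output classes
fwd, bwd = init_lstm(rng, D, H), init_lstm(rng, D, H)
xs = rng.normal(size=(T, D))  # stand-in for a sequence of feature vectors
probs = output_layer(blstm(xs, fwd, bwd), rng.normal(size=(K, 2 * H)), np.zeros(K))
print(probs.shape)  # (5, 6)
```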

The detailed block diagram of our framework is shown in Fig. 2. In the proposed system, each input word is first divided horizontally into three separate zones (upper, middle, and lower) to reduce the variation in the temporal order of basic strokes within a word. This horizontal zoning of words is applied here for the first time to improve the recognition of online handwritten words in both Devanagari and Bengali scripts. Next, the middle-zone portion is re-segmented into its constituent basic strokes. Basic-stroke-wise features are then extracted from each zone separately and fed to both the LSTM and BLSTM models of RNN to predict the label of each basic stroke. Experiments were carried out with both RNN and HMM for a comparative performance analysis. For the HMM-based study, the basic-stroke-wise features from all three zones are combined to form the features of the entire word. Finally, in the RNN-based study, the basic-stroke-wise labels are combined to obtain the label of the word after consulting the lexicon, whereas in the HMM-based study, the feature vector sequence is processed with left-to-right continuous density HMMs to obtain the label of the word.
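One way to picture the lexicon step is sketched below. The excerpt does not say how stroke labels are matched against the lexicon, so this uses a plain Levenshtein distance as an assumed matching rule; the zone-tagged label encoding, the stroke label names, and the two-word lexicon are all hypothetical. Because each word is represented as a sequence of basic-stroke labels, a word can be recognized as long as its stroke sequence is in the lexicon, even if that word never appeared in the training data.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, la in enumerate(a, start=1):
        curr = [i]
        for j, lb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete la
                            curr[j - 1] + 1,            # insert lb
                            prev[j - 1] + (la != lb)))  # substitute or match
        prev = curr
    return prev[-1]


def zone_sequence(upper: List[str], middle: List[str], lower: List[str]) -> List[str]:
    """Hypothetical encoding: prefix each basic-stroke label with its zone."""
    return ([f"U:{s}" for s in upper] + [f"M:{s}" for s in middle]
            + [f"L:{s}" for s in lower])


def recognize_word(upper, middle, lower, lexicon: Dict[str, List[str]]) -> str:
    """Return the lexicon entry whose stroke-label sequence is closest."""
    observed = zone_sequence(upper, middle, lower)
    return min(lexicon, key=lambda word: edit_distance(observed, lexicon[word]))


# Toy lexicon with made-up stroke labels.
lexicon = {
    "word_a": zone_sequence(["u1"], ["m3", "m7"], []),
    "word_b": zone_sequence([], ["m3", "m8", "m2"], ["l1"]),
}
# One middle-zone stroke was misclassified (m9 instead of m7).
print(recognize_word(["u1"], ["m3", "m9"], [], lexicon))  # word_a
```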

The rest of the paper is organized as follows. A primer on Devanagari and Bengali scripts is given in Section 2. Section 3 details the related works. Section 4 covers the theoretical background of RNN models. The dataset development process is discussed in Section 5. Section 6 describes different preprocessing techniques. The technique of horizontal zoning of a word is discussed in Section 7. The basic stroke segmentation technique is presented in Section 8. Section 9 describes various feature extraction methods. Word recognition processes using RNN and HMM are presented in Section 10. Results are discussed in Section 11. Strengths and limitations of the present system are presented in Section 12. Sections 13 and 14 present the conclusion and future scope of this research, respectively.

Section snippets

Devanagari and Bengali scripts

The alphabet of the modern Devanagari script consists of 12 vowels and 36 consonants, whereas that of the modern Bengali script consists of 11 vowels and 39 consonants. In both scripts, the simple characters comprise the vowels and consonants. A word in these scripts is written using characters and vowel modifiers. Such modifiers can appear above, below, to the left or right of, or even on both sides of a symbol. Fig. 3 shows a few examples of vowels, consonants, and vowel

Literature survey

As mentioned earlier, some studies on online handwritten word recognition are available for several non-Indian as well as Indian scripts. Some of these earlier studies are discussed below.

Related work on non-Indian scripts: Jaeger et al. [1] reported an online handwritten text recognition system for Latin script using multi-state time delay neural networks. In that work, change of writing direction, curvature, linearity, curliness, slope, and

Theoretical background of RNN models

The theoretical background of two RNN models, LSTM and BLSTM, is discussed below.
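For reference, a common formulation of the LSTM cell (without peephole connections) and of the bidirectional combination is given below; the exact variant used in the paper is not visible in this snippet. Here σ is the logistic sigmoid, ⊙ is element-wise multiplication, and [· ; ·] denotes concatenation. The forward states are computed for t = 1, …, T and the backward states for t = T, …, 1, each with its own weights, and both feed the same output layer.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t) \\[4pt]
y_t &= \operatorname{softmax}\!\left(W_y\,[\overrightarrow{h}_t ; \overleftarrow{h}_t] + b_y\right)
\end{aligned}
```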

Dataset development

In online handwriting recognition systems, the data are collected using a graphics tablet, such as a Wacom tablet or A4 take note, among others, together with a light pen. A sensor picks up the pen-tip movements x(t), y(t) as well as the pen-up/pen-down switching. During data capture, we obtain information such as the current position, the direction of movement, starting and stopping points, temporal information of the plotted points, and the temporal order of strokes. In these systems, the y-coordinate value

Preprocessing

Raw word data captured by the hardware undergo several preprocessing operations before further processing. The main objective of preprocessing is to standardize the words and reduce variation among samples. The preprocessing phase of this work consists of interpolation, smoothing, resampling, size normalization, and skew correction. During online data collection, if the pen moves slowly, more points are created for an online stroke and points remain closer
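Resampling to equidistant points, which removes the dependence on writing speed described above, can be sketched as follows. This is a generic linear-interpolation resampler plus a simple height normalization, assumed for illustration; the `spacing` and `target_height` parameters are hypothetical, and the paper's smoothing and skew correction steps are not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample(stroke: List[Point], spacing: float) -> List[Point]:
    """Return points spaced `spacing` apart along the stroke's polyline path."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    if len(stroke) < 2:
        return list(stroke)
    out = [stroke[0]]
    carry = 0.0  # path length covered since the last emitted point (< spacing)
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = spacing - carry  # offset along this segment of the next new point
        while pos <= seg:
            t = pos / seg  # linear interpolation between the two raw points
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += spacing
        carry = seg - (pos - spacing)
    return out


def normalize_height(points: List[Point], target_height: float = 1.0) -> List[Point]:
    """Translate to the origin and scale uniformly so the height is target_height."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    height = max(y for _, y in points) - min_y or 1.0  # avoid dividing by zero
    s = target_height / height
    return [((x - min_x) * s, (y - min_y) * s) for x, y in points]


raw = [(0, 0), (0.5, 0), (1, 0), (3, 0)]  # slow start: points bunch together
print(resample(raw, 1.0))  # [(0, 0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
```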

Proposed zone segmentation approach

To segment each word horizontally into three zones (upper, middle, and lower), we propose the same novel approach for both Devanagari and Bengali scripts, because most characters within a word in these two scripts have a horizontal line at the top (shirorekha in Devanagari, matra in Bengali), and adjacent characters within a cursive word generally touch this line. In both scripts, more than one character or modifier can be written using a
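A rough sketch of horizontal zoning is given below. The paper's procedure is cut off in this snippet, so the heuristic here is an assumption: it takes y to grow downward, estimates the headline (shirorekha/matra) as the densest y-band of resampled points in the upper half of the word, takes the lower-zone boundary `baseline_y` as a given input, and cuts strokes wherever they cross a zone boundary. All function names, parameters, and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def estimate_headline_y(strokes: List[Stroke], bin_size: float = 1.0) -> float:
    """Hypothetical heuristic: top edge of the densest y-band in the upper half.

    Assumes y grows downward and points are roughly equidistant (resampled),
    so a long horizontal headline contributes many points to one band.
    """
    ys = [y for stroke in strokes for _, y in stroke]
    top, bottom = min(ys), max(ys)
    mid = (top + bottom) / 2
    bands: Dict[int, List[float]] = {}
    for y in ys:
        if y <= mid:
            bands.setdefault(int((y - top) // bin_size), []).append(y)
    return min(max(bands.values(), key=len))


def zone_of(y: float, headline_y: float, baseline_y: float) -> str:
    if y < headline_y:
        return "upper"
    if y > baseline_y:
        return "lower"
    return "middle"


def split_by_zone(strokes: List[Stroke], headline_y: float,
                  baseline_y: float) -> Dict[str, List[Stroke]]:
    """Cut strokes where they cross a zone boundary and group the pieces by zone."""
    zones: Dict[str, List[Stroke]] = {"upper": [], "middle": [], "lower": []}
    for stroke in strokes:
        piece: Stroke = []
        piece_zone = None
        for x, y in stroke:
            z = zone_of(y, headline_y, baseline_y)
            if piece and z != piece_zone:
                zones[piece_zone].append(piece)
                piece = []
            piece_zone = z
            piece.append((x, y))
        if piece:
            zones[piece_zone].append(piece)
    return zones


# Toy word (y grows downward): an upper modifier, a headline, a vertical bar
# hanging from it, and a lower modifier. Coordinates are made up.
word = [
    [(5.0, 0.0), (5.5, 0.5), (6.0, 1.0)],    # upper modifier
    [(float(x), 2.0) for x in range(11)],    # headline at y = 2
    [(3.0, float(y)) for y in range(2, 9)],  # vertical bar, y = 2..8
    [(4.0, 9.0), (4.0, 10.0)],               # lower modifier
]
head = estimate_headline_y(word)
zones = split_by_zone(word, head, baseline_y=8.0)
print(head, {name: len(pieces) for name, pieces in zones.items()})
# 2.0 {'upper': 1, 'middle': 2, 'lower': 1}
```

In the proposed system, only the middle-zone pieces would then be re-segmented into basic strokes.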

Segmenting into basic strokes

Because the RNN needs a pre-segmented input sequence, the middle-zone portion of a word is segmented into its constituent basic strokes once the middle zone has been located. A robust method based on dominant point selection is used for this basic stroke segmentation. Dominant points are determined in each stroke for segmentation purposes; they are the points where the stroke changes its slope drastically. The method of determining dominant points in a stroke may
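A simple turning-angle version of dominant-point segmentation is sketched below. It is an assumed stand-in for the paper's method, whose details are truncated here: an interior point is marked as dominant when the direction change between its incoming and outgoing segments exceeds a hypothetical 60° threshold, and the stroke is cut there. A practical implementation would typically use a wider neighbourhood or smoothing so that pen jitter does not create spurious cuts.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def turning_angle(p0: Point, p1: Point, p2: Point) -> float:
    """Absolute change of writing direction at p1, in radians (0 means straight)."""
    a1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
    a2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    d = abs(a2 - a1)
    return min(d, 2 * math.pi - d)  # wrap into [0, pi]


def dominant_points(stroke: List[Point], threshold: float = math.radians(60)) -> List[int]:
    """Indices of interior points whose turning angle exceeds the threshold."""
    return [i for i in range(1, len(stroke) - 1)
            if turning_angle(stroke[i - 1], stroke[i], stroke[i + 1]) > threshold]


def split_at(stroke: List[Point], cuts: List[int]) -> List[List[Point]]:
    """Cut the stroke at each index; a cut point ends one piece and starts the next."""
    pieces, start = [], 0
    for i in cuts:
        pieces.append(stroke[start:i + 1])
        start = i
    pieces.append(stroke[start:])
    return pieces


corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]  # an "L" turning 90 degrees at (2, 0)
cuts = dominant_points(corner)
print(cuts, split_at(corner, cuts))
# [2] [[(0, 0), (1, 0), (2, 0)], [(2, 0), (2, 1), (2, 2)]]
```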

Feature extraction

In the proposed system, various structural and directional features, such as writing direction, slope, curvature, curliness, and linearity [1], are extracted from each basic stroke of a word, separately for each zone. These features exploit the temporal information in online handwritten data, where points are plotted according to the change of direction of the pen-tip movement. The methods of extracting these features are explained below in detail with the help of a portion of a basic stroke
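Two of the listed features, writing direction and curvature, can be sketched following the definitions commonly used after Jaeger et al. [1] (up to sign convention): the direction at point i is the unit vector from point i−1 to point i+1, and curvature is the cosine and sine of the angle between the directions at i−1 and i+1. The end-point handling (clamping neighbours) is an assumption of this sketch, and slope, curliness, and linearity are omitted.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Vec = Tuple[float, float]


def writing_direction(points: List[Point]) -> List[Vec]:
    """(cos a, sin a) of the local direction from point i-1 to point i+1.

    Neighbours are clamped at the stroke ends (an assumption for this sketch).
    """
    n = len(points)
    feats = []
    for i in range(n):
        (x0, y0), (x1, y1) = points[max(i - 1, 0)], points[min(i + 1, n - 1)]
        ds = math.hypot(x1 - x0, y1 - y0) or 1.0  # avoid 0/0 on repeated points
        feats.append(((x1 - x0) / ds, (y1 - y0) / ds))
    return feats


def curvature(points: List[Point]) -> List[Vec]:
    """(cos b, sin b), where b is the angle between the directions at i-1 and i+1."""
    d = writing_direction(points)
    n = len(points)
    feats = []
    for i in range(n):
        c0, s0 = d[max(i - 1, 0)]
        c1, s1 = d[min(i + 1, n - 1)]
        # cos(a1 - a0) and sin(a1 - a0) via the angle-difference identities.
        feats.append((c0 * c1 + s0 * s1, c0 * s1 - s0 * c1))
    return feats


def point_features(points: List[Point]) -> List[Tuple[float, float, float, float]]:
    """Per-point 4-D vectors: direction (cos, sin) followed by curvature (cos, sin)."""
    return [dv + cv for dv, cv in zip(writing_direction(points), curvature(points))]


print(point_features([(0, 0), (1, 0), (1, 1)]))
```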

Word recognition

Two RNN models designed specifically for sequential data, LSTM and BLSTM, are used in the present work to recognize online handwritten words. The BLSTM was implemented with the Theano toolkit. The proposed word recognition approach has also been studied on an HMM-based platform to compare the performance of RNN- and HMM-based platforms. The HMM was implemented with the HTK toolkit [25].
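For comparison, decoding with a left-to-right continuous density HMM can be sketched with the Viterbi algorithm. This is a simplified stand-in, not HTK: each state has a single one-dimensional Gaussian (HTK models typically use multivariate Gaussian mixtures and non-emitting entry and exit states), and the self-loop probability `p_stay` is a hypothetical fixed value. The transition structure only allows a state to repeat or advance by one, which is what makes the model left-to-right.

```python
import math
from typing import List, Sequence, Tuple


def gauss_logpdf(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def left_to_right_viterbi(obs: Sequence[float], means: Sequence[float],
                          variances: Sequence[float],
                          p_stay: float = 0.6) -> Tuple[float, List[int]]:
    """Best log-score and state path of a left-to-right HMM with 1-D Gaussian states.

    A state may only repeat (probability p_stay) or move to the next state
    (1 - p_stay). The path starts in state 0 and must end in the last state.
    """
    n = len(means)
    if len(obs) < n:
        raise ValueError("need at least one observation per state")
    log_stay, log_next = math.log(p_stay), math.log(1 - p_stay)
    neg_inf = float("-inf")
    delta = [neg_inf] * n
    delta[0] = gauss_logpdf(obs[0], means[0], variances[0])
    backptrs = []
    for x in obs[1:]:
        new, ptr = [neg_inf] * n, [0] * n
        for j in range(n):
            stay = delta[j] + log_stay
            move = delta[j - 1] + log_next if j > 0 else neg_inf
            best, ptr[j] = (stay, j) if stay >= move else (move, j - 1)
            new[j] = best + gauss_logpdf(x, means[j], variances[j])
        delta = new
        backptrs.append(ptr)
    path = [n - 1]  # trace back from the final state
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[n - 1], path


score, path = left_to_right_viterbi([0.1, -0.2, 4.9, 5.2, 9.8],
                                    means=[0.0, 5.0, 10.0], variances=[1.0, 1.0, 1.0])
print(path)  # [0, 0, 1, 1, 2]
```

In a word recognizer, one such model (or a concatenation of character models) would be scored for each lexicon entry and the highest-scoring entry chosen.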

Experiments and results

The experimental evaluation of the proposed approach was carried out on the online handwritten word datasets of Devanagari and Bengali scripts described in Section 5. The lexicon contains a total of 10K words for each script.

Discussion

Strengths and limitations of the present system as well as comparative performance analysis with other existing online handwritten word recognition systems are discussed below.

Conclusion

This work proposes the first LSTM- and BLSTM-based online handwritten word recognition system for both Devanagari and Bengali scripts. The proposed system divides each word into three horizontal zones before training on basic strokes with the LSTM and BLSTM versions of RNN. Experiments were also carried out on an HMM-based platform to compare the performance of the present system on HMM- and RNN-based platforms. Experimental results show that the proposed

Future work

Rapid hardware developments will further aid the inclusion of more advanced online handwriting recognition algorithms, even at the cost of higher computational and memory requirements. Similar work can be carried out in other Indian scripts, and the need for it is increasing with the availability of operating systems in regional languages. The performance of the present system could be improved by removing its dependency on a lexicon to recognize a word. This work may be extended towards online


References (27)

  • Z. Yao et al., Online handwritten Chinese word recognition based on lexicon, Proceedings of the 18th IEEE International Conference on Pattern Recognition (2006).
  • J. Hu et al., Writer independent online handwriting recognition using an HMM approach, Pattern Recognit. (2000).
  • U. Bhattacharya et al., An analytic scheme for online handwritten Bangla cursive word recognition, Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition (2008).

Cited by (63)

  • ESDTW: Extrema-based shape dynamic time warping, Expert Systems with Applications (2024).
  • Advances in online handwritten recognition in the last decades, Computer Science Review (2022).
    Citation excerpt: "Table 5 presents the accuracy comparison of different word recognition schemes. Ghosh et al. in [146] have achieved maximum accuracy of 97.27% for Devanagari word recognition. Bengali, the world’s seventh most common language, has achieved character recognition accuracy of 99.99% and 95.49% for 10 000 and 15 000 character database and achieved a maximum of 95.24% accuracy for handwritten word recognition [146]."


Dr. Rajib Ghosh is an Assistant Professor in the Computer Science and Engineering department at National Institute of Technology (NIT), Patna, India. He has more than 17 years of teaching experience in various engineering colleges. He received his Ph.D. (Computer Science and Engineering) degree from National Institute of Technology (NIT), Patna, India. He also holds an M.Tech. degree in Information Technology and a B.E. degree in Computer Science and Engineering. His broader research domains are Computer Vision, Pattern Recognition, and Machine Learning. His research interests include document analysis and recognition, object detection, object tracking, human movement tracking, and video surveillance. His Ph.D. work explored pattern recognition techniques for recognizing online handwritten text in Indic scripts such as Bengali, Devanagari, Telugu, and Tamil. He has published research articles in various reputed international journals and conferences, and has served as a reviewer for many international conferences and SCI-indexed journals.

Chirumavila Vamshi was an undergraduate (B.Tech.) student at National Institute of Technology (NIT), Patna, India, from 2012 to 2016. He is currently pursuing an MBA at SPJIMR, Mumbai.

Dr. Prabhat Kumar is currently an Associate Professor and the Head of the Computer Science and Engineering Department at National Institute of Technology Patna, India. He is also the Professor-In-charge of the IT Services of the institute and has more than eight years of experience in Network Planning & Management. He holds a Ph.D. in Computer Science and an M.Tech. in Information Technology. He has over 50 publications in various national/international journals and conferences (viz. IEEE, ACM, Springer, and Elsevier). He is also a reviewer for several reputed journals indexed in SCI, SCIE, and Scopus. He has chaired sessions at several international conferences held in India and abroad, and serves on the Program Committees of various national/international conferences. He is an associate member of IEEE, a life member of CSI, a member of the International Association of Engineers (IAENG), and a global member of the Internet Society. He has delivered expert talks and guest lectures at various prestigious institutes. His research areas include Wireless Sensor Networks, Internet of Things, Machine Learning, Social Networks, and E-governance.
