Convolutional Deep Learning Network for Handwritten Arabic Script Recognition

Elleuch, Mohamed; Kherallah, Monji

doi:10.1007/978-3-030-49336-3_11

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1179)

Included in the following conference series:

International Conference on Hybrid Intelligent Systems

Abstract

In recent years, deep convolutional networks have become widespread, yielding substantial gains on various benchmarks. In this paper, a Convolutional Deep Belief Network (CDBN) is applied to automatically learn the most discriminative features from image data of Arabic handwritten script. This architecture combines the advantages of the Deep Belief Network and the Convolutional Neural Network. We add regularization methods to our CDBN model to address the issue of over-fitting. We evaluated the proposed model on high-dimensional Arabic text images. The experimental results show that our model is more effective than state-of-the-art results in handwritten script recognition on the IFN/ENIT data sets.

1 Introduction and Related Works

Information processing techniques are currently progressing rapidly alongside data processing, with growing potential in the domain of human-computer interaction. In recent years, machine simulation of human reading has been studied intensively. Handwriting recognition is part of the larger domain of pattern recognition; it aims to develop systems that come as close as possible to the human ability to read.

Recognition of handwritten Arabic lags behind, mainly because of the complexity and cursive nature of the script. Consequently, automatic recognition of handwritten script remains a demanding task. Since the late 1960s, owing to its broad applicability in several engineering and technological areas, Arabic handwritten script (AHS) recognition has been the subject of in-depth studies [1]. Many studies have addressed the recognition of Arabic handwritten characters using unsupervised feature learning and hand-designed features [2, 3].

Extracting suitable features from an image is a difficult and complex task. It requires a skilled and experienced specialist in feature extraction methods, such as MFCC features in speech processing, or Gabor and HOG features in computer vision. The choice and quality of these hand-designed features determine the effectiveness of the models used for classification and recognition, such as the Multi-layer Perceptron (MLP), Hidden Markov Model (HMM), Support Vector Machine (SVM), etc. However, most classifiers face a major problem: the variability of the feature vector size. Therefore, many researchers have turned to raw or unlabeled data for training handwriting recognition systems, as this is the easiest way to handle large data.

The ability to automatically extract features and to model high-level abstractions in various signals, such as images and text, has made deep learning (DL) algorithms widespread in Artificial Intelligence research. Our first ongoing study therefore aims to implement a system for automatic feature extraction that yields richer features than those obtained with heuristic signal processing based on domain knowledge. This approach relies on deep learning of a representation of Arabic script from the image signal. To this end, unsupervised and supervised learning methods have shown some potential. Such learned representations can likely be applied to various handwriting recognition tasks.

Recent research has shown that DL methods have made it possible to make decisive progress in solving tasks such as object recognition [4, 5], computer vision [6], speech recognition [7, 8] and Arabic handwriting recognition [9].

Developed by LeCun et al. [10], the Convolutional Neural Network (CNN) is a specialized type of Neural Network (NN) that automatically learns useful features from the given dataset at every layer of the architecture; each layer can be a convolution layer, a pooling layer or a fully connected layer. Ranzato et al. [11] later improved performance by using unsupervised pre-training on a CNN.

Another widely used model is the Deep Belief Network (DBN) [12]. The DBN is one of the classical deep learning models, composed of several Restricted Boltzmann Machines (RBMs) in cascade. This model learns high-level feature representations from unlabeled data using unsupervised learning algorithms.

Compared with shallow learning, the advantage of DL is that deep structures can be designed to learn internal representations and more abstract details of the input data. However, the large number of parameters can lead to another problem: over-fitting. Improving existing regularization techniques, or developing new effective ones, is therefore necessary. In recent years, various regularization techniques have been proposed, such as batch normalization, Dropout and Dropconnect.

The contribution of this paper is to leverage the DL approach to solve the problem of recognizing handwritten Arabic text. To this end, we study the potential benefits of our proposed hybrid CDBN/SVM structure [13]; this model uses the CDBN as an automatic feature extractor and the SVM as the output predictor. In addition, to improve the performance of the CDBN/SVM model, regularization methods such as Dropout and Dropconnect help to guard against over-fitting.

This paper is organized as follows: Sect. 2 gives an overview of the basic components of the Convolutional Deep Belief Network model and of the regularization techniques; our target architectures for recognizing Arabic handwritten text are then presented and discussed. Section 3 describes the experimental study, and Sect. 4 discusses the results. The last section concludes this work with some remarks.

2 Deep Models for Handwritten Recognition

In this section, the DBN model based on the RBM is presented first, and the CDBN model is then reviewed. Finally, the effect of the Dropout and Dropconnect techniques on our CDBN architectures is analyzed.

2.1 Restricted Boltzmann Machine (RBM)

The DBN is a hierarchical generative model [12] built from several RBM layers [14, 15]; it consists of a layer of observed units and multiple layers of hidden units. The connections between the two top layers of the DBN are undirected, the other connections are directed, and there are no connections between units of the same layer. To initialize the network weights, Deep Belief Networks use a greedy layer-by-layer pre-training algorithm.

An RBM is an undirected graphical model consisting of two layers, in which the visible units ‘v’ are connected to the hidden units ‘h’. The energy function and the joint probability distribution are computed as:

$$ E(v,h) = - \sum_{i,j} v_{i} w_{ij} h_{j} - \sum_{j} b_{j} h_{j} - \sum_{i} c_{i} v_{i} $$

(1)

$$ P(v,h) = \frac{1}{Z}e^{ - E(v,h)} $$

(2)

where $ w_{ij} $ is the weight between visible unit $ i $ and hidden unit $ j $, $ b_{j} $ is the bias term of hidden unit $ j $, $ c_{i} $ is the bias term of visible unit $ i $, and $ Z $ is the partition function.
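As a minimal sketch of Eqs. (1) and (2), the snippet below evaluates the energy and the joint probability of a tiny binary RBM in plain Python. The parameter values and the function names (`rbm_energy`, `rbm_partition`, `rbm_joint`) are illustrative assumptions, not taken from the paper, and the partition function is obtained by brute-force enumeration, which is only feasible for a handful of units.

```python
import itertools
import math


def rbm_energy(v, h, w, b, c):
    """Energy E(v, h) of Eq. (1) for one visible/hidden configuration."""
    interaction = sum(
        v[i] * w[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    hidden_bias = sum(b[j] * h[j] for j in range(len(h)))
    visible_bias = sum(c[i] * v[i] for i in range(len(v)))
    return -interaction - hidden_bias - visible_bias


def rbm_partition(w, b, c):
    """Partition function Z, enumerating every binary state (tiny models only)."""
    return sum(
        math.exp(-rbm_energy(v, h, w, b, c))
        for v in itertools.product((0, 1), repeat=len(c))
        for h in itertools.product((0, 1), repeat=len(b))
    )


def rbm_joint(v, h, w, b, c):
    """Joint probability P(v, h) of Eq. (2)."""
    return math.exp(-rbm_energy(v, h, w, b, c)) / rbm_partition(w, b, c)


# Toy parameters, chosen only for illustration (not from the paper).
w = [[0.5, -0.2], [0.1, 0.3]]  # w[i][j]: visible unit i <-> hidden unit j
b = [0.0, 0.1]                 # hidden biases b_j
c = [-0.1, 0.2]                # visible biases c_i

print(rbm_energy([1, 0], [1, 0], w, b, c))
print(rbm_joint([1, 0], [1, 0], w, b, c))
```

In practice Z is intractable for realistic layer sizes, which is why RBM training relies on approximations such as contrastive divergence [15] rather than on an explicit evaluation of Eq. (2).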

2.2 Convolutional Restricted Boltzmann Machine (CRBM)

Building hierarchical feature structures is a challenge, and the Convolutional Deep Belief Network is one of the best-known feature extractors used over the last decade in the field of pattern recognition. In this subsection, we explain the basic notions of this approach.

As a hierarchical generative model [16], the Convolutional Deep Belief Network supports efficient bottom-up and top-down probabilistic inference. Like the standard Deep Belief Network, this model is made up of several probabilistic max-pooling CRBMs stacked on top of each other, and it is trained with the greedy layer-by-layer algorithm [12, 17]. Probabilistic max-pooling shrinks the representation of the detection layers in a probabilistic way. Shrinking the representation with max-pooling makes the upper-layer representations invariant to local translations of the input data, reduces the computational load [18] and is useful for visual recognition problems [19].
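The pooling rule itself is not spelled out in this paper. The sketch below assumes the formulation of Lee et al. [18], in which at most one detection unit per pooling block may be active, so that the block behaves like a softmax over its units plus one extra "all off" state. The function name `prob_max_pool` and the input values are hypothetical.

```python
import math


def prob_max_pool(block_inputs):
    """Return (p_on, p_off) for one pooling block.

    block_inputs: bottom-up signals I(h) of the detection units in the block.
    p_on[k] is the probability that unit k is the single active unit;
    p_off is the probability that the whole block (and its pooling unit) is off.
    """
    # Shift by the largest logit (0.0 is the logit of the "all off" state)
    # so that exp() cannot overflow.
    shift = max(0.0, max(block_inputs))
    weights = [math.exp(x - shift) for x in block_inputs]
    off_weight = math.exp(-shift)
    total = off_weight + sum(weights)
    return [wgt / total for wgt in weights], off_weight / total


# A 2 x 2 block (pooling ratio 2) flattened to four inputs; values are illustrative.
p_on, p_off = prob_max_pool([0.0, 0.0, 0.0, 0.0])
print(p_on, p_off)
```

Because the probabilities of the block sum to one, the pooling unit is on exactly when one of its detection units is on, which is what lets the pooled layer shrink the representation while remaining a proper probabilistic model.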

When a Convolutional Deep Belief Network is built, the algorithm learns high-level features using end-to-end training. In our experiments, we trained a CDBN architecture with two CRBM layers to automatically learn hierarchical features in an unsupervised/supervised manner. Figure 1 illustrates the architecture of a CRBM, made up of two layers: a visible layer V and a hidden layer H, connected by sets of local, shared parameters. A detailed technical description is available in [20].

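For real-valued visible inputs, the energy of the probabilistic max-pooling CRBM, with an $ N_{V} \times N_{V} $ visible layer, $ K $ groups of $ N_{H} \times N_{H} $ hidden units, $ N_{W} \times N_{W} $ filters $ w^{k} $, hidden biases $ b_{k} $ and a shared visible bias $ c $, is defined by the following equation: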

$$ \begin{aligned} E(v,h) = {} & \frac{1}{2}\sum_{i,j = 1}^{N_{V}} v_{i,j}^{2} - \sum_{k = 1}^{K} \sum_{i,j = 1}^{N_{H}} \sum_{r,s = 1}^{N_{W}} h_{i,j}^{k} w_{r,s}^{k} v_{i + r - 1,j + s - 1} \\ & - \sum_{k = 1}^{K} b_{k} \sum_{i,j = 1}^{N_{H}} h_{i,j}^{k} - c\sum_{i,j = 1}^{N_{V}} v_{i,j} \end{aligned} $$

(3)
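To make the index bookkeeping of Eq. (3) concrete, the following sketch evaluates the four terms with explicit loops in plain Python. It assumes a square visible layer as written in the equation and a hidden map size of N_H = N_V - N_W + 1 (valid convolution); the function name `crbm_energy` and the toy values are illustrative, and a practical implementation would use an array library instead of nested loops.

```python
def crbm_energy(v, h, w, b, c):
    """Energy of Eq. (3).

    v: N_V x N_V real-valued visible layer (list of rows)
    h: K hidden maps, each N_H x N_H, with binary entries
    w: K filters, each N_W x N_W
    b: K hidden biases (one per group); c: single shared visible bias
    """
    n_v = len(v)
    n_w = len(w[0])
    n_h = n_v - n_w + 1  # assumed "valid" convolution

    quadratic = 0.5 * sum(x * x for row in v for x in row)
    interaction = 0.0
    hidden_bias = 0.0
    for k in range(len(w)):
        for i in range(n_h):
            for j in range(n_h):
                h_kij = h[k][i][j]
                if h_kij == 0:
                    continue
                hidden_bias += b[k] * h_kij
                # 0-based i + r corresponds to the 1-based index i + r - 1 of Eq. (3).
                for r in range(n_w):
                    for s in range(n_w):
                        interaction += h_kij * w[k][r][s] * v[i + r][j + s]
    visible_bias = c * sum(x for row in v for x in row)
    return quadratic - interaction - hidden_bias - visible_bias


# Toy example: 2 x 2 input, one 2 x 2 filter, hence a single hidden unit.
v = [[1.0, 2.0], [3.0, 4.0]]
w = [[[1.0, 0.0], [0.0, 1.0]]]
print(crbm_energy(v, [[[1]]], w, b=[0.5], c=0.1))
```

The quadratic term is what distinguishes this real-valued (Gaussian) visible layer from the binary RBM energy, which has no such term.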

2.3 Regularization Methods

The use of deep network models for cursive handwriting recognition has made significant progress over the past decade. Nevertheless, for these architectures to be used effectively, a large amount of data needs to be collected.

Consequently, over-fitting is a serious problem in such networks because of the large number of parameters, which grows as the network gets larger and deeper. To overcome this problem, many regularization and data augmentation procedures have been proposed [21, 22, 23].

In this subsection, we briefly introduce two regularization techniques that may affect training performance. Dropout and Dropconnect are both methods for preventing over-fitting in a neural network.

To apply Dropout, a subset of units is randomly selected and their outputs are set to zero regardless of the input. This effectively removes these units from the model. A different subset of units is selected at random each time a training example is presented.

Dropconnect operates in the same way, except that individual weights are deactivated (i.e., set to zero) rather than nodes, so a node may stay partly active. In addition, Dropconnect is a generalization of Dropout: it generates even more possible models, since there are almost always more connections than units.
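The difference between the two techniques can be sketched on a single linear layer. This is a simplified illustration in plain Python, not the authors' CDBN implementation: the function names, the toy weights and the drop probability are assumptions, and refinements such as rescaling the surviving activations are deliberately left out.

```python
import random


def dropout_layer(x, weights, p_drop, rng):
    """Linear layer whose output units are zeroed with probability p_drop."""
    outputs = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weights]
    return [0.0 if rng.random() < p_drop else out for out in outputs]


def dropconnect_layer(x, weights, p_drop, rng):
    """Linear layer whose individual weights are zeroed with probability p_drop."""
    return [
        sum(0.0 if rng.random() < p_drop else w_ij * x_j
            for w_ij, x_j in zip(row, x))
        for row in weights
    ]


rng = random.Random(0)           # fixed seed so the sketch is repeatable
x = [1.0, 2.0, 3.0]              # illustrative input
weights = [[0.1, 0.2, 0.3],      # one row of weights per output unit
           [0.4, 0.5, 0.6]]

print(dropout_layer(x, weights, 0.5, rng))      # whole units are removed
print(dropconnect_layer(x, weights, 0.5, rng))  # single connections are removed
```

With Dropout each output is either intact or exactly zero, whereas with Dropconnect an output can keep part of its sum, which is the "partly active" behaviour described above.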

2.4 Model Settings

To extend our earlier study [13] and explore the power of deep convolutional network classifiers on the problem of AHS recognition, we present in this work a detailed study of the CDBN with the Dropout and Dropconnect techniques. In this subsection, we specify the tuning parameters of the chosen Convolutional DBN structure.

As noted above, our CDBN architecture is composed of two CRBM layers (see Fig. 2). We evaluated the performance of this architecture on the IFN/ENIT handwritten text recognition task.

The CDBN architecture used in the experiments on the IFN/ENIT database is described as follows: $ 1 \times 300 \times 100 - 12W24G - MP2 - 10W40G - MP2 $. This corresponds to a network with input images of dimension $ 300 \times 100 $; the first layer consists of 24 groups of $ 12 \times 12 $ pixel filters, and the pooling ratio C for each layer is 2. The second layer includes 40 groups, each with $ 10 \times 10 $ filters. We set the sparseness parameter to 0.03. The first-layer bases learned the strokes that make up the characters, while the second-layer bases learned character parts as groups of strokes. We constructed feature vectors by combining the activations of the first and second layers; Support Vector Machines are used to classify these features.
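The architecture string can be read as a sequence of (filter width, number of groups, pooling ratio) triples. The sketch below walks the map sizes through the two layers under two assumptions that the paper does not state: the convolution is "valid" (no padding), and a map whose size is not divisible by the pooling ratio is truncated by floor division. The function name `cdbn_shapes` is hypothetical.

```python
def cdbn_shapes(input_size, layers):
    """Trace detection and pooled map sizes through a stack of CRBM layers.

    input_size: (first, second) dimensions of the input image
    layers: list of (filter_size, n_groups, pool_ratio) triples
    """
    first, second = input_size
    shapes = []
    for filter_size, n_groups, pool_ratio in layers:
        # Assumed valid convolution: each dimension shrinks by filter_size - 1.
        detection = (first - filter_size + 1, second - filter_size + 1)
        # Assumed floor division when the map is not a multiple of the ratio.
        first, second = detection[0] // pool_ratio, detection[1] // pool_ratio
        shapes.append(
            {"groups": n_groups, "detection": detection, "pooled": (first, second)}
        )
    return shapes


# 1 x 300 x 100 - 12W24G - MP2 - 10W40G - MP2
shapes = cdbn_shapes((300, 100), [(12, 24, 2), (10, 40, 2)])
for layer in shapes:
    print(layer)
```

For instance, a 300 × 100 input and a 12 × 12 filter give a 289 × 89 detection map under the valid-convolution assumption; the pooled sizes depend on how the odd dimensions are handled, so they should be read as indicative only.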

To regularize these architectures and use them as effectively as possible, units or weights were removed. Dropout was applied at the input layer with a probability of 20% and at each hidden layer with a probability of 50%, while Dropconnect was applied only at the input layer with a probability of 20%.

3 Experiments with Proposed Model

This section presents an experiment that evaluates the performance of the proposed approach on the IFN/ENIT benchmark database [24]. In our experiments, each IFN/ENIT image was normalized to the same input dimension of 300 × 100 pixels for the visible layer. These text images are gray-level, and the resized images are not necessarily square.

Generally, a handwritten script recognition system consists of three principal steps: pre-processing, automatic feature extraction and classification.

Pre-processing: This phase generates a normalized and uniform text image.
Feature extraction: This phase determines the different feature vectors.
Training: This phase finds the models that best fit the inputs of the problem.
Parameter setting: The configuration requires choosing the number and size of filters, the sparsity of the hidden units and the max-pooling region size in each layer of the Convolutional DBN model. Given the size of the images used (high-dimensional data), we specify a hyper-parameter setting for the Convolutional DBN structure. To get the most out of this architecture, two regularization methods, Dropout and Dropconnect, were applied separately to the Convolutional DBN structure.

3.1 Dataset Description and Experimental Setting

To measure the effectiveness of the proposed system on high-dimensional input images, the IFN/ENIT database [24] is used. It comprises 26459 handwritten Arabic words contributed by 411 volunteers, for a total of around 115420 parts of Arabic words (PAWs) and around 212167 letters. The words are 946 Tunisian town and village names, each with its postal code. The data consist of offline handwritten Arabic words. Sets ‘a’ and ‘b’ are used for the training phase, whereas the test set is taken from set ‘c’. Figure 3 shows samples of a village name written by 5 different writers.

3.2 Experimental Results and Comparison

Table 1 compares the results of our approach with previously published results. We note that our CDBN structure, after applying Dropout, yielded encouraging results, with a Word Error Rate (WER) of around 9.76%, compared with the Recurrent Neural Network (RNN) approach of Maalej and Kherallah [25]. With Dropconnect, on the other hand, we obtained an error rate of 14.09%.
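The paper does not define how the WER is computed. A minimal sketch, assuming the usual convention for isolated-word recognition (the percentage of test word images assigned the wrong label), is given below; the function name `word_error_rate` and the labels are hypothetical.

```python
def word_error_rate(predicted, reference):
    """Percentage of word images whose predicted label differs from the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one sample is required")
    errors = sum(p != r for p, r in zip(predicted, reference))
    return 100.0 * errors / len(reference)


# Hypothetical labels for four test images: one of them is misclassified.
print(word_error_rate(["town_a", "town_b", "town_c", "town_d"],
                      ["town_a", "town_b", "town_x", "town_d"]))
```

Under this convention the WER is simply 100% minus the word recognition rate.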

In addition, the rate achieved is compared with our earlier work. The result in [13], obtained with the Convolutional DBN structure without Dropout, is an error rate of 16.3%, which does not compare favorably with the classic approaches [26, 27]. This is because the Convolutional DBN architecture can be over-complete. Experimentally, a model that is over-complete or over-fitted may be prone to learning trivial solutions, such as pixel detectors. In the present work, to address this issue, we use two regularization techniques for the Convolutional DBN, namely Dropout and Dropconnect. As a result, the error rate improves by approximately 6.54 percentage points with Dropout and 2.21 percentage points with Dropconnect.
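These improvement figures follow directly from the error rates quoted in this section, read as absolute differences in WER with respect to the unregularized model of [13]:

$$ 16.3\% - 9.76\% = 6.54\% \;(\text{Dropout}), \qquad 16.3\% - 14.09\% = 2.21\% \;(\text{Dropconnect}) $$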

Table 1. Comparison of word recognition performances utilizing the IFN/ENIT database.

In general, the proposed DL architecture, the Convolutional Deep Belief Network with Dropout, provides satisfactory performance, especially compared with other approaches such as Dynamic Time Warping (DTW) and the Hidden Markov Model applied to the IFN/ENIT database.

4 Discussion

As mentioned above, we propose a DL approach for Arabic handwritten script recognition, in particular the Convolutional DBN. To confirm the effectiveness of the proposed framework, we presented experimental results on a database of handwritten Arabic words, the IFN/ENIT database.

We observe that our Convolutional DBN architecture with Dropconnect reached a promising error rate of 14.09% on high-dimensional data. We also retrained the proposed Convolutional DBN setting with Dropout; performance then improved to a WER of 9.76%, which corresponds to a gain of 4.33 percentage points.

The results obtained, whatever their magnitude, are significant compared with research using other classification methods, particularly as they were obtained from raw pixels without a feature extraction phase (see Fig. 4). This contribution poses an interesting challenge in the field of computer vision and pattern recognition, and should encourage the use of deep machine learning with Big Data analysis.

5 Conclusion

With the development of DL techniques, deep hierarchical neural networks have drawn great attention for handwriting recognition. In this article, we first introduced a baseline DL approach to Arabic handwritten script recognition, namely the Convolutional Deep Belief Network. Our aim was to leverage the power of these deep networks, which can process high-dimensional input images, allowing raw data to be used as input instead of extracting a feature vector and then learning the complex decision boundary between classes. Secondly, we investigated the effectiveness of two regularization methods applied separately to the Convolutional DBN structure to recognize Arabic words from the IFN/ENIT database. As we observed, Dropout is a very effective regularization technique compared with Dropconnect and the unregularized baseline.

In addition, as future work, we will evaluate the performance of our system in various image processing applications, such as biometric and medical image analysis.

References

1. Mota, R., Scott, D.: Education for innovation and independent learning (2014)
2. Porwal, U., Shi, Z., Setlur, S.: Machine learning in handwritten Arabic text recognition. In: Handbook of Statistics, vol. 31, pp. 443–469. Elsevier (2013)
3. Elleuch, M., Hani, A., Kherallah, M.: Arabic handwritten script recognition system based on HOG and gabor features. Int. Arab J. Inf. Technol. 14(4A), 639–646 (2017)
4. Boureau, Y.L., Cun, Y.L.: Sparse feature learning for deep belief networks. In: Advances in Neural Information Processing Systems, pp. 1185–1192 (2008)
5. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
6. Huang, G.B., Zhou, H., Ding, X., Zhang, R.: Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 42(2), 513–529 (2011)
7. Mohamed, A.R., Dahl, G.E., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20(1), 14–22 (2011)
8. Dahl, G., Mohamed, A.R., Hinton, G.E.: Phone recognition with the mean-covariance restricted Boltzmann machine. In: Advances in Neural Information Processing Systems, pp. 469–477 (2010)
9. Al-Ayyoub, M., Nuseir, A., Alsmearat, K., Jararweh, Y., Gupta, B.: Deep learning for Arabic NLP: a survey. J. Comput. Sci. 26, 522–531 (2018)
10. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
11. Marc’Aurelio Ranzato, F.J.H., Boureau, Y.L., LeCun, Y.: Unsupervised learning of invariant feature hierarchies with applications to object recognition. In: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR 2007), vol. 127. IEEE Press, June 2007
12. Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
13. Elleuch, M., Tagougui, N., Kherallah, M.: Deep learning for feature extraction of Arabic handwritten script. In: International Conference on Computer Analysis of Images and Patterns, pp. 371–382. Springer, Cham, September 2015
14. Mohamed, A.R., Sainath, T.N., Dahl, G.E., Ramabhadran, B., Hinton, G.E., Picheny, M.A.: Deep belief networks using discriminative features for phone recognition. In: ICASSP, pp. 5060–5063, May 2011
15. Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)
16. Lee, H., Grosse, R., Ranganath, R., Ng, A.Y.: Unsupervised learning of hierarchical representations with convolutional deep belief networks. Commun. ACM 54(10), 95–103 (2011)
17. Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems, pp. 153–160 (2007)
18. Lee, H., Grosse, R., Ranganath, R., Ng, A.Y.: Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 609–616. ACM, June 2009
19. Jarrett, K., Kavukcuoglu, K., LeCun, Y.: What is the best multi-stage architecture for object recognition? In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2146–2153. IEEE, September 2009
20. Elleuch, M., Kherallah, M.: Boosting of deep convolutional architectures for Arabic handwriting recognition. Int. J. Multimed. Data Eng. Manag. (IJMDEM) 10(4), 26–45 (2019)
21. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
22. Wan, L., Zeiler, M., Zhang, S., Le Cun, Y., Fergus, R.: Regularization of neural networks using dropconnect. In: International Conference on Machine Learning, pp. 1058–1066, February 2013
23. Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)
24. Pechwitz, M., Maddouri, S.S., Märgner, V., Ellouze, N., Amiri, H.: IFN/ENIT database of handwritten Arabic words. In: Colloque International Francophone sur l’Ecrit et le Document (CIFED), pp. 127–136 (2002)
25. Maalej, R., Kherallah, M.: Improving MDLSTM for offline Arabic handwriting recognition using dropout at different positions. In: International Conference on Artificial Neural Networks, pp. 431–438. Springer, Cham, September 2016
26. AlKhateeb, J.H., Ren, J., Jiang, J., Al-Muhtaseb, H.: Offline handwritten Arabic cursive text recognition using Hidden Markov Models and re-ranking. Pattern Recogn. Lett. 32(8), 1081–1088 (2011)
27. Saabni, R.M., El-Sana, J.A.: Comprehensive synthetic Arabic database for on/off-line script recognition research. Int. J. Doc. Anal. Recogn. (IJDAR) 16(3), 285–294 (2013)

Author information

Authors and Affiliations

National School of Computer Science (ENSI), University of Manouba, Manouba, Tunisia
Mohamed Elleuch
Faculty of Sciences, University of Sfax, Sfax, Tunisia
Mohamed Elleuch & Monji Kherallah

Corresponding author

Correspondence to Mohamed Elleuch.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR), Auburn, WA, USA
Ajith Abraham
School of Computer Science and Engineering, VIT Bhopal University, Bhopal, Madhya Pradesh, India
Shishir K. Shandilya
Area of Project Engineering, University of Cordoba, Córdoba, Spain
Laura Garcia-Hernandez
Escola de Engenharia, Universidade do Minho, Guimarães, Portugal
Maria Leonilde Varela

Cite this paper

Elleuch, M., Kherallah, M. (2021). Convolutional Deep Learning Network for Handwritten Arabic Script Recognition. In: Abraham, A., Shandilya, S., Garcia-Hernandez, L., Varela, M. (eds) Hybrid Intelligent Systems. HIS 2019. Advances in Intelligent Systems and Computing, vol 1179. Springer, Cham. https://doi.org/10.1007/978-3-030-49336-3_11

DOI: https://doi.org/10.1007/978-3-030-49336-3_11
Published: 13 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49335-6
Online ISBN: 978-3-030-49336-3
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
