Elsevier

Neurocomputing

Volume 313, 3 November 2018, Pages 1-13
Evidential combination of SVM classifiers for writer recognition

https://doi.org/10.1016/j.neucom.2018.05.096

Abstract

This paper addresses the problem of writer identification from handwritten documents. We propose a new approach for offline writer identification based on a combination of SVM classifiers. The main contribution of this study is a combination module using Dempster–Shafer Theory (DST) to improve the overall system performance. DST is an effective theoretical framework for handling the uncertainty and imprecision of information sources. The proposed system was evaluated on several publicly available databases of Arabic and Latin scripts. Experimental results reveal that the proposed combination approach outperforms the conventional combination methods and achieves interesting results compared with those reported by existing writer recognition systems.

Introduction

Handwriting-based writer recognition is a behavioral biometric modality based on the hypothesis that every individual has a unique writing style that differentiates him or her from other writers. In recent years, the problem of writer recognition has received renewed interest from the handwriting recognition community. The writer recognition task consists of making a decision about a questioned handwriting sample in order to identify its author. Writer recognition covers a broad spectrum of applications including forensic document examination [1], [2], biometric recognition [3], [4], [6], [27], [48], medieval and historical document analysis [7], [8] and personalized handwriting recognition [9].

The problem of writer recognition is generally divided into two sub-tasks: writer identification and writer verification. The identification task involves determining the writer of a questioned sample, given a set of samples with known writers. The verification task, on the other hand, involves comparing two writing samples to decide whether they were written by the same writer or by different writers. Furthermore, depending on how the handwriting is acquired, writer recognition systems can be classified into online and offline methods. In the first case, the characteristics of the writing style are captured directly from a digital device [49], [50], [51], while in the latter case, digitized images of handwriting are analyzed to extract writer-specific characteristics. Likewise, depending on the textual content, writer recognition systems can be text-dependent or text-independent. Text-dependent approaches employ predefined text and require all writers to produce the same text for training and test samples. Text-independent techniques, on the contrary, impose no constraints on the textual content of the samples to be compared.

From the viewpoint of feature extraction, writer identification techniques can be categorized into those based on machine-learned features and those based on handcrafted features. Machine-learned features are typically derived from a dataset of handwriting images through a training process. Convolutional Neural Networks (CNNs), for example, have been widely employed to extract such features. In [50], for instance, the authors present an end-to-end writer identification system combining a deep convolutional neural network and a new data augmentation method to improve the generalized application of CNNs to this problem. Likewise, in [49], the authors propose an online writer identification framework using a recurrent neural network which does not require any domain knowledge for handwriting data analysis.

Writer identification features can be divided into two categories: structural and statistical features. Structural features are a natural way of capturing intuitive aspects of writing, such as loops, curves, local concavity end points, branch points etc. For instance, Abdi et al. [10] proposed a writer identification technique based on stroke feature combination. Different stroke-based features are employed, including length, height/width ratio and curvature probability distribution functions. For classification, the authors used several metrics including the χ2 distance, the weighted Euclidean distance and the Manhattan distance, and the results are reported on writing samples of 40 writers randomly chosen from the IFN/ENIT database. In [11], Al-Maadeed et al. make use of edge-based directional probability distribution features extracted from handwritten words. In addition, moment invariants and contour direction features are extracted, and classification is performed using a K-nearest neighbor classifier. For evaluation, a customized database of 100 individuals comprising 32,000 handwritten words was employed.
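As a minimal illustration of the distance-based classification mentioned above, the following Python sketch implements the three metrics and a nearest-reference decision. It is not the implementation of [10]: the χ2 form shown is one common variant (others include a factor of 1/2), and the function names, the `weights` argument and the toy histograms are assumptions made for illustration only.

```python
import math


def chi2_distance(p, q, eps=1e-12):
    """Chi-square distance between two histograms of equal length.

    eps avoids division by zero when a bin is empty in both histograms.
    """
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))


def manhattan_distance(p, q):
    """Sum of absolute bin-wise differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def weighted_euclidean_distance(p, q, weights):
    """Euclidean distance with one non-negative weight per dimension.

    How the weights are chosen is left open here (assumption).
    """
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))


def nearest_writer(query, references, distance=chi2_distance):
    """Return the writer id whose reference histogram is closest to the query."""
    return min(references, key=lambda writer: distance(query, references[writer]))


# Toy data (hypothetical): one normalized feature histogram per known writer.
references = {"writer_1": [0.5, 0.3, 0.2], "writer_2": [0.1, 0.1, 0.8]}
query = [0.45, 0.35, 0.2]
print(nearest_writer(query, references))
```

Any of the three metrics can be passed as `distance`; for the weighted Euclidean distance the weights would first have to be bound, for example with `functools.partial`.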

Statistical approaches for computerized analysis of handwriting typically employ features such as texture, curvature and slant. Among well-known statistical methods, Djeddi et al. [12] applied three texture analysis methods for characterizing the writing style: run-length distributions, edge-direction distribution and edge-hinge distribution. Classification is carried out using a multi-class Support Vector Machine (SVM), and the system was evaluated on 1000 writers of the KHATT database. In another study, Gazzah and Ben Amara [13] proposed a method for texture analysis using a 2D Discrete Wavelet Transform (DWT) lifting scheme. To classify a database of 180 text samples, the authors employed a modular Multi-Layer Perceptron (MLP) classifier. In [5], the authors propose a method combining the wavelet transform and a generalized Gaussian model for writer identification from Chinese handwritten documents. In another work [37], the authors propose a local approach based on texture analysis of small writing fragments, where each fragment is represented by its Local Binary Pattern (LBP) histogram. The proposed method benefits from the efficiency of the LBP as a textural descriptor and the high discriminative power of handwritten fragments to improve the performance of writer identification. In [38], a writer identification system for Oriya script is proposed based on curvature features and an SVM classifier. In another study [39], the authors propose a grapheme-based approach for offline Arabic writer identification and verification. Its originality lies in the independence of the grapheme codebook from any training process and in the synthesis of graphemes based on the beta-elliptic model.

In order to exploit the benefits of both structural and statistical methods, a hybrid writer identification system combining the two types of features is proposed in [14]. The authors employed different types of features such as connected component based features, gradient distribution features, windowed gradient distribution features and contour chaincode distribution features. Classification was carried out using a nearest neighbor classifier with the Euclidean distance as metric. In order to reduce the dimensionality of the feature space, the authors investigated a number of techniques such as principal component analysis, linear discriminant analysis, multiple discriminant analysis, multidimensional scaling and a forward/backward feature selection algorithm. A database of 500 paragraphs written by 250 writers was used for the experimental evaluation.

In writer identification systems, combining the outputs of individual classifiers is an efficient way of solving complex problems, since it takes advantage of the diversity of base classifiers trained on different sources of information. According to Polikar [15], there are two key components in designing combination classifier systems: creating the individual classifiers and choosing the appropriate strategy to combine them. Various combination strategies have been proposed in the literature. These can be grouped into two broad categories: feature combination methods and decision combination techniques. The first category, commonly known as early integration [32], consists of combining the input features into a unique feature space and subsequently employing a traditional classifier to classify the combined observations. In contrast, decision combination, also known as late integration [31], consists of combining the outputs (decisions) of multiple classifiers. Many strategies have been proposed for decision combination, including majority voting, Borda count, product/sum/maximum rules, Bayesian methods and Dempster–Shafer Theory [19], [33], [34], [35], [36].
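To make the classical decision combination rules concrete, the sketch below applies the sum, product and maximum rules, majority voting and Borda count to the per-class scores of several classifiers. It is an illustrative sketch rather than the configuration evaluated in this paper: the function names and the toy scores are assumptions, each classifier is assumed to output one score per writer in the same class order, and ties are broken in favor of the lowest class index.

```python
from collections import Counter
from math import prod


def _argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def fuse_scores(scores, rule):
    """Fuse per-class scores with the 'sum', 'product' or 'max' rule.

    scores: one list per classifier, each holding one score per class.
    """
    reducers = {"sum": sum, "product": prod, "max": max}
    per_class = zip(*scores)  # groups the scores of all classifiers by class
    return _argmax([reducers[rule](column) for column in per_class])


def majority_vote(scores):
    """Each classifier votes for its top class; the most voted class wins."""
    votes = Counter(_argmax(s) for s in scores)
    return votes.most_common(1)[0][0]


def borda_count(scores):
    """Each classifier ranks the classes; a class earns its rank as points."""
    n_classes = len(scores[0])
    points = [0] * n_classes
    for s in scores:
        worst_to_best = sorted(range(n_classes), key=lambda c: s[c])
        for rank, c in enumerate(worst_to_best):
            points[c] += rank
    return _argmax(points)


# Toy scores (hypothetical): three classifiers, three candidate writers.
scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.5, 0.4],
]
for rule in ("sum", "product", "max"):
    print(rule, fuse_scores(scores, rule))
print("majority", majority_vote(scores))
print("borda", borda_count(scores))
```

On this toy input the maximum rule follows the single most confident classifier, while the other rules follow the class favored by two of the three classifiers, which illustrates how the choice of rule can change the final decision.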

Although most of the initial research on writer identification focused on text in the Latin alphabet, a number of recent studies have considered identification of writers from writing in other scripts as well [42], [43], [44], [45], [46], [47]. Among these, the focus of our study is on Arabic writer identification, which, despite significant research efforts, remains an open problem. The main contribution of this study is the proposal of a Dempster–Shafer Theory (DST) based strategy to combine the outputs of multiple SVM classifiers for writer identification. The Dempster–Shafer Theory represents a flexible framework for dealing with incomplete, uncertain, conflicting and imprecise information sources. To the best of the authors' knowledge, this is the first attempt to investigate DST for writer identification problems. We demonstrate that the proposed DST based strategy outperforms the classical combination methods and achieves interesting results compared with those reported for existing writer identification systems.
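For readers unfamiliar with DST, the following sketch shows Dempster's rule of combination, the standard operator for fusing two mass functions. It should be read as a generic illustration, not as the combination module of this paper: the conversion of classifier outputs into mass functions (`discounted_mass`, with a hypothetical `reliability` coefficient that moves part of the mass to the whole frame) and the decision rule (`decide`, which picks the singleton with the largest mass) are assumptions introduced here.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    A mass function is a dict mapping a frozenset of hypotheses to its mass.
    Mass assigned to an empty intersection is conflict and is normalized away.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def discounted_mass(posteriors, reliability):
    """Hypothetical conversion of class posteriors into a mass function.

    Each writer receives reliability * posterior; the remaining mass goes to
    the whole frame of discernment and represents ignorance.
    """
    frame = frozenset(posteriors)
    mass = {frozenset([w]): reliability * p for w, p in posteriors.items()}
    mass[frame] = mass.get(frame, 0.0) + (1.0 - reliability)
    return mass


def decide(mass):
    """Return the single hypothesis with the largest mass (assumed rule)."""
    singletons = {next(iter(s)): v for s, v in mass.items() if len(s) == 1}
    return max(singletons, key=singletons.get)


# Toy example (hypothetical posteriors from two classifiers over three writers).
m1 = discounted_mass({"w1": 0.6, "w2": 0.3, "w3": 0.1}, reliability=0.9)
m2 = discounted_mass({"w1": 0.3, "w2": 0.5, "w3": 0.2}, reliability=0.6)
fused = dempster_combine(m1, m2)
print(decide(fused), fused)
```

Because the rule is associative and commutative, more than two classifiers can be fused by applying `dempster_combine` repeatedly, for example with `functools.reduce`.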

The paper is organized as follows. In Section 2, we describe the implementation details of the overall system for writer identification, followed by a presentation of the basic concepts of Dempster–Shafer Theory in Section 3. Section 4 details the key steps of the proposed approach and the methodology to combine multiple SVM classifiers. Experimental results and discussions are presented in Section 5, and finally we summarize the contributions of this work in the last section.

System overview

We first present the architecture of the overall recognition system. As illustrated in Fig. 1, the proposed writer identification system consists of four main stages: pre-processing, feature extraction, classification and combination. In the following sections, we provide the details of each of these steps.

DST based combination

In general, various pieces of information which have to be combined to make a decision (or to perform classification) may be heterogeneous, so that, in case of difficult problems, each evidence may be: (1) imprecise (it is not focused enough on which decision to make), (2) uncertain (when modeling random events), (3) incomplete (when representing a partial point of view on the problem) and (4) conflicting (the evidences do not concur). These constraints restrict the application of Bayesian

Experimental results and discussion

This section presents the experimental results of the evaluations carried out to validate the proposed technique. We first describe the KHATT database used in our experimental study, followed by the performance of the individual SVM classifiers, the results of classifier combination and a comparison with the results achieved in a recent international competition on writer identification. The proposed DST based combination strategy is also compared to other classical combination methods such as product, sum and

Conclusion

In this paper, we have presented a handwriting based text-independent writer identification system. In particular, we have addressed the problem of classifier combination. In order to improve the identification rates, we have implemented a combination model based on the Dempster–Shafer theory. The main idea consists of merging the outputs of different SVM classifiers to generate the final decision. It can be concluded that the proposed combination framework is generic: It can be applied to any

References (53)

  • S. Srihari et al., A survey of computer methods in forensic handwritten document examination, Proceedings of the Eleventh International Graphonomic Society Conference, Scottsdale (2003)
  • M. Tapiador et al., Writer identification method based on forensic knowledge, Proceedings of the First International Conference ICBA 2004, Hong Kong, China (2004)
  • S. He et al., Writer identification using curvature-free features, Pattern Recognit. (2017)
  • I. Bar-Yosef et al., Binarization, character extraction, and writer identification of historical Hebrew calligraphy documents, Int. J. Doc. Anal. Recognit. (2007)
  • M. Bulacu et al., Automatic handwriting identification on medieval documents, Proceedings of the Fourteenth International Conference on Image Analysis and Processing, Modena, Italy (2007)
  • X.Y. Zhang et al., Writer adaptation with style transfer mapping, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • M.N. Abdi et al., A novel approach for off-line Arabic writer identification based on stroke feature combination, Proceedings of the Twenty-Fourth International Symposium on Computer and Information Sciences, France (2009)
  • S. Al-Ma’adeed et al., Writer identification using edge-based directional probability distribution features for Arabic words, Proceedings of the Sixth ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 2008), Doha, Qatar (2008)
  • C. Djeddi et al., Evaluation of texture features for offline Arabic writer identification, Proceedings of the Eleventh IAPR International Workshop on Document Analysis Systems (DAS), Tours (2014)
  • S. Gazzah et al., Arabic handwriting texture analysis for writer identification using the DWT-lifting scheme, Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Parana (2007)
  • S.M. Awaida et al., Writer identification of Arabic text using statistical and structural features, Cybern. Syst. (2013)
  • R. Polikar, Ensemble based systems in decision making, IEEE Circuits Syst. Mag. (2006)
  • M. Bulacu, Text-independent writer identification and verification on offline Arabic handwriting, Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Parana (2007)
  • C. Djeddi et al., A texture based approach for Arabic writer identification and verification, Proceedings of the International Conference on Machine and Web Intelligence (ICMWI), Algiers (2010)
  • C. Djeddi et al., Text-independent writer recognition using multi-script handwritten texts, Pattern Recognit. Lett. (2013)
  • Y. Kessentini et al., A Dempster Shafer theory based combination of handwriting recognition systems with multiple rejection strategies, Pattern Recognit. (2015)

Yousri KESSENTINI graduated in Computer Science engineering from the National Engineering School of Sfax (ENIS) in 2003 and received his Ph.D. degree in the field of Pattern Recognition from the University of Rouen, France in 2009. He was a postdoctoral researcher at the ITESOFT company and the LITIS laboratory from 2011 to 2013. He is currently an Assistant Professor at CRNS. His main research areas are deep learning, document processing, data fusion and computer vision with applications to video surveillance. He has coordinated and participated in several research projects in partnership with industry. He is the author or co-author of several papers and has been a reviewer for international conferences and journals. He is also a member of several scientific associations including GRCE and IAPR.

Sana BEN ABDERRAHIM received her applied license in Information Systems Technology in 2012 from the Higher Institute of Management of Gabes (ISGG), Tunisia. She obtained her Master's degree in 2015 in the field of pattern recognition. Her current research interests concern handwritten document analysis and recognition, including writer identification systems.

Chawki Djeddi is presently working as an Associate Professor in the Department of Mathematics and Computer Science, Larbi Tebessi University, Tebessa, Algeria. He received his Ph.D. in 2014 from the University of Badji Mokhtar-Annaba, Annaba, Algeria and specializes in document image analysis and recognition. His research interests include image processing and pattern recognition with applications to document image analysis, content based image retrieval, writer demographic classification and signature verification on disguised signatures and skilled forgeries. He has regularly participated in the top conferences in the areas of document analysis and handwriting recognition. He has also participated in several scientific competitions (nine in total) organized in conjunction with the ICDAR and ICFHR conferences. Among these competitions, four of the submitted systems were ranked first. He was one of the organizers of the ICDAR2015 Competition on Multi-script Writer Identification and Gender Classification using QUWI Database. He has published more than 23 research papers in various international journals, conference proceedings and edited volumes. He has also taken part in the organization of several national and international conferences held at the University of Tebessa. In addition, he has collaborated as a member on several research projects. Currently, he is a member of the Laboratory of Mathematics, Informatics and Systems (LAMIS) at the University of Tebessa. He is also a member of several scientific associations including Groupe de Recherche en Communication Ecrite (French-speaking association on handwriting recognition), Association Française pour la Reconnaissance et l’Interprétation des Formes (IAPR representative French-speaking association), the International Association of Computer Science and Information Technology and the Institute of Electrical and Electronics Engineers (IEEE). Since 2010, he has also supervised a number of Master's theses. As part of his professional activities, in addition to teaching, he also takes on several administrative responsibilities as requested and when needed.