Elsevier

Pattern Recognition

Volume 108, December 2020, 107555

MuLTReNets: Multilingual text recognition networks for simultaneous script identification and handwriting recognition

https://doi.org/10.1016/j.patcog.2020.107555

Highlights

  • A novel multi-task system, named MuLTReNets, to optimize script identification and handwriting recognition jointly for multilingual handwritten text recognition.

  • The MuLTReNets are extended into two versions: one for multi-lingual text recognition with merged alphabet (MuLTReNetV1), one for cascaded script identification and unilingual text recognition with joint training (MuLTReNetV2).

  • Auto-weighter keeps the balance among datasets of different scripts.

  • Performance is superior to cascade systems and unilingual recognition systems.

  • Experimental analysis for better understanding the system.

Abstract

Multilingual handwritten text recognition is often accomplished in two cascaded steps: script identification and handwriting recognition. However, this scheme is not optimal due to error accumulation. To perform simultaneous script identification and handwriting recognition, in this paper, we propose a new framework named multilingual text recognition networks (MuLTReNets). Specifically, the system has four major modules: feature extractor, script identifier, handwriting recognizer and auto-weighter. The feature extractor integrates both spatial and temporal knowledge to encode text images into features shared by the script identifier and recognizer. The script identifier predicts the script category from a variable-length sequence, incorporating an auto-weighter for balancing different scripts, while the handwriting recognizer adopts long short-term memory (LSTM) and Connectionist Temporal Classification (CTC) to accomplish sequence decoding. Via multi-task learning, the proposed framework benefits both multilingual recognition schemes: unified recognition with a merged alphabet (MuLTReNetV1) and cascaded script identification and single-script recognition with joint training (MuLTReNetV2). We evaluated the performance of the proposed method on handwritten text databases of five languages: English, French, Kannada, Urdu, and Bangla. Experimental results demonstrate that our method achieves superior performance in both script identification and handwriting recognition. The accuracy of script identification reaches 99.9%. In handwriting recognition, the proposed system not only outperforms cascade systems but also surpasses systems designed specifically for individual scripts.

Introduction

In recent years, there has been an increasing demand for analyzing multilingual handwritten documents, including collections of documents in different scripts and individual documents containing multiple scripts [1]. For instance, there are dozens of commonly used languages in India, resulting in a large number of multilingual documents [2]. The problem arises in many practical application scenarios, such as banking, health, insurance, education, finance, and government agencies. Hence, it is necessary to take language diversity into account when digitizing documents.

Multilingual handwritten text recognition (MLHTR), which converts document images of multiple languages into text, has attracted extensive attention in recent years [3], [4], [5], [6], [7], [8], [9], [10], [11]. To the best of our knowledge, most schemes for this problem are cascaded [3], [4], [5], [6], [7], [8], performing two steps: script identification and handwriting recognition, as illustrated in Fig. 1a. In general, these systems employ a script identification model as the central controller, managing a bank of unilingual recognizers, one built for each script. Based on the script identified in the first stage, the corresponding handwriting recognition model is applied to recognize the text of that specific script. Though much attention has been paid to script identification [12], [13], [14], [15], [16], [17], [18] and unilingual handwriting recognition [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], the cascade methods still suffer from some drawbacks:

  • Usually, an n-script cascade MLHTR system is composed of at least one script identification model and n script-specific recognition models. Thus, the storage and computation costs increase dramatically.

  • The cascade system is potentially error-prone because of error accumulation. If the script identification is wrong, the subsequent recognition procedure cannot give a correct result.

  • It is difficult for existing script identification methods to distinguish short texts like words or text lines in similar languages, such as English and French.
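
The first two drawbacks can be made concrete with a little arithmetic. The sketch below assumes, as the second bullet states, that a wrong script decision always leads to a wrong transcription, and that script identification and recognition errors are otherwise independent. The function names and the example accuracies are hypothetical and are not taken from the paper.

```python
def cascade_model_count(n_scripts: int) -> int:
    """Minimum number of models in an n-script cascade:
    one script identifier plus one recognizer per script."""
    return 1 + n_scripts


def cascade_line_accuracy(script_id_acc: float, recog_acc: float) -> float:
    """Fraction of text lines transcribed correctly by a cascade.

    Assumption: a line is correct only if the script is identified
    correctly AND the matching recognizer then transcribes it correctly.
    """
    return script_id_acc * recog_acc


# Hypothetical numbers, for illustration only.
models = cascade_model_count(5)                  # five scripts
end_to_end = cascade_line_accuracy(0.95, 0.90)   # about 0.855
```

Under these assumptions the end-to-end accuracy can never exceed the script identification accuracy, which is why a weak first stage caps the whole pipeline.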

On the other hand, some methods share an encoder and integrate multiple alphabets for multilingual recognition [9], [10] to reduce the complexity of the system. Nevertheless, these methods can only achieve satisfactory results in languages with similar alphabets. Besides, they recognize texts in an end-to-end manner regardless of script categories, which is inconvenient for subsequent processing such as syntactic parsing and machine translation.

In this paper, we aim to design an MLHTR system that performs script identification and text recognition simultaneously. To overcome the abovementioned drawbacks, we propose novel multilingual text recognition networks (MuLTReNets), in which all the modules are optimized jointly by multi-task learning. The system, as shown in Fig. 1b, contains a shared feature extractor and two task-specific parts for script identification and handwriting recognition, respectively. First, the feature extractor extracts semantic features integrating both spatial and temporal knowledge. Then, based on the shared features, the script identification branch and the handwriting recognition branch each decode the variable-length sequence. Moreover, because the branches are optimized jointly, the architecture shares information between the script identification and handwriting recognition tasks, and therefore improves both.
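
The abstract names CTC as the criterion used by the recognition branch to decode a variable-length sequence. As a minimal sketch of what CTC decoding involves, the following shows standard best-path (greedy) decoding: take the most likely label per frame, collapse consecutive repeats, then drop blanks. Whether the authors use greedy decoding, beam search, or a language model is not stated in this excerpt, and the blank index of 0 is an assumption.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding over per-frame class scores."""
    # 1. Pick the highest-scoring label in every frame.
    best_path = [max(range(len(frame)), key=lambda k: frame[k])
                 for frame in frame_scores]
    # 2. Collapse runs of the same label into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks, which only serve as separators.
    return [label for label in collapsed if label != blank]


def one_hot(index: int, size: int = 3) -> List[float]:
    """Toy helper: a frame that puts all its score on one label."""
    return [1.0 if k == index else 0.0 for k in range(size)]


# Frame-wise path 1 1 _ 1 2 2 _ decodes to the label sequence 1 1 2.
decoded = ctc_greedy_decode([one_hot(i) for i in [1, 1, 0, 1, 2, 2, 0]])
# -> [1, 1, 2]
```

The blank between the two runs of label 1 is what lets the decoder output a doubled character instead of merging them.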

Our MuLTReNets have been evaluated on public datasets of five scripts, namely, IAM (English), Rimes (French), KHTD (Kannada), UCOM (Urdu), and BHTD (Bangla). The experimental results demonstrate the superiority of the proposed method on both script identification and handwriting recognition. The average accuracy of script identification is 99.9%, and the recognition performance, in terms of character error rate (CER) and word error rate (WER), is superior to cascade systems and even defeats some state-of-the-art unilingual recognition systems.
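
For readers unfamiliar with the two recognition metrics, the sketch below computes CER and WER in their common form: the Levenshtein edit distance between the reference and the hypothesis, divided by the reference length in characters or words. The exact normalization and text preprocessing used in the paper are not given in this excerpt, so treat this as the conventional definition rather than the authors' evaluation script.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference characters."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance over whitespace-separated words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

For example, `cer("abcd", "abed")` is 0.25 (one substitution in four characters), and `wer("the cat sat", "the cat sit")` is one third (one wrong word in three).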

A preliminary version of this work was published at a conference [11]. It proposed a method for simultaneous script identification and handwritten text line recognition in a multi-task learning framework, and its experimental results demonstrated promising performance on three scripts. This paper extends the conference work from several perspectives, and the major contributions of this work are summarized as follows:

  • The system architecture is extended into two versions, named MuLTReNetV1 and MuLTReNetV2: one for multi-lingual text recognition with the merged alphabet, the other for cascaded script identification and unilingual text recognition with joint training.

  • To keep the balance among datasets of different scripts, we propose a novel auto-weighter for the script identification task.

  • We provide extensive experiments on data of more languages and an experimental analysis of the different parts for a better understanding of the MuLTReNets.

  • The MuLTReNets achieve script identification and handwriting recognition performance superior to our previous method [11] and to cascade systems, and competitive with state-of-the-art unilingual recognition systems.

The rest of the paper is organized as follows. Section 2 reviews related work; Section 3 describes the details of the proposed MuLTReNets framework; Section 4 presents the experimental results; and finally, Section 5 concludes the paper.

Section snippets

Multilingual handwritten text recognition

Multilingual handwritten text recognition has been studied extensively in recent years [3], [4], [5], [6], [7], [8], [9], [10], [11]. This problem is solved mostly by cascade frameworks. Mioulet et al. [5] used a script recognizer based on a shape feature approach to identify various scripts, then two character recognizers were used for Latin and Bengali to process corresponding scripts and output their text contents. And finally, language identifiers were adopted to discriminate different

System overview

For multilingual handwriting recognition, the recognizer is required to manage multiple alphabets. Assume the alphabet of each script is denoted as $C^i = \{c^i_1, c^i_2, \ldots\}$, where $C^i$ represents the alphabet of the i-th script and $c^i_*$ denotes the characters in $C^i$. To verify the robustness of the framework, we propose two versions of MuLTReNets, namely MuLTReNetV1 and MuLTReNetV2, which handle the character sets in different manners. With a merged alphabet, the MuLTReNetV1 can effectively decrease parameter amount and
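
The difference between a merged alphabet and per-script alphabets can be illustrated with a small sketch. It builds one shared label map (the MuLTReNetV1 idea) and one label map per script (the MuLTReNetV2 idea) from toy character lists. The function names, the toy alphabets, the reservation of index 0 for the CTC blank, and the choice to map a character shared by several scripts to a single label are all assumptions made for illustration; the paper's actual construction is not given in this excerpt.

```python
from typing import Dict, List


def build_merged_alphabet(alphabets: Dict[str, List[str]]) -> Dict[str, int]:
    """One label map over the union of all script alphabets."""
    merged: Dict[str, int] = {}
    next_index = 1  # index 0 is kept free for the CTC blank (assumption)
    for script in sorted(alphabets):
        for char in alphabets[script]:
            if char not in merged:  # shared characters get one label
                merged[char] = next_index
                next_index += 1
    return merged


def build_separate_alphabets(
        alphabets: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """One independent label map per script."""
    return {
        script: {char: index
                 for index, char in enumerate(dict.fromkeys(chars), start=1)}
        for script, chars in alphabets.items()
    }


# Toy alphabets, far smaller than any real character set.
toy = {"english": ["a", "b", "c"], "french": ["a", "b", "c", "é"]}
merged = build_merged_alphabet(toy)        # 4 labels in total
separate = build_separate_alphabets(toy)   # 3 + 4 labels in total
```

In this toy case the merged map needs 4 output labels while the separate maps need 7 across two output layers, which hints at how merging could reduce output-layer parameters when scripts overlap.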

Datasets

To evaluate the performance of the proposed method, we collected datasets of five scripts to form a multilingual database, named MLHTD, and established benchmarks on it.

The five datasets, IAM [38], Rimes [39], KHTD [40], UCOM [41] and BHTD [42], are in English, French, Kannada, Urdu, and Bangla, respectively. As English and French have similar glyphs, both using the Latin alphabet, they are usually distinguished by a cascade system with text recognition and n-gram language models, such as

Conclusion

In this paper, to overcome the error accumulation in the cascade system, we have proposed multi-lingual handwritten text recognition networks (MuLTReNets) for performing script identification and handwriting recognition simultaneously in the multi-task framework. We designed two versions of the network, one for multi-lingual text recognition with merged alphabet (MuLTReNetV1), one with multiple unilingual recognizers activated after script identification (MuLTReNetV2). In both versions, the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been supported by the National Natural Science Foundation of China (NSFC) grants 61733007, 61573355, 61633021, 61721004, and 71621002, and in part by the Key Research Program of Frontier Sciences of CAS grant ZDBS-LY-7004 and the Youth Innovation Promotion Association of CAS grant 2019141.

Zhuo Chen received the B.S. degree in electronic information engineering from China Agricultural University, Beijing, China, in 2015. He is currently pursuing his Ph.D. degree in Pattern Recognition and Intelligent Systems at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include multilingual handwritten text recognition, multi-task learning, and sequence pattern recognition.

References (44)

  • M. Kozielski et al. Multilingual off-line handwriting recognition in real-world images. Proceedings of the 2014 11th IAPR International Workshop on Document Analysis Systems (2014)

  • L. Mioulet et al. Language identification from handwritten documents. Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR) (2015)

  • P. Barlas et al. Language identification in document images. J. Imaging Sci. Technol. (2016)

  • D. Keysers et al. Multi-language online handwriting recognition. IEEE Trans Pattern Anal Mach Intell (2017)

  • Y. Fujii et al. Sequence-to-label script identification for multilingual OCR. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (2017)

  • T. Bluche et al. Gated convolutional recurrent neural networks for multilingual handwriting recognition. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (2017)

  • Z. Chen et al. Simultaneous script identification and handwriting recognition via multi-task learning of recurrent neural networks. Proceedings of the 14th International Conference on Document Analysis and Recognition (ICDAR) (2017)

  • A.L. Spitz et al. Palace: A multilingual document recognition system. Proceedings of the IAPR Workshop on DAS (1995)

  • A.L. Spitz. Multilingual document recognition. Handbook of Character Recognition and Document Image Analysis (1997)

  • V. Singhal et al. Script-based classification of hand-written text documents in a multilingual environment. Proceedings of the 17th Workshop on Parallel and Distributed Simulation (2003)

  • A. Busch et al. Texture for script identification. IEEE Trans Pattern Anal Mach Intell (2005)

  • A.L. Spitz. Determination of the script and language content of document images. IEEE Trans Pattern Anal Mach Intell (1997)

    Fei Yin is an associate professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. He received the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences in 2010. He received his B.S. degree from Xian University of Posts and Telecommunications in 1999 and his M.S. degree from Huazhong University of Science and Technology in 2002. His current research interests include character recognition and document processing. His current projects include “Theory and Key Techniques for Perturbation based Character Recognition” and “Video/Image text detection and recognition”. He has published more than thirty papers in international journals and conferences.

    Xu Yao Zhang is an associate professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. He received the B.S. degree in computational mathematics from Wuhan University, Wuhan, China, in 2008, and the Ph.D. degree in pattern recognition and intelligent systems from Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2013. His research interests include machine learning, pattern recognition, and especially large category classification, dimensionality reduction, classifier adaptation (non iid problems), sequential pattern recognition, deep neural networks, online and offline handwriting recognition, image processing, and GPU based large scale optimization.

    Qing Yang is a professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. He received the PhD degree in computer science from the Institute of Automation, Chinese Academy of Sciences, Beijing. His research interests include image processing, pattern recognition, and bioinformatics.

    Cheng-Lin Liu received the B.S. degree in electronic engineering from Wuhan University, Wuhan, China, the M.E. degree in electronic engineering from Beijing Polytechnic University (currently Beijing University of Technology), Beijing, China, the Ph.D. degree in pattern recognition and intelligent control from the Institute of Automation of Chinese Academy of Sciences, Beijing, China, in 1989, 1992 and 1995, respectively. He was a postdoctoral fellow at Korea Advanced Institute of Science and Technology (KAIST) and later at Tokyo University of Agriculture and Technology from March 1996 to March 1999. From 1999 to 2004, he was a research staff member and later a senior researcher at the Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan. From 2005, he has been a Professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation of Chinese Academy of Sciences, Beijing, China, and is now the director of the laboratory. His research interests include pattern recognition, image processing, neural networks, machine learning, and especially the applications to character recognition and document analysis. He has contributed many effective methods to different aspects of handwritten document analysis, including image pre-processing, feature extraction, classifier design, character string recognition, and language modeling. His algorithms have yielded superior performance, and have been transferred to industrial applications including mail sorting and form processing. He has published over 200 technical papers at prestigious international journals and conferences, including IEEE TPAMI, Pattern Recognition, IEEE TNN, ICPR, ICDAR, CVPR, ICDM, AAAI and IJCAI. He won the IAPR/ICDAR Young Investigator Award of 2005, and received the Outstanding Youth Fund of NSFC in 2008. He is on the editorial board of journals Pattern Recognition, Image and Vision Computing, and International Journal on Document Analysis and Recognition. He is a Fellow of the IEEE and the IAPR, member of the ACM and the IEICE Japan.
