Elsevier

Pattern Recognition

Volume 73, January 2018, Pages 47-64
Comparative study of conventional time series matching techniques for word spotting

https://doi.org/10.1016/j.patcog.2017.07.011

Highlights

  • Thirty-two sequence matching techniques are evaluated within a simple, classical word spotting architecture.

  • Many of these techniques have never been tested in the context of word spotting, yet they show interesting word spotting results.

  • Each sequence matching technique is explained in detail so that the underlying idea can be grasped quickly.

  • Experiments are carried out on six historical datasets of different kinds (handwritten and printed).

  • Experimental results are explained and analyzed, and conclusions are drawn on which algorithms to use in a given context.

Abstract

In the word spotting literature, many approaches have treated word images as temporal signals that can be matched by the classical Dynamic Time Warping (DTW) algorithm. Consequently, DTW has been widely used as an off-the-shelf tool. However, there exist many improved versions of DTW, along with other robust sequence matching techniques. Very few of them have been studied extensively in the context of word spotting, whereas they have been well explored in other application domains such as speech processing and data mining. The motivation of this paper is to investigate this area in order to extract significant and useful information for users of such techniques. More precisely, this paper presents a comparative study of the classical DTW technique and many of its improved variants, as well as other sequence matching techniques, in the context of word spotting, considering both theoretical and experimental properties. The experimental study is performed on historical documents, both handwritten and printed, at word or line segmentation level and with a limited or extended set of queries. The comparative analysis shows that classical DTW remains a good choice when there are no segmentation problems in word extraction. Its constrained version (e.g. Itakura Parallelogram) seems better on handwritten data, and the Hilbert transform also shows promising performance on handwritten and printed datasets. For printed data and low-level features (pixel-column based), the aggregation of features (e.g. Piecewise-DTW) also seems very important. Finally, when there are significant word segmentation errors, or when working at line segmentation level, Continuous Dynamic Programming (CDP) seems to be the best choice.

Introduction

Advances in high-quality document digitization have provided an attractive way to preserve ancient manuscripts and to give historians and researchers easy, hassle-free access to them. To allow searching in this mass of digitized data, indexing based on Optical Character Recognition (OCR) or on manual (or semi-manual) transcription is traditionally applied. Nevertheless, the performance of available OCR engines on such historical documents is not up to the mark because of variability in writing and font style, linguistic and script dependencies, and poor document quality caused by severe degradation. Even when learning is possible, it becomes a burdensome process because of the need for ground truth. Likewise, manual or semi-automatic transcription of handwritten or printed documents is a tedious and costly job. For these reasons, word spotting appears to be an interesting alternative, and research on this topic has intensified. This technique can be defined as the “localization of words of interest in the dataset without actually interpreting the content”, and it allows a document to be indexed or searched using queries.

For spotting words in handwritten manuscripts and historical printed document images, word images can be thought of as 2D signals that can be matched by sequence matching algorithms such as DTW [14], [17], [32]. In other application domains, DTW’s variants have been intensively evaluated to demonstrate their value [7], [34], but they have not been clearly studied and compared in the case of word spotting. In this paper, we propose a detailed comparative study of DTW and its variants for word spotting. This study extends the one performed in [27] by including more sequence matching algorithms. Some of them have never been tested in the word spotting context, although they have shown promising results in other domains. Also, more experimental datasets are used (six in total), including both handwritten and printed document images.

The remainder of this paper is organized as follows. The datasets used for the experiments, as well as the word spotting framework, are detailed in Section 2. The baseline DTW approach and various other dynamic programming (DP) paths and warping constraints are studied in Section 3. Specific techniques to reduce the quadratic time complexity of the DTW algorithm are then evaluated in Section 4. The behavior of several other approaches designed to improve the quality of DTW is studied in Section 5. Other dynamic programming based sequence matching approaches, which have shown better performance than classical DTW in several other domains (e.g. shape matching and time series signal matching), are evaluated in Section 7. Finally, a summary of results with discussion and future work is presented in Section 8.

Section snippets

Feature extraction

For all experiments and datasets used, the comparison between a query (word image) and a target (a word image, or an image of a text line or of a piece of one) is done by transforming text images into a vector sequence using classical features, such as column-based features (see [28]) or Slit Style HOG features [36].

Column-based features: For an image with a width of N pixels, 8 statistical features, $F_1, F_2, \ldots, F_8$ (Table 1), are computed from left to right on each pixel column. The features $F_1$–$F_6$ have been used

Evaluation of dynamic time warping methods

DTW [31] is a technique for measuring the similarity between two time series by finding their best correspondence. Consider two 2D signals $X = x_1, x_2, x_3, \ldots, x_p$ and $Y = y_1, y_2, y_3, \ldots, y_q$. To align these two sequences using DTW, we construct a $p \times q$ matrix whose $(i, j)$-th element contains the distance $D(x_i, y_j)$ between the two points $x_i$ and $y_j$ (i.e. $D(x_i, y_j) = (x_i - y_j)^2$). The
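The alignment described above can be sketched as follows. This is a minimal illustration, not the authors' Matlab implementation: it assumes the standard symmetric step pattern (match, insertion, deletion), which the excerpt does not spell out, and the names `sq_dist` and `dtw` are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared distance D(x_i, y_j) for scalars or equal-length feature vectors."""
    if isinstance(a, (int, float)):
        return (a - b) ** 2
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def dtw(x, y, dist=sq_dist):
    """Cumulative cost of the best warping path between sequences x and y.

    Assumption: the classical step pattern, where cell (i, j) may be reached
    from (i-1, j), (i, j-1) or (i-1, j-1).
    """
    p, q = len(x), len(y)
    # acc[i][j] = minimal cumulative cost of aligning x[:i] with y[:j];
    # row 0 and column 0 are padding, so the path is forced to start at (1, 1).
    acc = [[math.inf] * (q + 1) for _ in range(p + 1)]
    acc[0][0] = 0.0
    for i in range(1, p + 1):
        for j in range(1, q + 1):
            cost = dist(x[i - 1], y[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[p][q]
```

For word spotting, each element of `x` and `y` would be the feature vector of one pixel column (or one HOG slit), and the returned cost would be used to rank candidate word images against the query.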

Speeding up DTW

Besides the global constraints already mentioned in Section 3.2, other techniques to reduce the time and space complexity of DTW (which is O(mn)) can be broadly classified into the following two categories.
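As an illustration of how a global constraint reduces the amount of computation, the sketch below restricts the warping path to a Sakoe-Chiba band around the diagonal. The choice of this particular band, the function name `dtw_band` and the `window` parameter are assumptions made for illustration; the excerpt does not show which constraints Section 3.2 covers beyond the Itakura Parallelogram named in the abstract. The sketch works on scalar sequences to stay short.

```python
import math


def dtw_band(x, y, window):
    """DTW cost with the path confined to |i - j| <= window (Sakoe-Chiba band).

    Only the cells inside the band are evaluated, so the number of distance
    computations drops from p * q to roughly p * (2 * window + 1). The full
    matrix is still allocated here for simplicity.
    """
    p, q = len(x), len(y)
    # The band must be at least |p - q| wide, otherwise cell (p, q) is unreachable.
    w = max(window, abs(p - q))
    acc = [[math.inf] * (q + 1) for _ in range(p + 1)]
    acc[0][0] = 0.0
    for i in range(1, p + 1):
        for j in range(max(1, i - w), min(q, i + w) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[p][q]
```

A narrow band is cheaper but forbids strong warping: with `window=0` the path is forced onto the diagonal, while a band as wide as the sequences gives the unconstrained DTW cost.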

Improving the quality of DTW

Here, we discuss the techniques proposed in the literature for improving the performance of DTW.

Finding subsequence with DTW

All of the above-mentioned algorithms were designed to match all elements of the sequences. None of them can handle subsequence matching, which is needed in word spotting, especially for Dataset-GW-HOG and Dataset-Japanese-HOG. In this section, we discuss simple modifications of classical DTW for subsequence matching.
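One common modification of this kind, shown here as a hedged sketch rather than as the specific variants evaluated in the paper, relaxes the boundary condition on the target side: the path may start at any target position at zero cost and may end at any target position. The function name `subsequence_dtw` and the use of scalar sequences are assumptions for illustration.

```python
import math


def subsequence_dtw(query, target):
    """Find the best match of `query` anywhere inside `target`.

    Returns (cost, end_index), where end_index is the 0-based position in
    `target` at which the best-matching subsequence ends.
    """
    p, q = len(query), len(target)
    acc = [[math.inf] * (q + 1) for _ in range(p + 1)]
    # Free start: matching may begin at any target position without penalty.
    acc[0] = [0.0] * (q + 1)
    for i in range(1, p + 1):
        for j in range(1, q + 1):
            cost = (query[i - 1] - target[j - 1]) ** 2
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Free end: take the cheapest cell in the last row instead of acc[p][q].
    end = min(range(1, q + 1), key=lambda j: acc[p][j])
    return acc[p][end], end - 1
```

In a line-level word spotting setting, `target` would be the feature sequence of a whole text line and `query` that of the word image; the start of the match could be recovered by backtracking from the returned end position.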

Other relevant sequence matching techniques

There are other relevant sequence matching techniques, which were proposed to overcome some of the architectural drawbacks of DTW by removing some constraints (especially the boundary and continuity conditions); this allows them to skip outliers in the query and/or target sequences. At the same time, the many-to-one and one-to-many matching property of DTW is missing in these techniques.

Overall comparative analysis of algorithms and conclusions

In this paper, different dynamic programming matching techniques were explored for word spotting. Although a wide variety of variants of the popular DTW exists, only classical DTW has been used most of the time, without any justification. Our comparison was based on experimental protocols involving handwritten datasets (George Washington, Bentham and a Japanese dataset) and a historical printed document. Two levels of segmentation were considered: word level, with perfect

Acknowledgment

This work is partly supported by the Indo-French Center for Promotion of Advanced Research (IFCPAR/CEFIPRA). The authors thank Myong K. Jeong and Longin Latecki for providing the code of WDTW and OSB, respectively.

References (38)

  • J.A. Rodríguez-Serrano et al., Handwritten word-spotting using hidden Markov models and universal vocabularies, Pattern Recognit. (2009)
  • G. Al-Naymat et al., SparseDTW: A novel approach to speed up dynamic time warping, Conferences in Research and Practice in Information Technology Series (2009)
  • T. Albrecht, Dynamic Time Warping (DTW) (2009)...
  • L. Benedikt et al., Facial Dynamics in Biometric Identification, Proceedings of the British Machine Vision Conference (2008)
  • H. Chen, Robust Text Detection In Natural Images With Edge-Enhanced Maximally Stable Extremal Regions, 18th IEEE International Conference on Image Processing (ICIP) (2011)
  • S. Chu et al., Iterative Deepening Dynamic Time Warping for Time Series, Proc. 2nd SIAM International Conference on Data Mining (2002)
  • E.J. Keogh, M.J. Pazzani, Scaling up dynamic time warping for datamining applications, KDD (2000)
  • V. Frinken et al., A novel word spotting method based on recurrent neural networks, IEEE TPAMI (2012)
  • B. Gatos et al., Ground-Truth production in the tranScriptorium project, 11th IAPR International Workshop on Document Analysis Systems (DAS) (2014)

Tanmoy Mondal received the B.Tech. degree in information technology from West Bengal University of Technology, Kolkata (India), in 2007 and the M.Tech. degree in mechatronics & robotics from Bengal Engineering and Science University, Kolkata (India), in 2009. Before joining Poly-Tech Tours (France) as a PhD student in 2012, he worked in several companies and premier R&D centers as a researcher. He completed his PhD at the Laboratoire d'Informatique, Poly-Tech Tours (France), in 2015. He is currently a postdoctoral researcher at INSA, Lyon, France. His research interests include pattern recognition, image processing and analysis, and computer vision. His current research is mainly related to time series matching techniques and document image processing.

Nicolas Ragot received his Ph.D. degree in computer science in 2003 from the IRISA lab, Rennes University (France). In 2005, he joined the Computer Science Lab (LI EA 6300) in the RFAI group of Université François-Rabelais, Tours (France), where he is an assistant professor at Poly-Tech Tours (French engineering school). His main research area is pattern recognition applied to document analysis. During the past 10 years, he has worked mainly on online signature recognition, robust and adaptive OCR systems based on HMM, and OCR control and defect detection (with the French National Library, BnF). More recently, he and the Indian Statistical Institute, Kolkata received a three-year grant from IFCPAR for a collaborative project on robust and multilingual word spotting. He and his group have also been involved in several national projects funded by the government (ANR NAVIDOMAS, DIGIDOC etc.) as well as by companies (ATOS Worldline, Nexter). His group also received, for two years, a Google Digital Humanities award to work on interactive layout analysis and the use of pattern redundancy for transcription and retrieval of old printed books.

Jean-Yves Ramel received his Ph.D. in Computer Science (1996) from the RFV/LIRIS Laboratory in Lyon (France). From 1998 to 2002, he worked in the field of Man-Machine Interaction at INSA Lyon. Since 2002, he has been working in the field of Pattern Recognition and Image Analysis at the Computer Sciences Laboratory (LI) of Tours (RFAI team) at Poly-Tech Tours (France). Since September 2007, he has been a Professor at the LI laboratory in the RFAI group.

Umapada Pal received his Ph.D. in 1997 from the Indian Statistical Institute. He did his postdoctoral research at INRIA (Institut National de Recherche en Informatique et en Automatique), France. Since January 1997, he has been a faculty member of the Computer Vision and Pattern Recognition Unit of the Indian Statistical Institute, Kolkata, where he is at present a Professor. His research interests include digital document processing, optical character recognition, biometrics and word spotting. He has published 263 research papers in various international journals, conference proceedings and edited volumes. Because of his significant impact on document analysis research, in 2003 he received the ICDAR Outstanding Young Researcher Award from the International Association for Pattern Recognition (IAPR). In 2008, 2011 and 2012, Dr. Pal received visiting fellowships from the governments of Spain, France and Australia, respectively. Dr. Pal has served as General/Program/Organizing Chair of many conferences, including the International Conference on Document Analysis and Recognition (ICDAR), the International Conference on Frontiers in Handwriting Recognition (ICFHR), the International Workshop on Document Analysis Systems (DAS) and the Asian Conference on Pattern Recognition (ACPR). He has also served as a program committee member of more than 50 international events. He has many international research collaborations and supervises Ph.D. students at many foreign universities. He is an associate editor of ACM Transactions on Asian Language Information Processing (ACM-TALIP), Pattern Recognition Letters (PRL) and Electronic Letters on Computer Vision and Image Analysis (ELCVIA), among others. He has also served as a guest editor of several special issues. He is a Fellow of the IAPR (International Association for Pattern Recognition).

1 The Matlab implementation of this article is available here: https://github.com/tanmayGIT/ICDAR-2015-DTW.
