Comparative study of conventional time series matching techniques for word spotting
Introduction
Advances in high-quality document digitization have provided an attractive way to preserve ancient manuscripts and to give historians and researchers easy access to them. To allow searching in this mass of digitized data, indexing based on Optical Character Recognition (OCR) or on manual (or semi-manual) transcription is traditionally applied. Nevertheless, the performance of available OCR engines on such historical documents is not up to the mark because of variability in writing and font styles, linguistic and script dependencies, and poor document quality caused by severe degradation. Even when learning is possible, it becomes a burdensome process because of the need for ground truth. Likewise, manual or semi-automatic transcription of handwritten or printed documents is tedious and costly. For these reasons, word spotting appears to be an interesting alternative, and research on this topic has intensified. This technique can be defined as the “localization of words of interest in the dataset without actually interpreting the content”, and it allows indexing or searching inside a document using queries.
For spotting words in handwritten manuscripts and historical printed document images, word images can be thought of as 2D signals that can be matched by sequence matching algorithms such as Dynamic Time Warping (DTW) [14], [17], [32]. In other application domains, DTW’s variants have been extensively evaluated to demonstrate their value [7], [34], but they have not been clearly studied and compared in the case of word spotting. In this paper, we propose a detailed comparative study of DTW and its variants for word spotting. This study extends the one performed in [27] by including more sequence matching algorithms. Some of them have never been tested in the word spotting context, although they have shown promising results in other domains. In addition, more experimental datasets are used (six in total), including both handwritten and printed document images.
The remainder of this paper is organized as follows. The datasets used for the experiments, as well as the word spotting framework, are detailed in Section 2. The baseline DTW approach, together with various other dynamic programming (DP) paths and warping constraints, is studied in Section 3. Specific techniques to reduce the quadratic time complexity of the DTW algorithm are evaluated in Section 4. The behavior of several other approaches designed to improve the quality of DTW is studied in Section 5. Modifications of DTW for subsequence matching are discussed in Section 6. Other dynamic programming based sequence matching approaches, which have shown better performance than classical DTW in several other domains (e.g. shape matching and time series signal matching), are evaluated in Section 7. Finally, a summary of results with discussion and future work is presented in Section 8.
Feature extraction
For all experiments and datasets used, the comparison between a query (word image) and a target (a word image or a text line image, or a piece of one) is done by transforming the text images into vector sequences using classical features, such as column-based features (see [28]) or Slit Style HOG features [36].
Column-based features: For an image with a width of N pixels, 8 statistical features (Table 1) are computed on each pixel column, from left to right. The features have been used
Evaluation of dynamic time warping methods
DTW [31] is a technique for measuring the similarity between two time series by finding their best correspondence. Let us assume two 2D signals $X = (x_1, x_2, \dots, x_p)$ and $Y = (y_1, y_2, \dots, y_q)$. To align these two sequences using DTW, we construct a $p \times q$ matrix, where the $(i, j)$-th element contains the distance $d(x_i, y_j)$ between the two points $x_i$ and $y_j$. The
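As an illustration of the matrix construction described above, the following is a minimal Python sketch of classical DTW. It is not the authors' Matlab implementation: the Euclidean local distance, the symmetric three-neighbor step pattern and the names `euclidean` and `dtw_distance` are assumptions made for this example only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length (assumed local distance)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def dtw_distance(x, y, dist=euclidean):
    """Accumulated DTW cost between sequence x (length p) and sequence y (length q)."""
    p, q = len(x), len(y)
    inf = float("inf")
    # acc[i][j] holds the minimal accumulated cost of aligning x[:i] with y[:j].
    acc = [[inf] * (q + 1) for _ in range(p + 1)]
    acc[0][0] = 0.0
    for i in range(1, p + 1):
        for j in range(1, q + 1):
            cost = dist(x[i - 1], y[j - 1])
            # Assumed step pattern: vertical, horizontal and diagonal predecessors.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[p][q]
```

Each element of `x` and `y` stands for one feature vector (for example, the features of one pixel column), so a stretched copy of a sequence aligns with zero cost under this sketch.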
Speeding up DTW
Besides the global constraints already mentioned in Section 3.2, other techniques for reducing the time and space complexity of DTW (which is O(mn)) can be broadly classified into the following two categories.
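To make the effect of a global constraint on the O(mn) cost concrete, here is a minimal Python sketch of DTW restricted to a band around the diagonal (a Sakoe-Chiba style band). Which constraints Section 3.2 actually evaluates is not shown in this excerpt, so the band, the absolute-difference local distance on scalar sequences, and the names `banded_dtw` and `radius` are all assumptions of this example.

```python
def banded_dtw(x, y, radius):
    """DTW cost restricted to cells with |i - j| <= radius, plus the number of cells filled."""
    m, n = len(x), len(y)
    inf = float("inf")
    # Widen the band when the lengths differ so that the final cell stays reachable.
    w = max(radius, abs(m - n))
    acc = [[inf] * (n + 1) for _ in range(m + 1)]
    acc[0][0] = 0.0
    cells = 0
    for i in range(1, m + 1):
        # Only columns inside the band are visited; the rest keep an infinite cost.
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = abs(x[i - 1] - y[j - 1])  # assumed local distance for scalar samples
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            cells += 1
    return acc[m][n], cells
```

With two sequences of length 4, a radius of 1 fills 10 cells instead of the 16 of the full matrix, at the price of forbidding alignments that stray further from the diagonal.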
Improving the quality of DTW
Here, we discuss the techniques proposed in the literature for improving the performance of DTW.
Finding subsequence with DTW
All of the above-mentioned algorithms were designed to match all elements of the sequences, and none of them can handle subsequence matching, which is needed in word spotting, especially for Dataset-GW-HOG and Dataset-Japanese-HOG. In this section, we discuss simple modifications of classical DTW for subsequence matching.
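One common modification of this kind relaxes the boundary condition on the target side, so that the query may start and end anywhere in the target. The Python sketch below illustrates that idea only; whether it matches the exact variants evaluated in this section is an assumption, as are the absolute-difference local distance on scalar sequences and the name `subsequence_dtw`.

```python
def subsequence_dtw(query, target):
    """Best match of the whole query inside target.

    Returns (cost, start, end) such that target[start:end] is the matched slice.
    """
    p, q = len(query), len(target)
    inf = float("inf")
    acc = [[inf] * (q + 1) for _ in range(p + 1)]
    start = [[0] * (q + 1) for _ in range(p + 1)]
    for j in range(q + 1):
        acc[0][j] = 0.0  # free start: the match may begin at any target position
        start[0][j] = j
    for i in range(1, p + 1):
        for j in range(1, q + 1):
            cost = abs(query[i - 1] - target[j - 1])  # assumed local distance
            # Pick the cheapest predecessor; ties go to the earlier start index.
            best, s = min(
                (acc[i - 1][j - 1], start[i - 1][j - 1]),
                (acc[i - 1][j], start[i - 1][j]),
                (acc[i][j - 1], start[i][j - 1]),
            )
            acc[i][j] = cost + best
            start[i][j] = s
    # Free end: the match may finish at any target position.
    end = min(range(1, q + 1), key=lambda j: acc[p][j])
    return acc[p][end], start[p][end], end
```

For example, with `query = [1, 2, 3]` and `target = [9, 9, 1, 2, 3, 9]`, the sketch returns a cost of 0.0 and the slice bounds 2 and 5, so `target[2:5]` is the matched region.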
Other relevant sequence matching techniques
There are other relevant sequence matching techniques, which were proposed to overcome some of the architectural drawbacks of DTW by removing some constraints (especially the boundary and continuity conditions); this allows them to skip outliers in the query and/or target sequences. At the same time, the many-to-one and one-to-many matching property of DTW is missing in these techniques.
Overall comparative analysis of algorithms and conclusions
In this paper, different dynamic programming matching techniques were explored for word spotting. Although a wide variety of variations of the popular DTW exists, only classical DTW has been used most of the time, without any justification. Our comparison was based on experimental protocols involving handwritten datasets (George Washington, Bentham and a Japanese dataset) and a historical printed document. Two levels of segmentation were considered: word level, with perfect
Acknowledgment
This work is partly supported by the Indo-French Center for Promotion of Advanced Research (IFCPAR/CEFIPRA). The authors thank Myong K. Jeong and Longin Latecki for providing the code of WDTW and OSB, respectively.
References (38)
- et al., A New Character Segmentation Approach for Off-Line Cursive Handwritten Words, Procedia Comput. Sci. (2013)
- et al., Lexicon-free handwritten word spotting using character HMMs, Pattern Recognit. Lett. (2012)
- Using derivatives in a longest common subsequence dissimilarity measure for time series classification, Pattern Recognit. Lett. (2014)
- et al., Non-isometric transforms in time series classification using DTW, Knowl. Based Syst. (2014)
- et al., Multivariate time series classification with parametric derivative dynamic time warping, Expert Syst. Appl. (2015)
- et al., Weighted dynamic time warping for time series classification, Pattern Recognition (2011)
- et al., Word spotting in historical printed documents using shape and sequence comparisons, Pattern Recognition (2012)
- et al., An elastic partial shape matching technique, Pattern Recognition (2007)
- et al., Text search for medieval manuscript images, Pattern Recognition (2007)
- et al., Flexible Sequence Matching technique: An effective learning-free approach for word spotting, Pattern Recognition (2016)
- Handwritten word-spotting using hidden Markov models and universal vocabularies, Pattern Recognit.
- SparseDTW: A novel approach to speed up dynamic time warping, Conferences in Research and Practice in Information Technology Series
- Facial Dynamics in Biometric Identification, Proceedings of the British Machine Vision Conference
- Robust Text Detection In Natural Images With Edge-Enhanced Maximally Stable Extremal Regions, 18th IEEE International Conference on Image Processing (ICIP)
- Iterative Deepening Dynamic Time Warping for Time Series, Proc 2nd SIAM International Conference on Data Mining
- Scaling up dynamic time warping for datamining applications, KDD
- A novel word spotting method based on recurrent neural networks, IEEE TPAMI
- Ground-Truth production in the tranScriptorium project, 11th IAPR International Workshop on Document Analysis Systems (DAS)
Tanmoy Mondal: received the B.Tech. degree in information technology from West Bengal University of Technology, Kolkata (India), in 2007 and the M.Tech. degree in mechatronics & robotics from Bengal Engineering and Science University, Kolkata (India), in 2009. Before joining Poly-Tech Tours (France) as a PhD student in 2012, he worked as a researcher in several industries and premier R&D centers. He completed his PhD at the Laboratoire d'Informatique, Poly-Tech, Tours (France) in 2015 and is currently a postdoctoral researcher at INSA, Lyon, France. His research interests include pattern recognition, image processing and analysis, and computer vision. His current research is mainly related to time series matching techniques and document image processing.
Nicolas Ragot: received his Ph.D. degree in computer science in 2003 from the IRISA lab, Rennes University (France). In 2005, he joined the Computer Science Lab (LI EA 6300) in the RFAI group of Université François-Rabelais, Tours (France), where he is an assistant professor at Poly-Tech Tours (a French engineering school). His main research area is pattern recognition applied to document analysis. During the past 10 years, he has worked mainly on online signature recognition, robust and adaptive OCR systems based on HMM, and OCR control and defect detection (with the French National Library, BnF). More recently, he and the Indian Statistical Institute, Kolkata received a 3-year grant from IFCPAR for a collaborative project on robust and multilingual word spotting. He and his group have also been involved in several national projects funded by the government (ANR NAVIDOMAS, DIGIDOC etc.) as well as by companies (ATOS Worldline, Nexter). His group also received, for 2 years, a Google Digital Humanities award to work on interactive layout analysis and the use of pattern redundancy for transcription and retrieval of old printed books.
Jean-Yves Ramel: received his Ph.D. in Computer Science (1996) from the RFV/LIRIS Laboratory in Lyon (France). From 1998 to 2002, he worked in the field of Man-Machine Interaction at INSA Lyon. Since 2002, he has been working in the field of Pattern Recognition and Image Analysis at the Computer Sciences Laboratory (LI) of Tours (RFAI team) at Poly-Tech Tours (France). Since September 2007, he has been a Professor at the LI laboratory in the RFAI group.
Umapada Pal: received his Ph.D. in 1997 from the Indian Statistical Institute. He did his postdoctoral research at INRIA (Institut National de Recherche en Informatique et en Automatique), France. Since January 1997, he has been a faculty member of the Computer Vision and Pattern Recognition Unit of the Indian Statistical Institute, Kolkata, where he is at present a Professor. His research interests include digital document processing, optical character recognition, biometrics, word spotting etc. He has published 263 research papers in various international journals, conference proceedings and edited volumes. Because of his significant impact on document analysis research, he received the ICDAR Outstanding Young Researcher Award from the International Association for Pattern Recognition (IAPR) in 2003. In 2008, 2011 and 2012, Dr. Pal received visiting fellowships from the governments of Spain, France and Australia, respectively. Dr. Pal has served as General/Program/Organizing Chair of many conferences, including the International Conference on Document Analysis and Recognition (ICDAR), the International Conference on Frontiers in Handwriting Recognition (ICFHR), the International Workshop on Document Analysis Systems (DAS), the Asian Conference on Pattern Recognition (ACPR) etc. He has also served as a program committee member of more than 50 international events. He has many international research collaborations and supervises Ph.D. students at many foreign universities. He is an associate editor of ACM Transactions on Asian Language Information Processing (ACM-TALIP), Pattern Recognition Letters (PRL), Electronic Letters on Computer Vision and Image Analysis (ELCVIA) etc. He has also served as a guest editor of several special issues. He is a Fellow of the IAPR (International Association for Pattern Recognition).
1. The Matlab implementation of this article is available here: https://github.com/tanmayGIT/ICDAR-2015-DTW.