Lexicon reduction using key characters in cursive handwritten words

https://doi.org/10.1016/S0167-8655(99)00099-9

Abstract

The concept of key characters in a cursively handwritten word image is introduced and a method for extracting the key characters is presented. Key characters capture the unambiguous parts of the cursive words that can be reliably segmented and recognized. We propose a method for lexicon reduction using key characters in conjunction with a word-length estimation.

Introduction

Handwritten word recognition is a challenging problem encountered in many real-world applications, such as postal mail sorting, bank check recognition, and automatic data entry from business forms. A prevalent technique for off-line cursive word recognition is based on over-segmentation followed by dynamic programming (Bozinovic and Srihari, 1989, Gader et al., 1997, Mao et al., 1998). It seems to outperform segmentation-free hidden Markov models (HMMs) using a sliding window (Mohamed and Gader, 1996).

In the over-segmentation followed by dynamic programming approach, a set of split points on word strokes is chosen based on heuristics to divide the word into a sequence of graphemes (primitive structures of characters, see Fig. 1(b)). A character may consist of one or more graphemes. The word recognition problem is then posed as finding the best path in a graph called the segmentation graph (see Fig. 1(b)). Since our over-segmentor rarely produces more than three graphemes for a character, we remove all edges that cover more than three graphemes. A character classifier is usually used to assign a cost to each edge in the segmentation graph. Dynamic programming is then used to find the best (hopefully the desired) path from the leftmost node to the rightmost node, and a sequence of characters is obtained from the sequence of segments on that path (see Fig. 1(c)).

Note that this sequence of characters may not form a valid word (or string) in a dictionary. Therefore, in situations where a lexicon of limited size can be derived (e.g., in postal address recognition, a list of city–state names can be retrieved from a database once the zip candidates are known), lexicon-driven matching is more desirable. For each entry in the lexicon, dynamic programming is used to choose the path in the segmentation graph that best matches the entry, and a matching score is assigned to the entry. The entry with the highest matching score is chosen as the recognition hypothesis.

Given a sequence of N graphemes and a string (lexicon entry) of length W, the dynamic programming technique can be used to obtain the best grouping of the N graphemes into W segments (Gader et al., 1997). A dynamic programming table of size (N×W) must be constructed to obtain the best path. Given a lexicon of L entries, the complexity of lexicon-driven matching is O(L×N×W), so the running time of a lexicon-driven system grows linearly with the lexicon size. Recognition accuracy also decreases as the lexicon grows. Lexicon reduction is therefore very important in a lexicon-driven recognition system.
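The grouping step above can be sketched as a standard dynamic program. The following is a minimal illustration, not the paper's implementation: `char_cost` is a hypothetical stand-in for a real character classifier (which would score image segments, not strings), and each character is allowed to consume at most three graphemes, as described above.

```python
def char_cost(graphemes, ch):
    # Hypothetical classifier: zero cost when the joined grapheme labels
    # equal the character. A real system scores image segments instead.
    return 0.0 if "".join(graphemes) == ch else 1.0

def match(graphemes, word, max_group=3):
    """Minimum total cost of grouping N graphemes into len(word) segments."""
    n, w = len(graphemes), len(word)
    INF = float("inf")
    # dp[i][j]: best cost of matching the first i characters of `word`
    # to the first j graphemes.
    dp = [[INF] * (n + 1) for _ in range(w + 1)]
    dp[0][0] = 0.0
    for i in range(1, w + 1):
        for j in range(1, n + 1):
            # A character may cover 1..max_group consecutive graphemes.
            for k in range(1, min(max_group, j) + 1):
                cost = dp[i - 1][j - k] + char_cost(graphemes[j - k:j], word[i - 1])
                dp[i][j] = min(dp[i][j], cost)
    return dp[w][n]
```

The table has (W+1)×(N+1) cells and each cell examines at most three predecessors, giving the O(N×W) cost per entry quoted above.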

There are several techniques proposed in the literature for lexicon reduction (Madvanath and Govindaraju, 1993, Madvanath and Srihari, 1996). Lexicon reduction based on holistic word features (word length, presence of ascenders, descenders, t-crossings and i-dots) (Srihari, 1993, Madvanath and Srihari, 1996) is commonly used. In this approach, holistic word features of the input image are matched against the holistic features of every exemplar for each lexicon entry, and entries that do not match well are discarded. Typically, more than one exemplar must be stored or synthesized for each lexicon entry because of the variety of writing styles. The efficiency of this approach is limited by the computational overhead of extracting holistic word features and matching them against multiple exemplars per lexicon entry.

Kimura et al. (1993) proposed a lexicon reduction method in which the input image is first segmented based on a set of heuristics and an ASCII string is created from the recognition results on these segments. Dynamic programming is then used to match this ASCII string with each lexicon entry, and entries with high matching costs are eliminated from further consideration. The performance of this approach relies heavily on the initial segmentation of the word into “characters”, which is problematic for cursive words.
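As an illustration of this style of pruning (a simplified sketch, not Kimura et al.'s exact cost function), entries can be discarded when their edit distance to the recognized ASCII string exceeds a threshold; `max_dist` is an assumed tuning parameter.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (single rolling row)."""
    n = len(b)
    dp = list(range(n + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,      # deletion
                                     dp[j - 1] + 1,  # insertion
                                     prev + (a[i - 1] != b[j - 1]))  # substitution
    return dp[n]

def prune(recognized, lexicon, max_dist=2):
    """Keep only entries within `max_dist` edits of the recognized string."""
    return [w for w in lexicon if edit_distance(recognized, w) <= max_dist]
```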

In this paper, we define the concept of key characters in a cursive word image and present a method for extracting them. We also propose a method for lexicon reduction using key characters in conjunction with a word-length estimation. One advantage of this method is that it fits well into any cursive handwriting recognition system based on over-segmentation followed by dynamic programming: the overhead of extracting key characters and estimating the length is very small. Unlike the method of Kimura et al. (1993), which requires segmenting the word a second time, the proposed method reuses the graphemes already available.

This paper is organized as follows. In Section 2 the concept and the extraction of key characters are described. Section 3 explains our approach to estimate the length (i.e., the number of characters) of a given image and Section 4 explains the lexicon reduction using both key characters and length estimation. Experiments and results are reported in Section 5 and conclusions are made in Section 6.


Key characters

Key characters identify unambiguous characters of cursive words which can be segmented and recognized reliably without performing word recognition or contextual analysis. The extraction and recognition of key characters works directly on the sequence of graphemes, which can be obtained by any over-segmentation method (see Fig. 2 for the graphemes obtained by the system described by Mao et al. (1998)).

Generally it is not reliable to identify characters in cursive words solely based on recognition

Length estimation

The goal of length estimation is to provide an estimate of how many characters are present in a given image without performing expensive recognition. Given an image of a text line, five features are first extracted from the line. A neural network is used to estimate the line length (in terms of the number of characters). Since such an estimate cannot be exact without recognition, the network also provides a range for the length.

Since in our system the image is already segmented into

Two-stage lexicon reduction

We implement a lexicon reduction method consisting of two stages: (i) lexicon reduction using length estimation and (ii) lexicon reduction using key character strings. The order of these stages was chosen because examining a lexicon entry in the first stage is faster than in the second.
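A minimal sketch of the two-stage pipeline follows. Stage 1 is the cheap length filter described above; for stage 2, a simplified stand-in rule is used here (the extracted key characters must occur, in order, within the entry), since the paper's actual key-character matching also uses positional information.

```python
def is_subsequence(keys, word):
    # True if the characters of `keys` appear in `word` in the same order.
    it = iter(word)
    return all(ch in it for ch in keys)

def reduce_lexicon(lexicon, min_len, max_len, key_chars):
    # Stage 1: discard entries outside the estimated [min_len, max_len] range.
    survivors = [w for w in lexicon if min_len <= len(w) <= max_len]
    # Stage 2: keep entries consistent with the key-character string.
    return [w for w in survivors if is_subsequence(key_chars, w)]
```

Because stage 1 is a constant-time length comparison per entry, running it first shrinks the lexicon before the more expensive key-character test is applied.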

Once the estimation for a minimum and a maximum number of characters is available the two numbers can be used to find lexicon entries which do not fit into the provided

Experiments and results

This section presents the results for both the lexicon reduction and the overall system performance of the recognition system for handwritten addresses.

All the experiments were done using images provided by the United States Postal Service. To generate the character statistics for the geometric confidences and to extract the length estimation features for training the neural network, truthed data of about 800 images were used. For the testing of the lexicon reduction 811 addresses

Conclusions

We introduced the concept of key characters, which capture the unambiguous parts of a cursive word image. Key characters can be reliably extracted and recognized without performing word recognition or contextual analysis.

The proposed approach of lexicon reduction can be used for speeding up any handwriting recognition system which is based on over-segmentation followed by a lexicon matching step. Only minor changes of an existing system are necessary to achieve a significant speed up without a
