Abstract
In this paper, a method of semi-automatic training set acquisition for character classifiers used in cursive handwriting recognition is described. The training set consists of character samples extracted from a training corpus by segmentation. The method first splits the word images from the corpus into a sequence of graphemes. Then, the set of candidate segmentation variants is elicited with an evolutionary algorithm, where the segmentation variant determines subdivision of grapheme sequences of words into subsequences corresponding to consecutive letters. Segmentation variants are modeled by a chromosome population. Next, each segmentation variant from the final population is tuned in an iterative process and the best chromosome is selected. Then character samples resulting from application of the segmentation modeled by the selected chromosome are grouped into sets corresponding to letters from the alphabet. Finally, the most outstanding samples are rejected so as to maximize the accuracy of words recognition obtained with a character classifier trained with the reduced samples set.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Schomaker, L., Teulings, H.: Unsupervised learning of prototype allographs in cursive script using invariant handwritting features. In: Simon, J.C., Impedovo, S. (eds.) From Pixels to Features III, North-Holland, Amsterdam (1992)
Mestetskii, L., Reyer, I., Sederberg, T.: Continuous approach to segmentation of handwritten text. In: Eighth International Workshop on Frontiers in Handwriting Recognition, pp. 440–444 (2002)
Mackowiak, J., Schomaker, L., Vuurpijl, L.: Semi-automatic determination of allograph duration and position in on-line handwriting words based on the expected number of strokes. In: Progress in Handwriting Recognition, World Scientific, London (1997)
Yulyakov, S., Govindaraju, V.: Probabilistic model for segmentation based word recognition with lexicon. In: Proc. of the Sixth International Conference on Document Analysis and Recognition, pp. 164–167. IEEE Press, Orlando, Florida, USA (2001)
Sadri, J., Suen, C., Bui, T.D.: A genetic framework using contextual knowledge for segmentation and recognition of handwritten numeral strings. Pattern Recognition 40, 898–919 (2007)
Lamprier, S., Amghar, T., Levrat, B., Saubion, F.: Seggen: a genetic algorithm for linear text segmentation. In: Proceedings of IJCAI, pp. 1647–1652 (2007)
Connel, S., Jain, A.: Writer adaptation for online handwriting recognition. IEEE Trans. on PAMI 24, 329–346 (2002)
Arica, N., Yarman-Vural, F.: Optical character recognition for cursive handwriting. IEEE Trans. on PAMI 24, 801–813 (2002)
Knjazew, D.: OmeGA: A Competent Genetic Algorithm for Solving Permutation and Scheduling Problems. Kluwer Academic Publishers, Boston, MA (2002)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sas, J., Markowska-Kaczmar, U. (2007). Semi-automatic Training Sets Acquisition for Handwriting Recognition. In: Kropatsch, W.G., Kampel, M., Hanbury, A. (eds) Computer Analysis of Images and Patterns. CAIP 2007. Lecture Notes in Computer Science, vol 4673. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74272-2_66
Download citation
DOI: https://doi.org/10.1007/978-3-540-74272-2_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74271-5
Online ISBN: 978-3-540-74272-2
eBook Packages: Computer ScienceComputer Science (R0)