Abstract
This article presents a research project that investigated how much the recognition performance of a handwriting-recognition system improves when linguistic information is used. The purpose of the study was to design a postprocessor that would enhance an existing handwriting-recognition system by identifying and correcting words it did not recognize initially. This was done by integrating linguistic information (both lexical and syntactical) into the system. Every sentence containing one or more incorrect words is parsed, and all possible grammatical classes for each incorrect word are listed. Then, a lexical enquiry searches the lexicon for words that belong to the grammatical class of the word in question. Finally, a string-comparison algorithm selects only the words in the lexicon that are close to the incorrect word. The experimental results show that such a system is more effective at correcting words (even highly distorted ones) than conventional systems that integrate only lexical information. In conclusion, the integration of linguistic information to correct words not recognized by a handwriting-recognition system is shown to be an effective approach, and one that might be worth pursuing.
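The last two steps of the pipeline described above (lexical enquiry by grammatical class, then string comparison) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicon, the function names, and the distance threshold are all hypothetical, the parser is assumed to have already produced the list of possible grammatical classes, and a plain Levenshtein edit distance stands in for the string-comparison algorithm, which the abstract does not name.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy lexicon: grammatical class -> words of that class.
LEXICON: Dict[str, Set[str]] = {
    "noun": {"house", "horse", "mouse", "letter"},
    "verb": {"write", "read", "house"},
    "adjective": {"white", "loose"},
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def candidate_corrections(
    word: str, possible_classes: Iterable[str], max_distance: int = 2
) -> List[str]:
    """Return lexicon words of the allowed classes that are close to `word`."""
    # Lexical enquiry: keep only words whose grammatical class the parser allows.
    pool: Set[str] = set()
    for grammatical_class in possible_classes:
        pool |= LEXICON.get(grammatical_class, set())
    # String comparison: keep words within max_distance, closest first
    # (ties are broken alphabetically by the tuple sort).
    scored = sorted((edit_distance(word, entry), entry) for entry in pool)
    return [entry for distance, entry in scored if distance <= max_distance]
```

For example, if the recognizer outputs "hovse" and the parser allows only a verb in that position, `candidate_corrections("hovse", ["verb"])` searches three verbs instead of the whole lexicon. Restricting the pool by grammatical class before comparing strings is what lets the syntactic information prune candidates that a purely lexical lookup would keep.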
Cite this article
Clergeau-Tournemire, S., Plamondon, R. Integration of lexical and syntactical knowledge in a handwriting-recognition system. Machine Vis. Apps. 8, 249–259 (1995). https://doi.org/10.1007/BF01219593
DOI: https://doi.org/10.1007/BF01219593