
Orthonormal wavelet representations for recognizing complex annotations

Machine Vision and Applications

Abstract

This paper describes a novel pattern recognition method for recognizing complex annotations found in paper documents. Our investigation is motivated by the high reliability required for autonomous interpretation of maps and engineering drawings. The recognition problem is difficult in part because characters and text may appear in arbitrary fonts and orientations. Our approach, which is motivated by biological mechanisms of the human visual system, includes a novel incremental strategy based on the multiscale representation of wavelet decompositions. Choosing wavelets that are simultaneously localized in both space and frequency, and decomposing a signal into a multiscale hierarchical basis with orientation selectivity, can provide a powerful methodology for pattern analysis. We evaluated several wavelets with different spatial-frequency characteristics and measured their performance in the context of character recognition. Wavelet bases are more attractive than traditional hierarchical bases because they are orthonormal, linear, continuous, and continuously invertible. The multiscale representation of wavelet transforms provides a mathematically coherent basis for multigrid techniques. In contrast to previous ad hoc approaches, our method promises a practical solution embedded in a unified mathematical theory. A feasibility study is described in which more than 10,000 patterns were recognized with an error rate of 2.6% by a neural network trained using multiscale representations from a class of 52 distinct alphanumeric patterns and graphical symbols. We observed a 10-fold reduction in the amount of information needed to represent each character for recognition. These results suggest that high reliability is possible at a reduced cost of representation.
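To make the idea of an orthonormal, orientation-selective multiscale decomposition concrete, the following is a minimal sketch using the Haar wavelet, the simplest orthonormal wavelet. It is an illustration only: the abstract does not say which wavelets were evaluated, and the function names, the 8x8 test pattern, and the choice of two levels are all assumptions made here. The sketch splits an image into a coarse approximation plus horizontal, vertical, and diagonal detail bands at each scale, and checks that total energy is preserved, which is the property orthonormality guarantees.

```python
import math

def haar_step(values):
    """One orthonormal Haar step on an even-length list: (averages, differences)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    return [(a + b) * s for a, b in pairs], [(a - b) * s for a, b in pairs]

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def split_rows(matrix):
    """Apply haar_step along every row; return (lowpass half, highpass half)."""
    low, high = [], []
    for row in matrix:
        averages, differences = haar_step(row)
        low.append(averages)
        high.append(differences)
    return low, high

def split_columns(matrix):
    """Same split, applied along every column."""
    low, high = split_rows(transpose(matrix))
    return transpose(low), transpose(high)

def haar_level(image):
    """One 2D level: a coarse approximation plus three oriented detail bands."""
    row_low, row_high = split_rows(image)
    approx, horizontal = split_columns(row_low)    # responds to horizontal edges
    vertical, diagonal = split_columns(row_high)   # vertical and diagonal structure
    return approx, {"horizontal": horizontal, "vertical": vertical, "diagonal": diagonal}

def decompose(image, levels):
    """Repeatedly decompose the approximation to build the multiscale hierarchy."""
    approx, details = image, []
    for _ in range(levels):
        approx, bands = haar_level(approx)
        details.append(bands)
    return approx, details

def energy(matrix):
    return sum(v * v for row in matrix for v in row)

# Hypothetical 8x8 test pattern: a single vertical stroke in column 3.
image = [[1.0 if x == 3 else 0.0 for x in range(8)] for _ in range(8)]
coarse, details = decompose(image, levels=2)
total = energy(coarse) + sum(energy(band) for level in details for band in level.values())
```

For a vertical stroke, all of the detail energy lands in the `vertical` bands and none in the `horizontal` ones, which is the kind of orientation selectivity the abstract refers to. A recognizer could then take the small `coarse` array, optionally with a few detail bands, as its input features instead of the full pixel grid; how the authors actually selected coefficients for their network is not stated here.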




Cite this article

Laine, A., Schuler, S. & Girish, V. Orthonormal wavelet representations for recognizing complex annotations. Machine Vis. Apps. 6, 110–123 (1993). https://doi.org/10.1007/BF01211935


  • DOI: https://doi.org/10.1007/BF01211935
