Abstract
Optical Character Recognition (OCR) systems perform poorly on documents such as old books and newspapers, photocopied material, and faxed pages. Such documents are regarded as degraded documents. An important cause of the poor recognition rate on degraded documents is the presence of touching (connected) characters, which makes it difficult to design an effective character segmentation procedure. In this paper, a new technique is proposed for segmenting touching characters. The technique is based on fuzzy multifactorial analysis. A predictive algorithm is developed for selecting the cut-points at which touching characters are segmented. The proposed method has initially been applied to touching characters in Devnagari (Hindi) and Bangla, two major scripts of the Indian subcontinent. Results obtained on a test set of considerable size show that a high recognition rate can be achieved with a reasonable amount of computation.
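The abstract does not give the factors or the aggregation rule used for cut-point selection, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes two hypothetical factors (how thin the stroke is in a column, and how close the column is to the middle of the component), hypothetical weights, and a plain weighted sum as the way of combining the fuzzy membership degrees. All function names are illustrative.

```python
# Illustrative sketch only: the factors, weights, and membership functions
# below are assumptions, not taken from the paper.

def column_thickness(image):
    """Count foreground (1) pixels in each column of a binary image."""
    height, width = len(image), len(image[0])
    return [sum(image[r][c] for r in range(height)) for c in range(width)]

def thinness_membership(thickness, height):
    """Fuzzy degree in [0, 1]: thinner columns are better cut candidates."""
    return 1.0 - thickness / height

def centrality_membership(col, width):
    """Fuzzy degree in [0, 1]: columns near the middle score higher."""
    if width == 1:
        return 1.0
    centre = (width - 1) / 2.0
    return 1.0 - abs(col - centre) / centre

def best_cut_column(image, weights=(0.7, 0.3)):
    """Score every column by a weighted sum of the two factor memberships
    and return (index of the best-scoring column, list of all scores)."""
    height, width = len(image), len(image[0])
    thick = column_thickness(image)
    w_thin, w_centre = weights
    scores = [
        w_thin * thinness_membership(thick[c], height)
        + w_centre * centrality_membership(c, width)
        for c in range(width)
    ]
    best = max(range(width), key=lambda c: scores[c])
    return best, scores

# Two blobs joined by a one-pixel bridge in column 3.
image = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
cut, scores = best_cut_column(image)
print(cut)
```

In this toy image the bridge column is both the thinnest and the most central, so it receives the highest combined score. A real system would need script-specific factors (for Devnagari and Bangla, for instance, the headline joins characters by design), which this sketch does not attempt to model.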
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Garain, U., Chaudhuri, B.B. (2002). On OCR of Degraded Documents Using Fuzzy Multifactorial Analysis. In: Pal, N.R., Sugeno, M. (eds) Advances in Soft Computing — AFSS 2002. AFSS 2002. Lecture Notes in Computer Science, vol 2275. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45631-7_52
DOI: https://doi.org/10.1007/3-540-45631-7_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43150-3
Online ISBN: 978-3-540-45631-5
eBook Packages: Springer Book Archive