An optical character recognition system for printed Telugu text

Lakshmi, C. Vasantha; Patvardhan, C.

doi:10.1007/s10044-004-0217-2

Theoretical Advances
Published: 01 July 2004

Volume 7, pages 190–204 (2004)

Pattern Analysis and Applications

C. Vasantha Lakshmi¹ &
C. Patvardhan²

704 Accesses
16 Citations

Abstract

Telugu is one of the oldest and most popular languages of India, spoken by more than 66 million people, especially in South India. Little work has been reported on the development of optical character recognition (OCR) systems for Telugu text, so it remains an area of active research. Some characters in Telugu are made up of more than one connected symbol. Compound characters are written by associating modifiers with consonants, which yields a huge number of possible combinations, running into hundreds of thousands. A compound character may therefore contain one or more connected symbols. For these reasons, systems developed for documents in other scripts, such as Roman, cannot be used directly for Telugu.
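
How modifiers attach to consonants can be seen at the level of Unicode text encoding, where a single compound character is stored as a base consonant followed by conjunct consonants and vowel signs. The Python sketch below illustrates this and is not part of TOSP, which works on page images and outputs phonetic English. It groups Telugu code points into simplified aksharas (orthographic syllables) under a stated assumption: combining marks (Unicode categories Mn/Mc) attach to the current unit, a letter that follows the virama (U+0C4D) joins it as a conjunct consonant, and anything else starts a new unit. The name `split_aksharas` is hypothetical, and the rule is a simplification of Unicode's own text-segmentation rules.

```python
import unicodedata

VIRAMA = "\u0c4d"  # TELUGU SIGN VIRAMA: suppresses the inherent vowel, linking consonants


def split_aksharas(text: str) -> list[str]:
    """Group Telugu code points into simplified aksharas (illustrative assumption).

    * combining marks (categories Mn/Mc: vowel signs, anusvara, virama)
      attach to the current unit;
    * a letter (category Lo) that follows a virama joins the current unit
      as a conjunct consonant;
    * any other character starts a new unit.
    """
    units: list[str] = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")
        joins_conjunct = bool(units) and units[-1].endswith(VIRAMA) and category == "Lo"
        if units and (is_mark or joins_conjunct):
            units[-1] += ch
        else:
            units.append(ch)
    return units


if __name__ == "__main__":
    # "తెలుగు" gives three units; "క్షా" (KA + virama + SSA + vowel sign AA) gives one.
    for word in ("\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41", "\u0c15\u0c4d\u0c37\u0c3e"):
        for unit in split_aksharas(word):
            print(unit, [f"U+{ord(c):04X}" for c in unit])
```

Because each consonant can take several conjunct consonants and any of the vowel signs, the number of distinct units grows multiplicatively. That is why the paper recognizes smaller basic symbols instead of whole compound characters.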

The individual connected portions of a character or a compound character are defined in this paper as basic symbols and are treated as the unit of recognition. The algorithms are designed to exploit special characteristics of the Telugu script so that document images can be processed efficiently. They have been implemented to create a Telugu OCR system for printed text (TOSP). The output of TOSP is in phonetic English, which can be transliterated to generate editable Telugu text. A special feature of TOSP is that it is designed to handle a wide range of character sizes and multiple fonts while still providing a raw OCR accuracy of nearly 98%. The phonetic English representation can also be used to develop a Telugu text-to-speech system; work is in progress in this regard.
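
Connected-component labelling of a binarized image illustrates the idea of a basic symbol as a connected portion of a character. The sketch below is a generic 8-connected flood fill in Python and is not the segmentation algorithm of TOSP. It assumes a binary image given as a list of rows, with 1 for ink and 0 for background. The function name `basic_symbols` and the left-to-right ordering by leftmost column are illustrative assumptions.

```python
from collections import deque

Pixel = tuple[int, int]


def basic_symbols(image: list[list[int]]) -> list[list[Pixel]]:
    """Return 8-connected components of ink pixels (value 1), ordered left to right.

    Each component is a list of (row, col) pixels and plays the role of one
    basic symbol: a connected portion of a character or compound character.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components: list[list[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels: list[Pixel] = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    # Approximate reading order on a single text line by each symbol's leftmost column.
    components.sort(key=lambda comp: min(x for _, x in comp))
    return components


if __name__ == "__main__":
    img = [
        [1, 1, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ]
    print(len(basic_symbols(img)))  # 2: the diagonal pixel (2, 3) joins the right-hand stroke
```

A modifier that does not touch its consonant comes out as a separate component here. This is how one compound character can map to more than one basic symbol.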



Author information

Authors and Affiliations

Department of Physics and Computer Science, Dayalbagh Educational Institute, 282005, Agra, India
C. Vasantha Lakshmi
Department of Electrical Engineering, Dayalbagh Educational Institute, 282005, Agra, India
C. Patvardhan


Corresponding author

Correspondence to C. Vasantha Lakshmi.

Appendix

Table A Confusion table
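
The abstract reports a raw OCR accuracy of nearly 98%, and the appendix gives a confusion table. The Python sketch below shows how such a figure can be read off a confusion table. It assumes that raw accuracy means the share of test symbols whose recognized label equals the true label. The contents of Table A are not reproduced here, so the counts in the demo are invented placeholders for three hypothetical classes `s1` to `s3` and are not data from the paper. The names `raw_accuracy` and `most_confused` are hypothetical.

```python
Confusion = dict[str, dict[str, int]]


def raw_accuracy(confusion: Confusion) -> float:
    """Share of test symbols whose recognized label equals the true label.

    confusion[true][predicted] counts test symbols of class `true`
    that the recognizer labelled as `predicted`.
    """
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(row.get(label, 0) for label, row in confusion.items())
    return correct / total if total else 0.0


def most_confused(confusion: Confusion, top: int = 3) -> list[tuple[int, str, str]]:
    """Largest off-diagonal cells as (count, true_label, predicted_label)."""
    pairs = [
        (count, true, pred)
        for true, row in confusion.items()
        for pred, count in row.items()
        if pred != true and count > 0
    ]
    pairs.sort(reverse=True)
    return pairs[:top]


if __name__ == "__main__":
    # Hypothetical toy counts for placeholder classes; NOT data from Table A.
    toy: Confusion = {
        "s1": {"s1": 45, "s2": 5},
        "s2": {"s2": 47, "s1": 3},
        "s3": {"s3": 50},
    }
    print(f"raw accuracy: {raw_accuracy(toy):.2%}")  # 142 of 150 symbols correct
    print(most_confused(toy))
```

Listing the largest off-diagonal cells points to the pairs of basic symbols that a recognizer most often mixes up. These pairs are the natural targets for additional features or post-processing.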


About this article

Cite this article

Lakshmi, C.V., Patvardhan, C. An optical character recognition system for printed Telugu text. Pattern Anal Applic 7, 190–204 (2004). https://doi.org/10.1007/s10044-004-0217-2

Received: 01 June 2003
Accepted: 13 May 2004
Published: 01 July 2004
Issue Date: July 2004
DOI: https://doi.org/10.1007/s10044-004-0217-2
