Computer recognition of printed Tamil characters

doi:10.1016/0031-3203(78)90032-8

Pattern Recognition

Volume 10, Issue 4, 1978, Pages 243-247

https://doi.org/10.1016/0031-3203(78)90032-8

Abstract

Computer recognition of machine-printed letters of the Tamil alphabet is described. Each character is represented as a binary matrix and encoded into a string using two different methods. The encoded strings form a dictionary. A given text is presented symbol by symbol and information from each symbol is extracted in the form of a string and compared with the strings in the dictionary. When there is agreement the letters are recognized and printed out in Roman letters following a special method of transliteration. The lengthening of vowels and hardening of consonants are indicated by numerals printed above each letter.
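
The pipeline in the abstract (binary matrix, string encoding, dictionary lookup) can be illustrated with a minimal sketch. The abstract does not say what the two encoding methods are, so the row-wise and column-wise run counts used here are stand-in assumptions, not the authors' scheme. The function names, the 3x3 toy glyphs, and the labels "bar" and "ring" are likewise hypothetical and do not represent Tamil letters or the paper's transliteration.

```python
from typing import Dict, List, Optional

Matrix = List[List[int]]  # binary matrix: 1 = ink, 0 = background


def encode_rows(matrix: Matrix) -> str:
    """Assumed encoding 1: number of runs of 1s in each row, top to bottom."""
    codes = []
    for row in matrix:
        runs = 0
        previous = 0
        for pixel in row:
            # A run starts wherever a 1 follows a 0 (or the row start).
            if pixel == 1 and previous == 0:
                runs += 1
            previous = pixel
        codes.append(str(runs))
    return ",".join(codes)


def encode_columns(matrix: Matrix) -> str:
    """Assumed encoding 2: the same run count applied to each column."""
    transposed = [list(column) for column in zip(*matrix)]
    return encode_rows(transposed)


def encode(matrix: Matrix) -> str:
    """Combine both assumed encodings into one dictionary key."""
    return encode_rows(matrix) + "|" + encode_columns(matrix)


def build_dictionary(samples: Dict[str, Matrix]) -> Dict[str, str]:
    """Map each encoded string to the label of its reference glyph."""
    return {encode(matrix): label for label, matrix in samples.items()}


def recognize(symbol: Matrix, dictionary: Dict[str, str]) -> Optional[str]:
    """Return the label whose string agrees exactly, or None if none does."""
    return dictionary.get(encode(symbol))


# Hypothetical toy glyphs, for illustration only.
BAR = [[0, 1, 0],
       [0, 1, 0],
       [0, 1, 0]]
RING = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
BLANK = [[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]

dictionary = build_dictionary({"bar": BAR, "ring": RING})

# A text is processed symbol by symbol; unmatched symbols yield None.
text = [RING, BAR, BLANK]
labels = [recognize(symbol, dictionary) for symbol in text]
```

Exact string agreement, as in this sketch, only works for clean, consistent input such as machine print; any noise in the matrix changes the key and the lookup fails.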
