Abstract
In this article, we aim to reduce the error rate of an online Tamil symbol recognition system by employing multiple experts to reevaluate certain decisions of the primary support vector machine (SVM) classifier. First, motivated by the relatively high frequency of base consonants in the script, a reevaluation technique is proposed to resolve ambiguities arising in the base consonants. Second, a dynamic time-warping method is proposed to automatically extract the discriminative regions for each set of confused characters; class-specific features derived from these regions help reduce the degree of confusion. Third, statistics of specific features are proposed for resolving confusions in vowel modifiers. The reevaluation approaches are tested on two databases: (a) the isolated Tamil symbols in the IWFHR test set, and (b) the symbols segmented from a set of 10,000 Tamil words. The recognition rate on the isolated test symbols of the IWFHR database improves by 1.9 %. For the word database, incorporating the reevaluation step improves the symbol recognition rate by 3.5 % (from 88.4 to 91.9 %). This, in turn, boosts the word recognition rate by 11.9 % (from 65.0 to 76.9 %). The reduction in the word error rate is achieved using a generic approach, without incorporating language models.
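The improvements quoted above are absolute differences in recognition rate (percentage points). As a small worked example, the sketch below converts the reported rates into relative error reductions. The function name `error_reduction` is an illustrative assumption and is not taken from the article.

```python
def error_reduction(rate_before, rate_after):
    """Return (absolute gain, relative error reduction), both in percent.

    rate_before and rate_after are recognition rates in percent.
    Hypothetical helper, written only to illustrate the arithmetic.
    """
    error_before = 100.0 - rate_before
    error_after = 100.0 - rate_after
    absolute_gain = rate_after - rate_before
    relative_reduction = 100.0 * (error_before - error_after) / error_before
    return absolute_gain, relative_reduction


# Rates reported in the abstract for the word database.
symbol_gain, symbol_rel = error_reduction(88.4, 91.9)
word_gain, word_rel = error_reduction(65.0, 76.9)

print(f"symbols: +{symbol_gain:.1f} points, {symbol_rel:.1f} % fewer errors")
print(f"words:   +{word_gain:.1f} points, {word_rel:.1f} % fewer errors")
```

Under this reading, the symbol error rate falls from 11.6 to 8.1 %, a relative reduction of roughly 30 %, and the word error rate falls from 35.0 to 23.1 %, a relative reduction of about 34 %.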
Acknowledgements
The authors thank Technology Development for Indian Languages (TDIL), Department of Information Technology, Govt of India for funding this work. The help rendered by the staff of Medical Intelligence and Language Engineering (MILE) Laboratory in data collection and truthing is acknowledged.
Additional information
Originality and contributions
1. In the literature on online Indic handwriting, there is hardly any comprehensive work that addresses the problem of disambiguating confused characters. To the knowledge of the authors, this may be the first attempt at reducing the error rate of online handwritten Tamil symbol recognition with reevaluation strategies.
2. A dynamic time-warping approach has been proposed to capture the regions of the trace that discriminate confused Tamil symbols. Novel class-specific discriminative features are then derived from the extracted regions to disambiguate these symbols.
3. An SVM classifier (referred to as an expert) dedicated to each confusion set (derived from the confusion matrix) has been proposed. The expert classifier operates on the novel discriminative features.
4. A set of novel features has been proposed to reduce the confusions of vowel modifiers in CV combinations.
5. A systematic study of the occurrence frequency of linguistically similar Tamil symbols has been performed on a text corpus.
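The dynamic time-warping idea in contribution 2 can be illustrated with a minimal sketch. It assumes each symbol is a resampled pen trace given as a list of (x, y) points, aligns two traces with textbook DTW, and marks the points whose aligned local distance is large. The function names, the `ratio` threshold, and the use of one trace per class are assumptions made for illustration; the article's actual procedure and features are not reproduced here.

```python
import math


def dtw_path(a, b):
    """Align point sequences a and b; return (total cost, warping path)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def discriminative_region(a, b, ratio=0.5):
    """Indices of a whose aligned distance to b is at least ratio * peak.

    ratio is a hypothetical tuning parameter, not a value from the article.
    """
    _, path = dtw_path(a, b)
    local = [(i, math.dist(a[i], b[j])) for i, j in path]
    peak = max(d for _, d in local)
    if peak == 0.0:
        return []  # identical traces: nothing discriminates them
    return sorted({i for i, d in local if d >= ratio * peak})


# Two toy traces that differ only in their final point.
trace_a = [(0, 0), (1, 0), (2, 0), (3, 0)]
trace_b = [(0, 0), (1, 0), (2, 0), (3, 2)]
print(discriminative_region(trace_a, trace_b))
```

In a setup like the one described in contribution 3, features computed only over the returned indices could then be passed to the expert classifier for that confusion set; how the regions are aggregated over many training samples is left open in this sketch.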
Appendix: The complete list of Tamil characters
About this article
Cite this article
Sundaram, S., Ramakrishnan, A.G. Performance enhancement of online handwritten Tamil symbol recognition with reevaluation techniques. Pattern Anal Applic 17, 587–609 (2014). https://doi.org/10.1007/s10044-013-0353-7
DOI: https://doi.org/10.1007/s10044-013-0353-7