Abstract
The present work deals with the recognition of handwritten characters of Bangla, a major script of the Indian subcontinent. The main contributions presented here are (a) generation of a database of handwritten basic characters of Bangla and (b) development of a handwritten character recognition scheme, suitable for scripts such as Bangla that contain many similarly shaped characters, to provide benchmark results. The present database is a pioneering development in the context of off-line recognition of handwritten characters of this script. It has 37,858 handwritten samples and covers a wide spectrum of handwriting styles of the Bangla-speaking population. This database will be made available (http://www.isical.ac.in/~ujjwal/download/Banglabasiccharacter.html) free of cost to researchers for further studies. We also identified two major factors that hinder high recognition accuracy on the present character samples, namely, (a) the erratic presence of the headline (the shape of a Bangla character usually contains a horizontal line in its upper part) and (b) the existence of several pairs of similarly shaped characters. The proposed recognition approach addresses both factors. It identifies any confusion in the first-stage classification between a pair of similarly shaped character classes and resolves it in the second-stage classification by extracting a feature vector based on a non-uniform grid.
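The two-stage idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the non-uniform grid is built or how confusion is detected, so the sketch assumes that grid boundaries are placed at equal-mass points of the row and column projection profiles, and that confusion means the two top-ranked first-stage classes form a known similar pair. All names (`quantile_cuts`, `grid_features`, `two_stage_classify`, `confusing_pairs`, `pair_classifiers`) are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def quantile_cuts(profile, n_cells):
    """Return n_cells + 1 boundaries that split `profile` into bands of
    roughly equal foreground mass (assumed definition of a non-uniform grid)."""
    total = sum(profile)
    n = len(profile)
    if total == 0:
        # Blank image: fall back to a uniform split.
        return [round(i * n / n_cells) for i in range(n_cells + 1)]
    cum = list(accumulate(profile))
    cuts = [0]
    for k in range(1, n_cells):
        target = k * total / n_cells
        # First index whose cumulative mass reaches the target, made exclusive.
        idx = bisect_left(cum, target) + 1
        cuts.append(max(idx, cuts[-1]))
    cuts.append(n)
    return cuts


def grid_features(image, rows=2, cols=2):
    """Fraction of foreground pixels falling in each non-uniform grid cell.
    `image` is a list of rows of 0/1 values (1 = ink)."""
    row_profile = [sum(r) for r in image]
    col_profile = [sum(c) for c in zip(*image)]
    r_cuts = quantile_cuts(row_profile, rows)
    c_cuts = quantile_cuts(col_profile, cols)
    total = sum(row_profile) or 1  # avoid division by zero on blank images
    feats = []
    for r0, r1 in zip(r_cuts, r_cuts[1:]):
        for c0, c1 in zip(c_cuts, c_cuts[1:]):
            mass = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            feats.append(mass / total)
    return feats


def two_stage_classify(scores, image, confusing_pairs, pair_classifiers):
    """`scores` maps class label -> first-stage score. If the two best classes
    form a known confusing pair, defer to that pair's second-stage classifier."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    pair = frozenset(ranked[:2])
    if pair in confusing_pairs:
        return pair_classifiers[pair](grid_features(image))
    return ranked[0]
```

In this sketch the second stage is invoked only for inputs whose top two candidates are known to be confusable, so the pairwise classifiers can be small and specialized. How the pairs are selected and which classifier is used at each stage are left open here.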
Acknowledgments
The authors would like to acknowledge the support of Bikash Shaw, Suman K. Ghosh and Saikat Das of the Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, towards the development of the database described in the present article.
Cite this article
Bhattacharya, U., Shridhar, M., Parui, S.K. et al. Offline recognition of handwritten Bangla characters: an efficient two-stage approach. Pattern Anal Applic 15, 445–458 (2012). https://doi.org/10.1007/s10044-012-0278-6
DOI: https://doi.org/10.1007/s10044-012-0278-6