Abstract
We present a benchmark image database of isolated handwritten Bangla compound characters, as used in standard Bangla literature. A thorough survey of more than 2 million Bangla words revealed that there are around 334 compound characters in Bangla script. Of these, only around 171 character classes have unique pattern shapes, and some of these classes are often written in multiple styles. Altogether, 55,278 isolated character images, belonging to 199 different pattern shapes, were collected using three different data collection modalities. The database is divided into training and test sets in a 4:1 ratio for each pattern class, with a balanced distribution of shapes from the different modalities. A convex hull and quadtree-based feature set has been designed, and the recognition performance on the test set is reported with a support vector machine classifier. We achieved a recognition accuracy of 79.35% on the test database consisting of 171 character classes. The complete compound character image database is freely available as CMATERdb 3.1.3.3 from the website http://code.google.com/p/cmaterdb/, which may facilitate research on handwritten character recognition, especially research related to Bangla form document processing systems.
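The 4:1 split described above can be illustrated with a minimal sketch. It assumes each image is described by a tuple of sample identifier, class label, and collection modality; the function name `stratified_split`, the tuple layout, and the modality names are hypothetical and are not taken from the database distribution. The sketch only shows one way to hold out one fifth of every class and modality group, and is not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split samples into train/test lists, stratified by (class, modality).

    samples: list of (sample_id, class_label, modality) tuples.
    This tuple layout is an assumption made for illustration only.
    """
    rng = random.Random(seed)

    # Group samples so each class keeps a balanced share of every modality.
    groups = defaultdict(list)
    for sample in samples:
        _, label, modality = sample
        groups[(label, modality)].append(sample)

    train, test = [], []
    for key in sorted(groups):
        members = list(groups[key])
        rng.shuffle(members)
        # With test_fraction=0.2, one fifth of each group goes to the test set.
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


if __name__ == "__main__":
    # Synthetic example: 3 classes x 3 hypothetical modalities x 10 images each.
    demo = [
        (f"{c}_{m}_{i}", c, m)
        for c in ("class_1", "class_2", "class_3")
        for m in ("modality_a", "modality_b", "modality_c")
        for i in range(10)
    ]
    train_set, test_set = stratified_split(demo)
    print(len(train_set), len(test_set))
```

Grouping by the pair of class and modality, rather than by class alone, is what keeps the modality mix of the test set close to that of the training set.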
Acknowledgments
The authors are thankful to the “Center for Microprocessor Application for Training Education and Research” and the “Project on Storage Retrieval and Understanding of Video for Multimedia” of the Computer Science & Engineering Department, Jadavpur University, for providing infrastructure facilities during the progress of the work. The work reported here has been partially funded by the DST, Govt. of India, PURSE (Promotion of University Research and Scientific Excellence) Program.
Cite this article
Das, N., Acharya, K., Sarkar, R. et al. A benchmark image database of isolated Bangla handwritten compound characters. IJDAR 17, 413–431 (2014). https://doi.org/10.1007/s10032-014-0222-y
DOI: https://doi.org/10.1007/s10032-014-0222-y