A benchmark image database of isolated Bangla handwritten compound characters

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

In this work, we present a benchmark image database of isolated handwritten Bangla compound characters used in standard Bangla literature. A survey of more than 2 million Bangla words revealed that around 334 compound characters exist in Bangla script. Of these, only around 171 character classes form unique pattern shapes, and some of these classes are often written in multiple styles. Altogether, 55,278 isolated character images, belonging to 199 different pattern shapes, were collected using three different data collection modalities. The database is divided into training and test sets in a 4:1 ratio for each pattern class, keeping the distribution of shapes from the different modalities balanced. A convex hull and quadtree-based feature set has been designed, and the test set recognition performance is reported with a support vector machine classifier. We achieved a recognition accuracy of 79.35% on the test database consisting of 171 character classes. The complete compound character image database is freely available as CMATERdb 3.1.3.3 from the website http://code.google.com/p/cmaterdb/, which may facilitate research on handwritten character recognition, especially research related to Bangla form document processing systems.
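The 4:1 per-class split with a balanced mix of modalities can be illustrated with a minimal sketch. This is not the authors' procedure: the record layout `(sample_id, class_label, modality)`, the function name `split_4_to_1`, the random seed, and the toy data are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def split_4_to_1(samples, seed=0):
    """Split (sample_id, class_label, modality) records into train/test at about 4:1.

    The split is made separately inside each (class, modality) group, so every
    class keeps a similar mix of modalities in both sets. Hypothetical helper,
    not taken from the paper.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in samples:
        _, label, modality = record
        groups[(label, modality)].append(record)

    train, test = [], []
    for key in sorted(groups):
        members = groups[key]
        rng.shuffle(members)
        n_test = len(members) // 5  # one fifth of each group goes to the test set
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy data: 2 classes x 3 modalities x 10 samples each (all names hypothetical).
samples = [
    (f"img_{c}_{m}_{i}", c, m)
    for c in ("class_a", "class_b")
    for m in ("modality_1", "modality_2", "modality_3")
    for i in range(10)
]
train, test = split_4_to_1(samples)
print(len(train), len(test))  # 48 12
```

Splitting inside each (class, modality) group rather than over the whole pool is what keeps small classes and under-represented modalities from vanishing from the test set; groups with fewer than five samples would contribute nothing to the test set in this sketch and would need separate handling.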

Acknowledgments

The authors are thankful to the “Center for Microprocessor Application for Training Education and Research” and the “Project on Storage Retrieval and Understanding of Video for Multimedia” of the Computer Science & Engineering Department, Jadavpur University, for providing infrastructure facilities during the course of this work. The work reported here was partially funded by the DST, Govt. of India, PURSE (Promotion of University Research and Scientific Excellence) Program.

Author information

Corresponding author

Correspondence to Mita Nasipuri.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4340 KB)

About this article

Cite this article

Das, N., Acharya, K., Sarkar, R. et al. A benchmark image database of isolated Bangla handwritten compound characters. IJDAR 17, 413–431 (2014). https://doi.org/10.1007/s10032-014-0222-y

  • DOI: https://doi.org/10.1007/s10032-014-0222-y
