
A multifaceted evaluation of representation of graphemes for practically effective Bangla OCR

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Bangla Optical Character Recognition (OCR) poses a unique challenge due to the presence of hundreds of diverse conjunct characters formed by combining two or more letters. In this paper, we propose two novel grapheme representation methods that improve the recognition of these conjunct characters and the overall performance of Bangla OCR. We utilize the popular Convolutional Recurrent Neural Network (CRNN) architecture and implement our grapheme representation strategies to design the final labels of the model. In the absence of a large-scale word-level printed Bangla dataset, we created a synthetically generated Bangla corpus of 2 million samples that is representative and sufficiently varied in fonts, domain, and vocabulary size to train our Bangla OCR model. To test the various aspects of our model, we also created six test protocols. Finally, to establish the generalizability of our grapheme representation methods, we performed training and testing on external handwriting datasets. Experimental results demonstrate the effectiveness of our approach. Furthermore, our synthetically generated training dataset and the test protocols are made available to serve as benchmarks for future Bangla OCR research.


Notes

  1. https://doi.org/10.6084/m9.figshare.20186825.

  2. https://github.com/apurba-nsu-rnd-lab/bangla-ocr-crnn.

  3. https://www.unicode.org/charts/PDF/U0980.pdf.

  4. https://github.com/Belval/TextRecognitionDataGenerator.


Acknowledgements

This research was supported in part by the Enhancement of Bangla Language in ICT through Research & Development (EBLICT) Project, under the Ministry of ICT, the Government of Bangladesh.

Author information

Corresponding author

Correspondence to Fuad Rahman.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Results on an external testing partition

To evaluate how well the models generalize beyond their training data, in this section we conducted a testing experiment on extrinsic data. These data were primarily collected for an OCR project of the Government of Bangladesh; due to the confidentiality of that assignment, we cannot publish these external testing sets. However, we report the results in Table 13 to highlight the performance of models trained on completely synthetic data. The test data were entirely unseen, appearing in neither training nor validation, and were collected from domains unknown to the model.

Table 13 Performance of the CRNN-VDS model when tested on external testing sets

The word-level data used for this testing comprise three types of documents. First, the Computer Composed test set consists of documents produced with computers, such as government notices and letters. Next, the Letterpress test set contains data from press-based printing, common examples being books and posters. The third set, the Typewriter test set, consists of documents composed on typewriters, which are still widely used for legal documents in Bangladesh.

Figure 14 shows examples of the data used for testing. The samples contain natural noise, which makes correct prediction more challenging, mainly because natural noise is difficult to recreate synthetically. Table 13 reports the results of the evaluation. For this testing, we pick our CRNN-VDS model with the VDS character representation, mainly because of its competitive performance on the other reported test sets with real data (Protocols II–V).

Fig. 14: Samples of words from Computer Composed, Letterpress, and Typewriter data used for extrinsic testing

From Table 13, we observe that the CRNN-VDS model achieves a WRR of 79.03% on the Computer Composed set of around five hundred thousand samples, which we consider a significant accomplishment given that the model was trained on completely synthetic data and had never seen data from this domain. This success stems from our synthetic data generation process: we carefully selected almost all the popular open-source Bangla fonts and generated images of different lengths. During training, we also added noise that reflects real-world degradation, such as blur and salt-and-pepper noise. Figure 13 shows some of the noisy input images used during training. These images capture the visual diversity of real-world computer-composed documents and thus train the model well enough to reach high accuracy on unseen real-world test sets.
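As an illustration of this kind of pipeline, the sketch below renders a word image with a chosen Bangla font and injects blur and salt-and-pepper noise. It is a minimal approximation of the generation process described above, not the authors' exact tooling: the font path, canvas sizing, and noise levels are illustrative assumptions, and correct shaping of Bangla conjuncts requires a Pillow build with libraqm.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFilter, ImageFont

def render_word(text: str, font_path: str, height: int = 32) -> Image.Image:
    """Render a word in black on a white grayscale canvas with the given font."""
    font = ImageFont.truetype(font_path, size=height - 8)
    # Rough width estimate; a real generator measures the rendered text.
    img = Image.new("L", (height * max(len(text), 1), height), color=255)
    ImageDraw.Draw(img).text((4, 4), text, font=font, fill=0)
    return img

def add_noise(img: Image.Image, blur_radius: float = 1.0,
              sp_fraction: float = 0.02) -> Image.Image:
    """Apply Gaussian blur, then flip a random fraction of pixels to
    salt (white) or pepper (black)."""
    img = img.filter(ImageFilter.GaussianBlur(blur_radius))
    arr = np.asarray(img).copy()
    mask = np.random.rand(*arr.shape) < sp_fraction
    arr[mask] = np.random.choice([0, 255], size=int(mask.sum()))
    return Image.fromarray(arr)

# Hypothetical usage; "Kalpurush.ttf" stands in for any Bangla font file.
# noisy = add_noise(render_word("\u0995\u09b2\u09ae", "Kalpurush.ttf"))
```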

The CRNN-VDS model performs considerably worse on the Letterpress and Typewriter test sets, achieving WRRs of 57.86% and 28.05%, respectively. The drop is mainly due to noise unique to these datasets, which in some cases even differs in the color scheme of the background. In addition, the synthetic training data mostly mimicked computer-composed documents rather than letterpress or typewriter output, whose texts usually have different fonts, paper textures, and domain-specific noise. Finally, this evaluation lets us estimate how the synthetically trained CRNN-VDS model would perform in production circumstances.
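For reference, the word recognition rate (WRR) quoted above is presumably the exact-match rate over words, with character-level error measured by edit distance; a minimal sketch under that assumption:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def word_recognition_rate(preds: list[str], refs: list[str]) -> float:
    """Fraction of predictions that match the reference exactly."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)

def character_error_rate(preds: list[str], refs: list[str]) -> float:
    """Total edit distance normalized by total reference length."""
    edits = sum(levenshtein(p, r) for p, r in zip(preds, refs))
    return edits / sum(len(r) for r in refs)
```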

Appendix B: Algorithm for the extraction methods

Algorithms 1–4 (pseudocode listings for the VDS and ADS extraction methods)

In traditional sequential models, Unicode text is broken down into single characters, and each character is used as a label. Our novel VDS and ADS methods aim to keep consonant clusters unbroken by assigning all the participating consonants of a cluster to a single label instead of representing the cluster with multiple labels. VDS and ADS are rule-based methods used to extract labels from text for training or testing the CRNN-VDS and CRNN-ADS architectures. Algorithms 1–2 describe the VDS method and Algorithms 3–4 describe the ADS method.
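To make the idea concrete, here is a minimal sketch of cluster-preserving label extraction, assuming conjuncts are joined by the hasant (virama, U+09CD). It illustrates the principle only; the actual VDS and ADS algorithms apply additional rules, for example for vowel and diacritic placement.

```python
HASANT = "\u09cd"  # Bangla virama; glues consonants into a conjunct

def extract_labels(word: str) -> list[str]:
    """Split a word into labels, keeping each conjunct cluster together."""
    labels: list[str] = []
    i = 0
    while i < len(word):
        cluster = word[i]
        i += 1
        # Consume <hasant, consonant> pairs into the current cluster so a
        # conjunct such as ka + hasant + ssa stays a single label.
        while i < len(word) and word[i] == HASANT:
            cluster += word[i]
            i += 1
            if i < len(word):
                cluster += word[i]
                i += 1
        labels.append(cluster)
    return labels

# U+0995 KA + U+09CD + U+09B7 SSA ("ksha") followed by the aa-vowel sign:
# extract_labels("\u0995\u09cd\u09b7\u09be") -> ['ক্ষ', 'া']
```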

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Roy, K., Hossain, M.S., Saha, P.K. et al. A multifaceted evaluation of representation of graphemes for practically effective Bangla OCR. IJDAR 27, 73–95 (2024). https://doi.org/10.1007/s10032-023-00446-7
