
A multifaceted evaluation of representation of graphemes for practically effective Bangla OCR

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Bangla Optical Character Recognition (OCR) poses a unique challenge due to the presence of hundreds of diverse conjunct characters formed by combining two or more letters. In this paper, we propose two novel grapheme representation methods that improve the recognition of these conjunct characters and the overall performance of Bangla OCR. We utilize the popular Convolutional Recurrent Neural Network (CRNN) architecture and implement our grapheme representation strategies to design the final labels of the model. In the absence of a large-scale word-level printed Bangla dataset, we created a synthetically generated Bangla corpus of 2 million samples that is representative and sufficiently varied in fonts, domain, and vocabulary size to train our Bangla OCR model. To test the various aspects of our model, we also created six test protocols. Finally, to establish the generalizability of our grapheme representation methods, we performed training and testing on external handwriting datasets. Experimental results demonstrate the effectiveness of our approach. Furthermore, our synthetically generated training dataset and the test protocols are made available to serve as benchmarks for future Bangla OCR research.


Notes

  1. https://doi.org/10.6084/m9.figshare.20186825.

  2. https://github.com/apurba-nsu-rnd-lab/bangla-ocr-crnn.

  3. https://www.unicode.org/charts/PDF/U0980.pdf.

  4. https://github.com/Belval/TextRecognitionDataGenerator.


Acknowledgements

This research was supported in part by the Enhancement of Bangla Language in ICT through Research & Development (EBLICT) Project, under the Ministry of ICT, the Government of Bangladesh.

Author information

Corresponding author

Correspondence to Fuad Rahman.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Results on an external testing partition

To evaluate how well the models generalize beyond their training data, in this section we conducted a testing experiment on extrinsic data. These data were primarily collected for an OCR project of the Government of Bangladesh; due to the confidentiality of that assignment, we cannot publish these external testing sets. However, we report the results in Table 13 to highlight the performance of models trained on completely synthetic data. The test data were entirely unseen, appearing in neither training nor validation, and were collected from domains unknown to the model.

Table 13 Performance of the CRNN-VDS model when tested on external testing sets

The word-level data used for this testing comprise three types of documents. First, the Computer Composed test set consists of documents produced with computers, such as government notices and letters. Next, the Letterpress test set contains data from press-based printing, common examples being books and posters. The third set, the Typewriter test set, consists of documents composed on typewriters, which are still widely used for legal documents in Bangladesh.

Figure 14 shows examples of the data used for testing. The samples contain natural noise, which makes correct prediction more challenging, mainly because natural noise is difficult to recreate synthetically. Table 13 reports the results of the evaluation. For this testing, we pick our CRNN-VDS model with the VDS character representation, mainly because of its competitive performance on the other reported test sets with real data (Protocols II–V).

Fig. 14: Samples of words from Computer Composed, Letterpress, and Typewriter data used for extrinsic testing

From Table 13, we observe that the CRNN-VDS model achieves a WRR of 79.03% on the Computer Composed set of around five hundred thousand samples, which we consider a significant accomplishment given that the model was trained on completely synthetic data and had never seen data from this domain. This success stems from our synthetic data generation process: we carefully selected almost all the popular open-source Bangla fonts and generated images of different lengths. During training, we also added noise that reflects real-world degradation, such as blur and salt-and-pepper noise. Figure 13 shows some of the noisy input images used during training. These images capture the visual diversity of real-world computer-composed documents and thus train the model well enough to reach high accuracy on unseen real-world test sets.
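As an illustration of this kind of pipeline, the sketch below renders a word image with a chosen Bangla font and injects blur and salt-and-pepper noise. It is a minimal approximation of the generation process described above, not the authors' exact tooling: the font path, canvas sizing, and noise levels are illustrative assumptions, and correct shaping of Bangla conjuncts requires a Pillow build with libraqm.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFilter, ImageFont

def render_word(text: str, font_path: str, height: int = 32) -> Image.Image:
    """Render a word in black on a white grayscale canvas with the given font."""
    font = ImageFont.truetype(font_path, size=height - 8)
    # Rough width estimate; a real generator measures the rendered text.
    img = Image.new("L", (height * max(len(text), 1), height), color=255)
    ImageDraw.Draw(img).text((4, 4), text, font=font, fill=0)
    return img

def add_noise(img: Image.Image, blur_radius: float = 1.0,
              sp_fraction: float = 0.02) -> Image.Image:
    """Apply Gaussian blur, then flip a random fraction of pixels to
    salt (white) or pepper (black)."""
    img = img.filter(ImageFilter.GaussianBlur(blur_radius))
    arr = np.asarray(img).copy()
    mask = np.random.rand(*arr.shape) < sp_fraction
    arr[mask] = np.random.choice([0, 255], size=int(mask.sum()))
    return Image.fromarray(arr)

# Hypothetical usage; "Kalpurush.ttf" stands in for any Bangla font file.
# noisy = add_noise(render_word("\u0995\u09b2\u09ae", "Kalpurush.ttf"))
```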

The CRNN-VDS model performs considerably worse on the Letterpress and Typewriter test sets, achieving WRRs of 57.86% and 28.05%, respectively. The drop is mainly due to noise unique to these datasets, which in some cases even differs in the color scheme of the background. In addition, the synthetic training data mostly mimicked computer-composed documents rather than letterpress or typewriter output, whose texts usually have different fonts, paper textures, and domain-specific noise. Finally, this evaluation lets us estimate how the synthetically trained CRNN-VDS model would perform in production circumstances.
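For reference, the word recognition rate (WRR) quoted above is presumably the exact-match rate over words, with character-level error measured by edit distance; a minimal sketch under that assumption:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def word_recognition_rate(preds: list[str], refs: list[str]) -> float:
    """Fraction of predictions that match the reference exactly."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)

def character_error_rate(preds: list[str], refs: list[str]) -> float:
    """Total edit distance normalized by total reference length."""
    edits = sum(levenshtein(p, r) for p, r in zip(preds, refs))
    return edits / sum(len(r) for r in refs)
```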

Appendix B: Algorithm for the extraction methods

Algorithms 1–4 (pseudocode listings for the VDS and ADS extraction methods)

In traditional sequential models, Unicode text is broken down into single characters, and each character is used as a label. Our novel VDS and ADS methods aim to keep consonant clusters unbroken by assigning all the participating consonants of a cluster to a single label instead of representing the cluster with multiple labels. VDS and ADS are rule-based methods used to extract labels from text for training or testing the CRNN-VDS and CRNN-ADS architectures. Algorithms 1–2 describe the VDS method and Algorithms 3–4 describe the ADS method.
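To make the idea concrete, here is a minimal sketch of cluster-preserving label extraction, assuming conjuncts are joined by the hasant (virama, U+09CD). It illustrates the principle only; the actual VDS and ADS algorithms apply additional rules, for example for vowel and diacritic placement.

```python
HASANT = "\u09cd"  # Bangla virama; glues consonants into a conjunct

def extract_labels(word: str) -> list[str]:
    """Split a word into labels, keeping each conjunct cluster together."""
    labels: list[str] = []
    i = 0
    while i < len(word):
        cluster = word[i]
        i += 1
        # Consume <hasant, consonant> pairs into the current cluster so a
        # conjunct such as ka + hasant + ssa stays a single label.
        while i < len(word) and word[i] == HASANT:
            cluster += word[i]
            i += 1
            if i < len(word):
                cluster += word[i]
                i += 1
        labels.append(cluster)
    return labels

# U+0995 KA + U+09CD + U+09B7 SSA ("ksha") followed by the aa-vowel sign:
# extract_labels("\u0995\u09cd\u09b7\u09be") -> ['ক্ষ', 'া']
```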

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Roy, K., Hossain, M.S., Saha, P.K. et al. A multifaceted evaluation of representation of graphemes for practically effective Bangla OCR. IJDAR 27, 73–95 (2024). https://doi.org/10.1007/s10032-023-00446-7
