A Deep OCR for Degraded Bangla Documents

Published: 25 August 2022

Abstract

Despite the significant success of document image analysis techniques, efficient Optical Character Recognition (OCR) of degraded document images remains an open problem. Although a body of work has been reported on degraded document recognition for the English language, little attention has been paid to Indic scripts. In this work, we focus on developing a degraded-document OCR for Bangla, a major Indian language. In general, an OCR system includes segmentation of the foreground text from the background, followed by recognition of the extracted text. The text segmentation module assigns a foreground or background label to each pixel of the document image. In this paper, we present a new OCR system that is particularly suitable for degraded Bangla document images. The contribution is twofold. In the first phase, we use a semi-supervised Markov Random Field (MRF)-based Generative Adversarial Network (GAN) model (which we call MRF-GAN) for foreground segmentation of text from degraded documents. In the proposed MRF-GAN, we extend the GAN framework to a multitask learning mechanism in which the discriminator-classifier network both differentiates between real and fake images and assigns a foreground or background label to each pixel. In the second phase, we propose a new encoder-decoder based recognizer that incorporates an attention-based character-to-word prediction model capable of minimizing the Word Error Rate (WER). We optimize this network using a Multitask-based Transfer Learning (MTTL) scheme. We perform experiments on a publicly available degraded Bangla document image dataset as well as on a new degraded printed Hindi document image dataset created as part of the present study. Experimental results demonstrate the efficacy of the proposed OCR.
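The recognizer in the second phase is trained to minimize the Word Error Rate. For reference, WER is conventionally computed as the word-level edit distance between a hypothesis and its reference transcription, normalized by the reference length. The sketch below is a generic dynamic-programming implementation of that standard metric, not code from the paper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / max(len(ref), 1)
```

A perfect transcription gives a WER of 0; one substituted word out of three gives 1/3.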

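The multitask discriminator-classifier objective described in the abstract can be thought of as the sum of an adversarial real/fake term and a per-pixel foreground/background segmentation term. The function below is a minimal illustration of that idea only; the function name, the flattened-pixel representation, and the equal weighting of the two terms are assumptions, not the paper's formulation.

```python
import math

def multitask_discriminator_loss(real_fake_logit, pixel_logits, is_real, pixel_labels):
    """Illustrative multitask loss: adversarial BCE plus mean per-pixel BCE.

    real_fake_logit: scalar logit scoring the whole image as real (1) or fake (0).
    pixel_logits:    flattened per-pixel logits for the foreground class.
    is_real:         1 if the image is real, 0 if generated.
    pixel_labels:    per-pixel ground truth, 1 = foreground, 0 = background.
    """
    # Adversarial term: binary cross-entropy on the real/fake score.
    p = 1.0 / (1.0 + math.exp(-real_fake_logit))
    adv = -(is_real * math.log(p) + (1 - is_real) * math.log(1 - p))
    # Segmentation term: binary cross-entropy averaged over all pixels.
    seg = 0.0
    for logit, label in zip(pixel_logits, pixel_labels):
        q = 1.0 / (1.0 + math.exp(-logit))
        seg += -(label * math.log(q) + (1 - label) * math.log(1 - q))
    seg /= len(pixel_labels)
    return adv + seg
```

Combining both terms in one network is what makes the discriminator a discriminator-classifier: its gradients carry both the real/fake signal and the pixel-labeling signal back to the shared features.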


• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 5 (September 2022), 486 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3533669


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 25 August 2022
      • Online AM: 20 April 2022
      • Revised: 1 January 2022
      • Accepted: 1 January 2022
      • Received: 1 July 2021

      Qualifiers

      • research-article
      • Refereed
