Abstract
Despite the significant success of document image analysis techniques, efficient Optical Character Recognition (OCR) of degraded document images remains an open problem. Although a body of work has been reported on degraded document recognition for English, little attention has been paid to Indic scripts. In this work, we focus on OCR of degraded documents in Bangla, a major Indian language. In general, an OCR system segments the foreground text from the background and then recognizes the extracted text. The text segmentation module assigns a foreground or background label to each pixel of the document image. We present a new OCR system that is particularly suited to degraded Bangla document images. The contribution is twofold. In the first phase, we use a semi-supervised Markov Random Field (MRF)-based Generative Adversarial Network (GAN) model, which we call MRF-GAN, to segment foreground text from degraded document images. In the proposed MRF-GAN, we extend the GAN concept to a multitask learning mechanism in which discriminator-classifier networks both differentiate between real and fake images and assign a foreground or background label to each pixel. In the second phase, we propose a new encoder-decoder recognizer that incorporates an attention-based character-to-word prediction model capable of minimizing the Word Error Rate (WER). We optimize this network using a Multitask-based Transfer Learning (MTTL) scheme. We perform experiments on a publicly available degraded Bangla document image dataset and on a new degraded printed Hindi document image dataset created as part of the present study. The experimental results demonstrate the efficacy of the proposed OCR system.
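For readers unfamiliar with the metric named above, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference text, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in this work; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenizes on whitespace,
    which is an assumption and may not match a given evaluation protocol.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.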