Abstract
Optical Character Recognition (OCR) is the process of converting an image or document containing text into a machine-readable format. In this paper, we target only the numeric content of tables, together with a few special characters. Many firms that deal in financial information want to parse data from scanned tables, and in some cases they do not need the row labels because these rarely change. Focusing only on numeric information also gives language independence to firms that handle documents written in a variety of languages: foreign-language experts can read the row labels while the OCR extracts the numeric data, which speeds up data collection. We developed a targeted OCR that saves time by processing only the important characters and that can also overcome erroneous predictions caused by under-segmentation of characters. We propose a novel approach that segments the document into blocks of text (each line or word forming one block) and classifies each block as numeric or non-numeric using a binary CNN. Character-level segmentation and classification using capsule networks are then applied only to the blocks that the binary CNN classifies as numeric.
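The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how blocks or characters are segmented, so the projection-profile splitting used here is an assumption. The callables `is_numeric` and `classify_char` are hypothetical stand-ins for the binary CNN and the capsule network.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def split_runs(profile: Sequence[int]) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where the profile is non-zero."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(img: Image) -> List[Image]:
    """Assumed block segmentation: split on blank rows (horizontal profile)."""
    row_profile = [sum(row) for row in img]
    return [img[a:b] for a, b in split_runs(row_profile)]


def segment_chars(block: Image) -> List[Image]:
    """Assumed character segmentation: split on blank columns."""
    col_profile = [sum(col) for col in zip(*block)]
    return [[row[a:b] for row in block] for a, b in split_runs(col_profile)]


def targeted_ocr(
    img: Image,
    is_numeric: Callable[[Image], bool],    # stand-in for the binary CNN
    classify_char: Callable[[Image], str],  # stand-in for the capsule network
) -> List[str]:
    """Run character-level recognition only on blocks flagged as numeric."""
    results: List[str] = []
    for block in segment_lines(img):
        if not is_numeric(block):
            continue  # non-numeric blocks are skipped entirely
        chars = segment_chars(block)
        results.append("".join(classify_char(c) for c in chars))
    return results
```

The point of the gate is that the character-level stage, which is the more expensive one, never runs on blocks rejected by the block-level classifier. In practice both callables would wrap trained models; any function with the same signature can be passed in for experimentation.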
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Prajapati, P., Thakkar, S., Shah, K. (2020). Targeted Optical Character Recognition: Classification Using Capsule Network. In: Nain, N., Vipparthi, S., Raman, B. (eds) Computer Vision and Image Processing. CVIP 2019. Communications in Computer and Information Science, vol 1148. Springer, Singapore. https://doi.org/10.1007/978-981-15-4018-9_42
DOI: https://doi.org/10.1007/978-981-15-4018-9_42
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-4017-2
Online ISBN: 978-981-15-4018-9
eBook Packages: Computer Science (R0)