
Document Image Analysis Using Deep Multi-modular Features

  • Original Research
  • Published:
SN Computer Science

Abstract

Texture or repeating patterns, discriminative patches, and shapes are the salient features for many document image analysis problems. This article proposes a deep network architecture that independently learns texture patterns, discriminative patches, and shapes to solve various document image analysis tasks. The tasks considered are document image classification, genre identification from book covers, scientific document figure classification, and script identification. The presented network learns global, texture, and discriminative features and combines them judiciously according to the nature of the problem to be solved. We compare the performance of the proposed approach with state-of-the-art techniques on multiple publicly available datasets, such as Book-Cover, RVL-CDIP, CVSI, and DocFigure. Experiments show that our approach outperforms the state of the art on genre and document figure classification and obtains comparable results on document image and script classification tasks.
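The abstract describes the architecture only at a high level: a network whose global, texture, and discriminative-feature branches are fused before classification. The sketch below is therefore illustrative rather than the authors' exact model. It assumes a shared ResNet-18 trunk with simple pooling stand-ins for the texture-encoding and discriminative-patch modules; all class and layer names (MultiModularDocNet, texture_conv, patch_conv) are hypothetical.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class MultiModularDocNet(nn.Module):
    """Three-branch sketch: a global branch (pooled CNN features), a texture
    branch (orderless pooling of local responses), and a discriminative-patch
    branch (per-patch scores), fused by concatenation before the classifier."""

    def __init__(self, num_classes, num_patches=8):
        super().__init__()
        backbone = models.resnet18()  # randomly initialised; pretrained weights could be loaded
        # Shared convolutional trunk up to the last residual stage: B x 512 x H x W.
        self.trunk = nn.Sequential(*list(backbone.children())[:-2])
        # Global branch: average-pooled holistic descriptor.
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        # Texture branch: 1x1 projection + orderless (average) pooling, a simple
        # stand-in for texture-encoding layers such as deep filter banks.
        self.texture_conv = nn.Conv2d(512, 256, kernel_size=1)
        # Discriminative-patch branch: 1x1 conv bank producing patch-level scores,
        # max-pooled to keep only the most discriminative responses.
        self.patch_conv = nn.Conv2d(512, num_patches, kernel_size=1)
        self.classifier = nn.Linear(512 + 256 + num_patches, num_classes)

    def forward(self, x):
        feat = self.trunk(x)                                 # B x 512 x H x W
        g = self.global_pool(feat).flatten(1)                # B x 512
        t = self.texture_conv(feat).mean(dim=(2, 3))         # B x 256
        p = self.patch_conv(feat).amax(dim=(2, 3))           # B x num_patches
        return self.classifier(torch.cat([g, t, p], dim=1))  # B x num_classes


if __name__ == "__main__":
    model = MultiModularDocNet(num_classes=16)  # e.g. the 16 RVL-CDIP classes
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 16])
```

Late fusion by concatenation is only one of several plausible ways to combine the branches; the paper's task-dependent combination strategy is not specified in the abstract.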


Notes

  1. http://www.cs.cmu.edu/~aharley/rvl-cdip/.


Funding

One of the authors, Jobin K. V., received a Visvesvaraya Ph.D. fellowship from the Government of India.

Author information


Corresponding author

Correspondence to K. V. Jobin.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jobin, K.V., Mondal, A. & Jawahar, C.V. Document Image Analysis Using Deep Multi-modular Features. SN COMPUT. SCI. 4, 5 (2023). https://doi.org/10.1007/s42979-022-01414-4
