Document Image Classification Method Based on Graph Convolutional Network

Xiong, Yangyang; Dai, Zhongjian; Liu, Yan; Ding, Xiaotian

doi:10.1007/978-3-030-92185-9_26

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13108)

Included in the following conference series: International Conference on Neural Information Processing

Abstract

Automatic and reliable document image classification is an essential part of high-level business intelligence. Previous studies mainly apply Convolutional Neural Network (CNN)-based methods such as GoogLeNet, VGG, and ResNet. These methods rely only on the visual information in images and ignore textual and layout features, which limits their performance on document image classification tasks. Using multi-modal content can improve classification performance, since most document images found in business systems carry explicit semantic and layout information. This paper presents an innovative method based on the Graph Convolutional Network (GCN) that learns from multiple input image features, including visual, textual, and positional features. Compared with CNN-based methods, the proposed approach makes full use of the multi-modal features of the document image, which makes the model competitive with other state-of-the-art methods while using far fewer parameters. In addition, the proposed model does not require large-scale pre-training. Experiments show that the proposed method achieves an accuracy of 93.45% on the popular RVL-CDIP document image dataset.
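
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' architecture. It assumes each detected text region of a document image is a graph node whose feature vector concatenates a visual, a textual, and a positional embedding, applies two graph convolution layers using the standard propagation rule of Kipf and Welling (ReLU of the symmetrically normalized adjacency times the features times a weight matrix), and mean-pools the nodes into a 16-way prediction (RVL-CDIP has 16 classes). The feature sizes, the chain-shaped graph, the random weights, and the names `normalize_adjacency`, `gcn_layer`, and `classify_document` are all hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, with self-loops added to A."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)            # every degree is >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ h @ w, 0.0)


def classify_document(visual, textual, positional, adj, w1, w2, w_out):
    """Fuse per-node features, run two GCN layers, pool, and apply softmax."""
    # Assumption: fusion is a plain concatenation of the three feature types.
    h = np.concatenate([visual, textual, positional], axis=1)
    a_norm = normalize_adjacency(adj)
    h = gcn_layer(a_norm, h, w1)
    h = gcn_layer(a_norm, h, w2)
    graph_vec = h.mean(axis=0)         # assumption: mean pooling over nodes
    logits = graph_vec @ w_out
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


# Toy input: 5 text regions with made-up feature sizes (8 + 6 + 4 = 18).
rng = np.random.default_rng(0)
n_nodes = 5
visual = rng.normal(size=(n_nodes, 8))
textual = rng.normal(size=(n_nodes, 6))
positional = rng.uniform(size=(n_nodes, 4))  # e.g. normalized box coordinates

# Assumption: regions are linked in reading order, giving a chain graph.
adj = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

# Untrained random weights; a real model would learn these from labeled data.
w1 = rng.normal(scale=0.1, size=(18, 16))
w2 = rng.normal(scale=0.1, size=(16, 16))
w_out = rng.normal(scale=0.1, size=(16, 16))  # 16 output classes

probs = classify_document(visual, textual, positional, adj, w1, w2, w_out)
print(probs.shape)
```

Because the weights here are random, the predicted distribution is meaningless; the sketch only illustrates how visual, textual, and positional features can pass through the same graph convolution layers, which involve far fewer weights than a deep CNN backbone.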

Author information

Authors and Affiliations

Beijing Institute of Technology, Beijing, 100081, China
Yangyang Xiong & Zhongjian Dai
Taikang Insurance Group, Beijing, 100031, China
Yan Liu & Xiaotian Ding

Corresponding author

Correspondence to Yan Liu.

Editor information

Editors and Affiliations

Sampoerna University, Jakarta, Indonesia
Teddy Mantoro
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Sampoerna University, Jakarta, Indonesia
Media Anugerah Ayu
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Universitas Indonesia, Depok, Indonesia
Achmad Nizar Hidayanto

About this paper

Cite this paper

Xiong, Y., Dai, Z., Liu, Y., Ding, X. (2021). Document Image Classification Method Based on Graph Convolutional Network. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13108. Springer, Cham. https://doi.org/10.1007/978-3-030-92185-9_26

DOI: https://doi.org/10.1007/978-3-030-92185-9_26
Published: 06 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92184-2
Online ISBN: 978-3-030-92185-9
eBook Packages: Computer Science, Computer Science (R0)
