Document Image Classification Method Based on Graph Convolutional Network

  • Conference paper
Neural Information Processing (ICONIP 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13108)

Abstract

Automatic and reliable document image classification is an essential part of high-level business intelligence. Previous studies mainly apply Convolutional Neural Network (CNN)-based methods such as GoogLeNet, VGG, and ResNet. These methods rely only on the visual information in images and ignore textual and layout features, which limits their performance on document image classification tasks. Using multi-modal content can improve classification performance, since most document images found in business systems carry explicit semantic and layout information. This paper presents an innovative method based on the Graph Convolutional Network (GCN) that learns from multiple features of the input image, including visual, textual, and positional features. Compared with CNN-based methods, the proposed approach makes fuller use of the multi-modal features of the document image, which makes the model competitive with other state-of-the-art methods while using far fewer parameters. In addition, the proposed model does not require large-scale pre-training. Experiments show that the proposed method achieves an accuracy of 93.45% on the popular RVL-CDIP document image dataset.
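As a rough illustration of the idea in the abstract, the sketch below applies one graph convolution (the standard propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} H W)) to a toy document graph. It is not the authors' architecture: the abstract does not describe how the graph is built or how features are fused, so the choice of text regions as nodes, the concatenation in `node_features`, the mean pooling, and all numeric values are assumptions made for illustration only.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    a_norm = normalize_adjacency(adj)
    z = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


def node_features(visual, textual, box):
    """Hypothetical fusion: concatenate the three modalities per node."""
    return list(visual) + list(textual) + list(box)


# Toy document with three text regions; edges link neighbouring regions
# (an assumed graph construction, not taken from the paper).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [
    node_features([0.2], [1.0], [0.1, 0.1]),
    node_features([0.5], [0.3], [0.1, 0.5]),
    node_features([0.9], [0.7], [0.1, 0.9]),
]
weights = [[0.5, -0.2],
           [0.1, 0.4],
           [-0.3, 0.6],
           [0.2, 0.1]]  # 4 input dims -> 2 hidden dims, arbitrary values

hidden = gcn_layer(adj, features, weights)
# Mean-pool node embeddings into one document-level vector.
doc_embedding = [sum(col) / len(hidden) for col in zip(*hidden)]
```

In a full classifier, `doc_embedding` would feed a softmax layer over the document classes, and the weights would be learned rather than fixed.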

Corresponding author

Correspondence to Yan Liu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Xiong, Y., Dai, Z., Liu, Y., Ding, X. (2021). Document Image Classification Method Based on Graph Convolutional Network. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13108. Springer, Cham. https://doi.org/10.1007/978-3-030-92185-9_26

  • DOI: https://doi.org/10.1007/978-3-030-92185-9_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92184-2

  • Online ISBN: 978-3-030-92185-9

  • eBook Packages: Computer Science, Computer Science (R0)
