Abstract
Automatic and reliable document image classification is an essential part of high-level business intelligence. Previous studies mainly focus on applying Convolutional Neural Network (CNN)-based methods such as GoogLeNet, VGG, and ResNet. These methods rely only on the visual information in images and ignore textual and layout features, which limits their performance on document image classification tasks. Using multi-modal content can improve classification performance, since most document images found in business systems carry explicit semantic and layout information. This paper presents an innovative method based on the Graph Convolutional Network (GCN) that learns multiple features of the input image, including visual, textual, and positional features. Compared with CNN-based methods, the proposed approach makes full use of the multi-modal features of the document image, which makes the model competitive with other state-of-the-art methods while using far fewer parameters. In addition, the proposed model does not require large-scale pre-training. Experiments show that the proposed method achieves an accuracy of 93.45% on the popular RVL-CDIP document image dataset.
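The idea of running a graph convolution over multi-modal features can be illustrated with a minimal sketch. It is not the authors' architecture, which the abstract does not specify. It assumes that each node is a text block of the document, that the visual, textual, and positional features are simply concatenated per node, and that a single layer uses the standard propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W). The function names, feature sizes, graph, and random weights are all hypothetical.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    # Symmetric normalisation of the adjacency matrix.
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ feats @ weight, 0.0)

def classify_document(adj, visual, textual, positional, w_hidden, w_out):
    """Assumed pipeline: concatenate modalities, one GCN layer, mean-pool, softmax."""
    feats = np.concatenate([visual, textual, positional], axis=1)
    hidden = gcn_layer(adj, feats, w_hidden)
    pooled = hidden.mean(axis=0)              # graph-level readout over all nodes
    logits = pooled @ w_out
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

# Toy document with three text blocks connected in a chain (hypothetical graph).
rng = np.random.default_rng(0)
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
visual = rng.normal(size=(3, 8))       # assumed per-block visual embedding
textual = rng.normal(size=(3, 8))      # assumed per-block text embedding
positional = rng.uniform(size=(3, 4))  # assumed normalised bounding box (x0, y0, x1, y1)
w_hidden = rng.normal(size=(20, 16))   # untrained weights, for shape checking only
w_out = rng.normal(size=(16, 16))      # RVL-CDIP has 16 document classes
probs = classify_document(adj, visual, textual, positional, w_hidden, w_out)
print(probs.shape)
```

In practice the weights would be learned by gradient descent, and the way edges are built between text blocks is a design choice this sketch leaves open.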
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Xiong, Y., Dai, Z., Liu, Y., Ding, X. (2021). Document Image Classification Method Based on Graph Convolutional Network. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13108. Springer, Cham. https://doi.org/10.1007/978-3-030-92185-9_26
DOI: https://doi.org/10.1007/978-3-030-92185-9_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92184-2
Online ISBN: 978-3-030-92185-9
eBook Packages: Computer Science, Computer Science (R0)