Abstract
In document analysis, classifying text document images is a challenging task in several fields of application, such as archiving old documents, administrative procedures, and security. In this context, visual appearance has been widely used for document classification and is considered a useful and relevant feature. However, visual information alone is insufficient to achieve higher classification rates; additional relevant features, including textual features, can be leveraged to improve classification results. In this paper, we propose a multi-view deep representation learning approach that combines textual and visual information, extracted respectively from the text and the visual appearance of document images. The approach is designed to find a deep shared representation between textual and visual features by fusing them into a joint latent space, where a classifier is trained to classify the document images. Our experimental results show that the proposed model outperforms competitive approaches.
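The fusion idea described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the class name `MultiViewSketch`, the layer sizes, the concatenation-based fusion, the tanh activations, and the per-view reconstruction decoders are all assumptions made for illustration, and the weights are random and untrained. It only shows how two feature views might be encoded, fused into one joint latent vector, decoded back, and passed to a classifier.

```python
import math
import random


def linear(x, w, b):
    """Apply y = W x + b to a single input vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for an n_in -> n_out layer."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def tanh_vec(v):
    return [math.tanh(a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


class MultiViewSketch:
    """Hypothetical two-view encoder-decoder with a classifier on the joint latent."""

    def __init__(self, text_dim, visual_dim, latent_dim, n_classes, seed=0):
        rng = random.Random(seed)
        self.enc_text = init_layer(text_dim, latent_dim, rng)
        self.enc_visual = init_layer(visual_dim, latent_dim, rng)
        self.fuse = init_layer(2 * latent_dim, latent_dim, rng)
        self.dec_text = init_layer(latent_dim, text_dim, rng)
        self.dec_visual = init_layer(latent_dim, visual_dim, rng)
        self.clf = init_layer(latent_dim, n_classes, rng)

    def forward(self, text_feat, visual_feat):
        # Encode each view separately.
        h_text = tanh_vec(linear(text_feat, *self.enc_text))
        h_visual = tanh_vec(linear(visual_feat, *self.enc_visual))
        # Fuse by concatenation followed by a shared layer (assumed fusion rule).
        z = tanh_vec(linear(h_text + h_visual, *self.fuse))
        # Decode the joint latent back to each view (assumed reconstruction heads).
        text_rec = linear(z, *self.dec_text)
        visual_rec = linear(z, *self.dec_visual)
        # Class probabilities computed from the joint latent vector.
        probs = softmax(linear(z, *self.clf))
        return z, text_rec, visual_rec, probs


model = MultiViewSketch(text_dim=8, visual_dim=12, latent_dim=4, n_classes=3)
z, text_rec, visual_rec, probs = model.forward([0.5] * 8, [0.2] * 12)
```

In a trained version of such a model, one would typically minimise a combination of per-view reconstruction losses and a classification loss; how the paper weights or schedules these is not stated in the abstract.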
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sellami, A., Tabbone, S. (2021). EDNets: Deep Feature Learning for Document Image Classification Based on Multi-view Encoder-Decoder Neural Networks. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_22
DOI: https://doi.org/10.1007/978-3-030-86337-1_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86336-4
Online ISBN: 978-3-030-86337-1
eBook Packages: Computer Science, Computer Science (R0)