Abstract
Most work on key information extraction in document analysis targets simple layouts with few classes, such as the date and amount on invoices or receipts. However, many document applications involve sophisticated layouts with many types of elements. A mortgage contract agreement, for example, often has over ninety entities, and it is expensive to design a template for each document. To this end, we propose an efficient multi-class key information extraction (MCKIE) method based on a graph convolutional network. Specifically, we design a graph construction strategy to generate a text box graph. MCKIE then uses a message-passing mechanism to learn node representations by aggregating information from each node's neighborhood. In addition, we compare the performance of various graph construction methods and verify the effectiveness of MCKIE on a realistic contract document dataset. Extensive experimental results show that the proposed model significantly outperforms the baseline models.
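The two steps named in the abstract, building a graph over text boxes and aggregating information from neighboring nodes, can be illustrated with a minimal sketch. The abstract does not specify the construction strategy or the layer details, so everything here is an assumption made for illustration: the k-nearest-neighbor rule on box centers, the function names `build_knn_graph` and `message_passing_step`, the toy boxes, and the toy features are all hypothetical and are not the authors' method. The aggregation is a plain mean over a node and its neighbors, without the learned weights and nonlinearity a trained graph convolutional layer would apply.

```python
import math


def box_center(box):
    """Return the (x, y) center of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def build_knn_graph(boxes, k=2):
    """Connect each text box to its k nearest boxes by center distance.

    Hypothetical construction rule (an assumption, not the paper's strategy).
    Returns a symmetric adjacency mapping: node index -> set of neighbor indices.
    """
    centers = [box_center(b) for b in boxes]
    neighbors = {i: set() for i in range(len(boxes))}
    for i, ci in enumerate(centers):
        ranked = sorted(
            (j for j in range(len(boxes)) if j != i),
            key=lambda j: math.dist(ci, centers[j]),
        )
        for j in ranked[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)  # keep the graph undirected
    return neighbors


def message_passing_step(features, neighbors):
    """One round of mean aggregation over each node and its neighbors.

    A trained GCN layer would also apply a learned weight matrix and a
    nonlinearity; both are omitted to keep the sketch small.
    """
    updated = []
    for i, feat in enumerate(features):
        group = [feat] + [features[j] for j in sorted(neighbors[i])]
        updated.append(
            [sum(vec[d] for vec in group) / len(group) for d in range(len(feat))]
        )
    return updated


# Toy input: three text boxes and a 2-dimensional feature vector per box.
boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 100, 10, 110)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

graph = build_knn_graph(boxes, k=1)
smoothed = message_passing_step(features, graph)
print(graph)
print(smoothed)
```

With `k=1`, the first box ends up linked to both other boxes because each of them selects it as its nearest neighbor, so its updated feature is the mean of all three vectors. Comparing construction rules, as the abstract mentions, would amount to swapping `build_knn_graph` for another rule while leaving the aggregation step unchanged.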
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61773325), the Natural Science Foundation of Xiamen, China (No. 3502Z20227319), the Industry-University Cooperation Project of Fujian Science and Technology Department (No. 2021H6035), the Fujian Key Technological Innovation and Industrialization Projects (No. 2023XQ023), and the Fu-Xia-Quan National Independent Innovation Demonstration Project (No. 2022FX4).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Huang, Z., Xiao, S., Wang, DH., Zhu, S. (2024). MCKIE: Multi-class Key Information Extraction from Complex Documents Based on Graph Convolutional Network. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14431. Springer, Singapore. https://doi.org/10.1007/978-981-99-8540-1_8
DOI: https://doi.org/10.1007/978-981-99-8540-1_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8539-5
Online ISBN: 978-981-99-8540-1
eBook Packages: Computer Science, Computer Science (R0)