MCKIE: Multi-class Key Information Extraction from Complex Documents Based on Graph Convolutional Network

Huang, Zhicai; Xiao, Shunxin; Wang, Da-Han; Zhu, Shunzhi

doi:10.1007/978-981-99-8540-1_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14431)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Most work on key information extraction in document analysis targets simple layouts with few classes, such as the date and amount on invoices or receipts. However, many document applications involve sophisticated layouts with many types of elements. A mortgage contract agreement, for example, often has over ninety entities, and designing a template for each document is expensive. To address this, we propose an efficient multi-class key information extraction (MCKIE) method based on a graph convolutional network. Specifically, we design a graph construction strategy to generate a text box graph. MCKIE then uses a message-passing mechanism to learn node representations by aggregating information from each node's neighborhood. We also compare the performance of various graph construction methods and verify the effectiveness of MCKIE on a realistic contract document dataset. Extensive experimental results show that the proposed model significantly outperforms the baseline models.
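
The two steps named in the abstract, building a text box graph and aggregating neighbor information by message passing, can be illustrated with a minimal sketch. The abstract does not specify the graph construction strategy or the layer design, so everything here is an assumption for illustration: edges come from a k-nearest-neighbor rule on box centers, the layer is the generic symmetric-normalized graph convolution ReLU(D^-1/2 (A + I) D^-1/2 H W), and the names `knn_adjacency` and `gcn_layer`, the box coordinates, and the random features are hypothetical.

```python
import numpy as np


def knn_adjacency(centers, k):
    """Connect each text box to its k nearest boxes (assumed strategy, not the paper's)."""
    n = centers.shape[0]
    # Pairwise Euclidean distances between box centers.
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a box is never its own neighbor
    k = min(k, n - 1)
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, nearest.ravel()] = 1.0
    # Symmetrize so that messages flow in both directions along every edge.
    return np.maximum(adj, adj.T)


def gcn_layer(adj, features, weight):
    """One generic graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])  # self-loops keep each node's own features
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weight, 0.0)


# Hypothetical text boxes as (x_min, y_min, x_max, y_max).
boxes = np.array(
    [[0, 0, 10, 2], [12, 0, 20, 2], [0, 5, 10, 7], [50, 50, 60, 52]],
    dtype=float,
)
centers = np.stack(
    [(boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2], axis=1
)
adj = knn_adjacency(centers, k=2)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))  # placeholder node features (e.g. text embeddings)
weight = rng.normal(size=(8, 3))    # placeholder weights; learned in practice
hidden = gcn_layer(adj, features, weight)
print(hidden.shape)
```

In a node classification setup, the output of the last such layer would be fed to a per-node classifier over the entity classes; that step is omitted here.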

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61773325), the Natural Science Foundation of Xiamen, China (No. 3502Z20227319), the Industry-University Cooperation Project of the Fujian Science and Technology Department (No. 2021H6035), the Fujian Key Technological Innovation and Industrialization Projects (No. 2023XQ023), and the Fu-Xia-Quan National Independent Innovation Demonstration Project (No. 2022FX4).

Author information

Authors and Affiliations

College of Information and Smart Electromechanical Engineering, Xiamen Huaxia University, Xiamen, 361024, China
Zhicai Huang
School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, 361024, China
Shunxin Xiao, Da-Han Wang & Shunzhi Zhu
Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen, 361024, China
Zhicai Huang, Shunxin Xiao, Da-Han Wang & Shunzhi Zhu

Corresponding author

Correspondence to Da-Han Wang.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Huang, Z., Xiao, S., Wang, DH., Zhu, S. (2024). MCKIE: Multi-class Key Information Extraction from Complex Documents Based on Graph Convolutional Network. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14431. Springer, Singapore. https://doi.org/10.1007/978-981-99-8540-1_8

DOI: https://doi.org/10.1007/978-981-99-8540-1_8
Published: 25 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8539-5
Online ISBN: 978-981-99-8540-1
eBook Packages: Computer Science, Computer Science (R0)
