
MCKIE: Multi-class Key Information Extraction from Complex Documents Based on Graph Convolutional Network

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14431)

Abstract

Most key information extraction work in document analysis targets simple layouts with few classes, such as the date and amount on invoices or receipts. However, many document applications involve sophisticated layouts with many types of elements. A mortgage contract agreement, for example, often contains over ninety entities, and designing a template for each document is expensive. To this end, we propose an efficient multi-class key information extraction (MCKIE) method based on a graph convolutional network. Specifically, we design a graph construction strategy that generates a graph over the text boxes. MCKIE then uses a message-passing mechanism to learn node representations by aggregating information from each node's neighborhood. In addition, we compare the performance of various graph construction methods and verify the effectiveness of MCKIE on a real-world contract document dataset. Extensive experimental results show that the proposed model significantly outperforms the baseline models.
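The abstract describes two steps: building a graph whose nodes are text boxes, then learning node representations by message passing. It does not say which graph construction MCKIE finally uses or the exact layer form, so the following is only a minimal sketch under stated assumptions: edges come from a k-nearest-neighbour rule on box centers (one of many possible construction methods), and aggregation uses the standard symmetric-normalized GCN rule of Kipf and Welling. The names `knn_graph` and `gcn_layer`, the box format, and the toy scalar features are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in page coordinates.
Box = Tuple[float, float, float, float]
Matrix = List[List[float]]


def box_center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def knn_graph(boxes: List[Box], k: int) -> Matrix:
    """ASSUMED construction: link each box to its k nearest boxes by
    center distance, then symmetrize so the adjacency is undirected."""
    n = len(boxes)
    centers = [box_center(b) for b in boxes]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # (distance, index) pairs to every other box, nearest first
        dists = sorted(
            (math.dist(centers[i], centers[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One GCN layer (Kipf & Welling form, assumed here):
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    in_dim, out_dim = len(weight), len(weight[0])
    # Add self-loops so each node keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # never zero thanks to self-loops
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Message passing: each node averages (normalized) neighbour features.
    agg = [[sum(norm[i][m] * features[m][f] for m in range(n))
            for f in range(in_dim)] for i in range(n)]
    # Linear projection followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]


if __name__ == "__main__":
    # Three toy text boxes on one line; the third is far to the right.
    boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (100, 0, 110, 10)]
    adj = knn_graph(boxes, k=1)
    h = [[1.0], [0.0], [0.0]]  # toy 1-d feature: only box 0 is "on"
    w = [[1.0]]                # identity projection for readability
    h1 = gcn_layer(adj, h, w)
    h2 = gcn_layer(adj, h1, w)
    print(adj)
    print(h1)
    print(h2)
```

With k = 1 the boxes form a chain (box 0 to box 1 to box 2). After one layer the far box still has value 0 because box 0 is two hops away; a second layer propagates the signal to it. This is why stacking layers, or choosing a denser construction, changes how much layout context each text box sees.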

Acknowledgements

This work is supported by National Natural Science Foundation of China (No. 61773325), Natural Science Foundation of Xiamen, China (No. 3502Z20227319), Industry-University Cooperation Project of Fujian Science and Technology Department (No. 2021H6035), Fujian Key Technological Innovation and Industrialization Projects (No. 2023XQ023), and Fu-Xia-Quan National Independent Innovation Demonstration Project (No. 2022FX4).

Author information

Corresponding author

Correspondence to Da-Han Wang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Huang, Z., Xiao, S., Wang, DH., Zhu, S. (2024). MCKIE: Multi-class Key Information Extraction from Complex Documents Based on Graph Convolutional Network. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14431. Springer, Singapore. https://doi.org/10.1007/978-981-99-8540-1_8

  • DOI: https://doi.org/10.1007/978-981-99-8540-1_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8539-5

  • Online ISBN: 978-981-99-8540-1

  • eBook Packages: Computer Science, Computer Science (R0)