Document Information Extraction via Global Tagging

  • Conference paper
  • In: Chinese Computational Linguistics (CCL 2023)

Abstract

Document Information Extraction (DIE) is a crucial task for extracting key information from visually-rich documents. The typical pipeline approach for this task involves Optical Character Recognition (OCR), serialization, Semantic Entity Recognition (SER), and Relation Extraction (RE) modules. However, this pipeline presents significant challenges in real-world scenarios due to issues such as unnatural text order and error propagation between modules. To address these challenges, we propose a novel tagging-based method, Global TaggeR (GTR), which converts the original sequence labeling task into a token relation classification task. This approach globally links discontinuous semantic entities in complex layouts and jointly extracts entities and relations from documents. In addition, we design a joint training loss and a joint decoding strategy for the SER and RE tasks based on GTR. Our experiments on multiple datasets demonstrate that GTR not only mitigates the issue of incorrect text order but also improves RE performance.
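
To make the global-tagging idea concrete, the sketch below illustrates one way such token-pair decoding could work: a matrix of scores over token pairs groups tokens into possibly discontinuous entities, and a second pair matrix links entities into relations via their first tokens. This is a minimal illustration in Python/NumPy based only on the description in the abstract, not the authors' implementation; the names (link_matrix, rel_matrix, decode_entities, decode_relations), the 0.5 threshold, and the union-find grouping are all assumptions.

import numpy as np

# Hypothetical decoding of a token-pair "link" matrix into entities.
# link_matrix[i, j] = predicted probability that tokens i and j belong
# to the same semantic entity (so discontinuous tokens can be grouped).
def decode_entities(link_matrix: np.ndarray, threshold: float = 0.5):
    n = link_matrix.shape[0]
    linked = (link_matrix + link_matrix.T) / 2 > threshold  # symmetrize scores
    parent = list(range(n))  # union-find over tokens

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if linked[i, j]:
                parent[find(i)] = find(j)

    groups = {}
    for tok in range(n):
        groups.setdefault(find(tok), []).append(tok)
    # Tokens linked to nothing are treated as non-entity tokens here.
    return [sorted(toks) for toks in groups.values() if len(toks) > 1]

# Hypothetical decoding of relations between entities (e.g. key -> value),
# read from a pair matrix indexed by the entities' first tokens.
def decode_relations(rel_matrix: np.ndarray, entities, threshold: float = 0.5):
    relations = []
    for a, ent_a in enumerate(entities):
        for b, ent_b in enumerate(entities):
            if a != b and rel_matrix[ent_a[0], ent_b[0]] > threshold:
                relations.append((a, b))
    return relations

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens = 6
    entities = decode_entities(rng.random((n_tokens, n_tokens)))
    print("entities:", entities)
    print("relations:", decode_relations(rng.random((n_tokens, n_tokens)), entities))

In the paper's setting, such pair matrices would be produced by a layout-aware encoder and decoded jointly for SER and RE; the random inputs above only exercise the decoding logic.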

Acknowledgements

We sincerely thank the reviewers for their insightful comments and valuable suggestions. This work is supported by the National Natural Science Foundation of China under Grant Nos. U1936207, 62122077, and 62106251.

Author information

Corresponding authors

Correspondence to Hongyu Lin or Xianpei Han.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

He, S., et al. (2023). Document Information Extraction via Global Tagging. In: Sun, M., et al. (eds.) Chinese Computational Linguistics. CCL 2023. Lecture Notes in Computer Science, vol. 14232. Springer, Singapore. https://doi.org/10.1007/978-981-99-6207-5_9

  • DOI: https://doi.org/10.1007/978-981-99-6207-5_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-6206-8

  • Online ISBN: 978-981-99-6207-5

  • eBook Packages: Computer Science, Computer Science (R0)
