Abstract
We propose a framework to evaluate the robustness of transformer-based form field extraction methods via form attacks. We introduce 14 novel form transformations to probe the vulnerability of state-of-the-art field extractors at both the OCR level and the form level, including OCR location/order rearrangement, form background manipulation, and form field-value augmentation. We conduct the robustness evaluation on real invoices and receipts and perform a comprehensive analysis. Experimental results suggest that the evaluated models are highly susceptible to form perturbations such as variation of field values (\(\sim\)15% drop in F1 score), disarrangement of the input text order (\(\sim\)15% drop in F1 score), and disruption of the words neighboring field values (\(\sim\)10% drop in F1 score). Guided by this analysis, we make recommendations to improve the design of field extractors and the process of data collection. Code will be made available.
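One of the attacks the abstract highlights, disarrangement of the input text order, can be sketched in a few lines: the tokens and their bounding boxes are left untouched, but the serialization order fed to the extractor is shuffled. This is a minimal stand-in, not the paper's implementation; the function name and the `(text, bbox)` token format are our assumptions.

```python
import random

def shuffle_ocr_order(tokens, seed=0):
    """Simulate a text-order disarrangement attack: keep every token's
    text and bounding box intact, but return the tokens in a randomized
    serialization order instead of the OCR reading order.

    `tokens` is a list of (text, bbox) pairs; this interface is
    illustrative, not the paper's actual one."""
    rng = random.Random(seed)  # seeded for reproducible attacks
    attacked = list(tokens)    # copy so the original order is preserved
    rng.shuffle(attacked)
    return attacked

# Example: four OCR tokens from a toy invoice, bbox = (x0, y0, x1, y1)
ocr = [("Invoice", (10, 10, 60, 20)),
       ("No.", (65, 10, 85, 20)),
       ("12345", (90, 10, 130, 20)),
       ("Total", (10, 40, 50, 50))]
print(shuffle_ocr_order(ocr))
```

A layout-aware extractor that truly relies on bounding boxes should be unaffected by this permutation; the reported \(\sim\)15% F1 drop suggests the evaluated models also lean on the 1-D input order.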
Notes
- 1.
We use the string-typo implementation provided at https://pypi.org/project/typo/.
- 2.
We generate word synonyms using WordNet Interface (https://www.nltk.org/howto/wordnet.html).
- 3.
- 4.
- 5.
- 6.
We observe that different orders of transformations within a combination result in negligible differences.
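Note 1's field-value typo transformation can be sketched with a stdlib-only stand-in. The paper uses the `typo` package; the function below is our own toy version of one classic typo operation (adjacent-character swap) and does not reflect that package's API.

```python
import random

def inject_typo(value, seed=0):
    """Return `value` with two adjacent characters swapped, a common
    character-level typo used for field-value augmentation.
    This is an illustrative stand-in for the `typo` package."""
    if len(value) < 2:
        return value  # nothing to swap
    rng = random.Random(seed)
    i = rng.randrange(len(value) - 1)  # pick a swap position
    chars = list(value)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

print(inject_typo("invoice"))
```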
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Xue, L., Gao, M., Chen, Z., Xiong, C., Xu, R. (2023). Robustness Evaluation of Transformer-Based Form Field Extractors via Form Attacks. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14188. Springer, Cham. https://doi.org/10.1007/978-3-031-41679-8_10
Print ISBN: 978-3-031-41678-1
Online ISBN: 978-3-031-41679-8