Abstract
We propose a framework to evaluate the robustness of transformer-based form field extraction methods via form attacks. We introduce 14 novel form transformations to probe the vulnerability of state-of-the-art field extractors at both the OCR level and the form level, including OCR location/order rearrangement, form background manipulation, and form field-value augmentation. We conduct the robustness evaluation on real invoices and receipts and perform a comprehensive analysis. Experimental results suggest that the evaluated models are highly susceptible to form perturbations such as variation of field values (\(\sim\)15% drop in F1 score), disarrangement of the input text order (\(\sim\)15% drop in F1 score), and disruption of the words neighboring field values (\(\sim\)10% drop in F1 score). Guided by this analysis, we make recommendations to improve the design of field extractors and the process of data collection. Code will be made available.
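One of the attacks the abstract highlights, disarrangement of the input text order, can be sketched in a few lines: the tokens and their bounding boxes are left untouched, but the serialization order fed to the extractor is shuffled. This is a minimal stand-in, not the paper's implementation; the function name and the `(text, bbox)` token format are our assumptions.

```python
import random

def shuffle_ocr_order(tokens, seed=0):
    """Simulate a text-order disarrangement attack: keep every token's
    text and bounding box intact, but return the tokens in a randomized
    serialization order instead of the OCR reading order.

    `tokens` is a list of (text, bbox) pairs; this interface is
    illustrative, not the paper's actual one."""
    rng = random.Random(seed)  # seeded for reproducible attacks
    attacked = list(tokens)    # copy so the original order is preserved
    rng.shuffle(attacked)
    return attacked

# Example: four OCR tokens from a toy invoice, bbox = (x0, y0, x1, y1)
ocr = [("Invoice", (10, 10, 60, 20)),
       ("No.", (65, 10, 85, 20)),
       ("12345", (90, 10, 130, 20)),
       ("Total", (10, 40, 50, 50))]
print(shuffle_ocr_order(ocr))
```

A layout-aware extractor that truly relies on bounding boxes should be unaffected by this permutation; the reported \(\sim\)15% F1 drop suggests the evaluated models also lean on the 1-D input order.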
Notes
- 1.
We use the string-typo implementation provided at https://pypi.org/project/typo/.
- 2.
We generate word synonyms using WordNet Interface (https://www.nltk.org/howto/wordnet.html).
- 3.
- 4.
- 5.
- 6.
We observe that different orders of transformations within a combination result in negligible differences.
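Note 1's field-value typo transformation can be sketched with a stdlib-only stand-in. The paper uses the `typo` package; the function below is our own toy version of one classic typo operation (adjacent-character swap) and does not reflect that package's API.

```python
import random

def inject_typo(value, seed=0):
    """Return `value` with two adjacent characters swapped, a common
    character-level typo used for field-value augmentation.
    This is an illustrative stand-in for the `typo` package."""
    if len(value) < 2:
        return value  # nothing to swap
    rng = random.Random(seed)
    i = rng.randrange(len(value) - 1)  # pick a swap position
    chars = list(value)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

print(inject_typo("invoice"))
```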
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Xue, L., Gao, M., Chen, Z., Xiong, C., Xu, R. (2023). Robustness Evaluation of Transformer-Based Form Field Extractors via Form Attacks. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14188. Springer, Cham. https://doi.org/10.1007/978-3-031-41679-8_10
Print ISBN: 978-3-031-41678-1
Online ISBN: 978-3-031-41679-8