Enhanced Named Entity Recognition algorithm for financial document verification

Toprak, Ahmet; Turan, Metin

doi:10.1007/s11227-023-05371-4


Published: 28 May 2023

Volume 79, pages 19431–19451 (2023)

The Journal of Supercomputing

Ahmet Toprak¹ &
Metin Turan¹


Abstract

Many enterprise systems are document-intensive and require extensive manual verification. The verification process is challenging in terms of both time and residual errors, so a general automatic or semi-automatic document verification system would be useful. However, because of the nature of natural language, context is an important factor. In this research, the target context is financial documents, which have attracted considerable interest recently. An automatic document verification model based solely on entities (mostly those found in financial documents) was tested experimentally. Summary reports were verified against the original documents by searching the original documents for matches of the entities found in each summary. The success of the verification process was evaluated by comparing named entity recognition algorithms from the literature. A Kaggle data set prepared for this purpose was used to match entities from the summaries within the original documents. For financial documents only, the average document verification accuracy of the existing named entity recognition algorithms was 85.36%, whereas the proposed entity recognition algorithm reached 88.80%. The average document verification times of the existing algorithms and the proposed algorithm were 2.43 s and 2.48 s, respectively. In conclusion, applying both the BERT-base-cased classification model and rule-based approaches tailored to the context improves the entity verification process at an insignificant time cost. Even with limited data and rules, the results indicate an opportunity to automate the document verification process with the support of both the BERT-base-cased classification model and rule-based approaches.
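The abstract describes verification in which entities extracted from a summary are searched for in the original document, with a BERT-base-cased model combined with context-specific rules. The authors' rules and model are not given here, so the following is only a minimal Python sketch under stated assumptions: the regular expressions in `FINANCIAL_PATTERNS` are hypothetical financial rules, and `model_extractor` is a hypothetical hook where a statistical tagger (for example a fine-tuned BERT-base-cased model) could be plugged in. A summary counts as verified when every entity it mentions also occurs in the original.

```python
import re
from typing import Callable, Iterable, Optional, Set, Tuple

# Hypothetical rule set for entity types common in financial text.
# These patterns are illustrative assumptions, not the authors' published rules.
FINANCIAL_PATTERNS = [
    re.compile(r"[$£€]\s?\d[\d,]*(?:\.\d+)?(?:\s?(?:billion|million|bn|m)\b)?", re.IGNORECASE),  # money
    re.compile(r"\d+(?:\.\d+)?\s?(?:%|per\s?cent)", re.IGNORECASE),  # percentages
    re.compile(r"\b(?:19|20)\d{2}\b"),  # four-digit years
]

# Hypothetical stand-in for a statistical tagger (e.g. a fine-tuned BERT-base-cased
# model): any callable that maps text to an iterable of entity strings.
ModelExtractor = Callable[[str], Iterable[str]]


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so formatting differences do not block a match."""
    return " ".join(text.lower().split())


def extract_entities(text: str, model_extractor: Optional[ModelExtractor] = None) -> Set[str]:
    """Union of rule-based matches and, optionally, entities returned by a model."""
    entities = {normalize(m.group(0)) for p in FINANCIAL_PATTERNS for m in p.finditer(text)}
    if model_extractor is not None:
        entities.update(normalize(e) for e in model_extractor(text))
    return entities


def occurs_in(entity: str, document: str) -> bool:
    """True if `entity` appears in `document` with no word character directly
    before or after it (so '2%' does not match inside '12%')."""
    return re.search(r"(?<!\w)" + re.escape(entity) + r"(?!\w)", document) is not None


def verify_summary(summary: str, original: str,
                   model_extractor: Optional[ModelExtractor] = None) -> Tuple[bool, Set[str]]:
    """Every entity found in the summary must also occur in the original document.
    Returns (verified, entities_not_found)."""
    doc = normalize(original)
    missing = {e for e in extract_entities(summary, model_extractor) if not occurs_in(e, doc)}
    return not missing, missing
```

For example, `verify_summary("Profits rose 2% to $1.2bn in 2022.", "Profits rose 12% to $1.2bn in 2022.")` returns `(False, {'2%'})`: the boundary check in `occurs_in` keeps the altered figure from matching inside `12%`. Exact matching after normalization is a deliberate simplification; it will not treat paraphrases such as "$1.2bn" and "1.2 billion dollars" as the same entity.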


Data availability

Not applicable.

Code availability

The code will be available through a GitHub repository.


Acknowledgements

Not applicable.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Computer Engineering, Istanbul Ticaret University, 34000, Istanbul, Turkey
Ahmet Toprak & Metin Turan


Contributions

AT and MT each contributed to conceptualization, data collection, formal analysis, software development, validation, visualization, writing of the original draft, and writing (review and editing).

Corresponding author

Correspondence to Ahmet Toprak.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval and consent to participate

Not applicable.

Human and animal ethics

Not applicable.

Consent for publication

Yes, we consent to this paper being published in the Journal of Intelligent Information System.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Toprak, A., Turan, M. Enhanced Named Entity Recognition algorithm for financial document verification. J Supercomput 79, 19431–19451 (2023). https://doi.org/10.1007/s11227-023-05371-4


Accepted: 03 May 2023
Published: 28 May 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s11227-023-05371-4
