Enhanced Named Entity Recognition algorithm for financial document verification

The Journal of Supercomputing

Abstract

Many enterprise systems are document-intensive and require extensive manual verification. This verification process is challenging in terms of both time and residual errors, so a general automatic or semi-automatic document verification system would be useful. Because of the nature of natural language, however, context is an important factor. In this research, the target context is financial documents, which have attracted considerable interest recently. An automatic document verification model based solely on entities (those most often found in financial documents) was evaluated experimentally. A summary report was verified against the original documents by searching for each entity in the summary within the originals. The success of the verification process was evaluated by comparison with named entity algorithms from the literature. A Kaggle data set prepared for this purpose was used to match entities from the summaries within the original documents. For financial documents only, the average document verification accuracy of the named entity recognition algorithms from the literature was 85.36%, whereas the proposed entity recognition algorithm reached 88.80%. The average document verification times of the existing algorithms and the developed algorithm were 2.43 s and 2.48 s, respectively. In conclusion, applying both the BERT-base-cased classification model and rule-based approaches specifically to the context enhances the entity verification process at an insignificant time cost. Although only limited data and rules were used, the results suggest that there is an opportunity to automate the document verification process with the support of both the BERT-base-cased classification model and rule-based approaches.
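The abstract describes verification as checking whether the entities in a summary also occur in the original document. The sketch below illustrates that idea under stated assumptions and is not the authors' algorithm: the regular-expression patterns for the hypothetical `MONEY`, `PERCENT`, `YEAR`, and `ORG` labels stand in for both the BERT-base-cased classifier and the paper's rules (neither is given here), and the verification score is assumed to be the fraction of summary entities found in the original.

```python
import re
from dataclasses import dataclass

# Hypothetical rule-based patterns for entity types common in financial text.
# They stand in for a trained NER model (e.g. a BERT-base-cased token
# classifier) and are illustrative assumptions, not the paper's rules.
PATTERNS = {
    "MONEY": re.compile(
        r"[$£€]\s?\d[\d,]*(?:\.\d+)?(?:\s?(?:billion|million|bn|m)\b)?", re.I
    ),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?\s?%"),
    "YEAR": re.compile(r"\b(?:19|20)\d{2}\b"),
    "ORG": re.compile(
        r"\b[A-Z][a-zA-Z]+(?:\s[A-Z][a-zA-Z]+)*\s(?:Inc|Corp|Ltd|plc|Group|Bank)\b"
    ),
}


@dataclass(frozen=True)
class Entity:
    label: str
    text: str  # normalized surface form


def normalize(label: str, text: str) -> str:
    """Canonicalize surface variants so that '$1.5 billion' equals '$1.5bn'."""
    text = re.sub(r"\s+", "", text.lower().replace(",", ""))
    if label == "MONEY":
        # Context-specific rule: unify written-out and abbreviated magnitudes.
        text = text.replace("billion", "bn").replace("million", "m")
    return text


def extract_entities(text: str) -> set[Entity]:
    """Collect (label, normalized text) pairs for every pattern match."""
    return {
        Entity(label, normalize(label, match.group()))
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    }


def verify_summary(summary: str, original: str) -> tuple[float, set[Entity]]:
    """Return the fraction of summary entities found in the original,
    together with the entities that could not be matched."""
    summary_entities = extract_entities(summary)
    if not summary_entities:
        return 1.0, set()  # nothing to check
    missing = summary_entities - extract_entities(original)
    return 1 - len(missing) / len(summary_entities), missing


if __name__ == "__main__":
    original = ("Acme Bank reported revenue of $1.5 billion in 2022, "
                "up 4.2% on the previous year.")
    summary = "Acme Bank's 2022 revenue was $1.5bn, a rise of 4.5%."
    score, missing = verify_summary(summary, original)
    print(f"{score:.2f}", sorted(e.text for e in missing))  # 0.75 ['4.5%']
```

In the usage example, the misquoted percentage is flagged while "$1.5bn" still matches "$1.5 billion". That normalization step is one place where context-specific rules can complement a learned model; a real pipeline would replace `extract_entities` with a trained NER model and keep the matching step.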

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Data availability

Not applicable.

Code availability

The code will be available through a GitHub repository.

References

  1. Ando T, Yatsu H, Hisazumi K, et al (2015) Reference model of specifications toward independent verification and validation. In: TENCON 2015–2015 IEEE Region 10 Conference, pp 1–3

  2. Babych B, Hartley A (2003) Improving machine translation quality with automatic Named Entity Recognition. In: Proceedings of the 7th International EAMT Workshop on MT and Other Language Technology Tools, Improving MT Through Other Language Technology Tools, Resource and Tools for Building MT at EACL 2003. Budapest https://aclanthology.org/W03-2201

  3. Bassil Y (2012) A trainable summarizer with knowledge acquired from robust NLP techniques. Int J Res Rev Comput Sci (IJRRCS) 3(1):2079–2557

  4. Bensefia A, Paquet T, Heutte L (2005) A writer identification and verification system. Pattern Recognit Lett 26(13):2080–2092

  5. Beusekom JV, Shafait F (2011) Distortion measurement for automatic document verification. In: 2011 International Conference on Document Analysis and Recognition, pp 289–293

  6. Cheng P, Erk K (2020) Attending to entities for better text understanding. Proc AAAI Conf Artif Intell 34(5):7554–7561

  7. Elkasrawi S, Dengel A, Abdelsamad A, et al (2016) What you see is what you get? Automatic image verification for online news content. In: 2016 12th IAPR Workshop on Document Analysis Systems (DAS), pp 114–119

  8. Etzioni O, Cafarella M, Downey D et al (2005) Unsupervised named-entity extraction from the Web: an experimental study. Artif Intell 165(1):91–134

  9. Garain U, Halder B (2008) On automatic authenticity verification of printed security documents. In: 2008 Sixth Indian Conference on Computer Vision, Graphics and Image Processing, pp 706–713

  10. Ghanmi N, Awal AM (2018) A new descriptor for pattern matching: application to identity document verification. In: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), pp 375–380

  11. Guo J, Xu G, Cheng X, et al (2009) Named Entity Recognition in query. In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’09, Association for Computing Machinery, New York, pp 267–274

  12. Hamad F, Zraqou J, Maaita A, et al (2015) A secure authentication system for ePassport detection and verification. In: 2015 European Intelligence and Security Informatics Conference, pp 173–176

  13. Hassanpour S, O’Connor MJ, Das AK (2011) A framework for the automatic extraction of rules from online text. In: Bassiliades N, Governatori G, Paschke A (eds) Rule-based reasoning, programming, and applications. Springer, Berlin, Heidelberg, pp 266–280

  14. Hnoohom N, Chumuang N, Ketcham M (2015) Thai Handwritten verification system on documents for the investigation. In: 2015 11th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pp 617–622

  15. Itcib (2022) Financial Documents Verification (2022). https://itcib.com/financial-documents-verification.html. Accessed 28 Dec

  16. Mollá D, van Zaanen M, Smith D (2006) Named Entity Recognition for question answering. In: Proceedings of the Australasian Language Technology Workshop 2006, Proceedings of the 2006 Australasian Language Technology Workshop (ALTW2006), Sydney, Australia, pp 51–58. https://aclanthology.org/U06-1009

  17. Mridha MF, Lima AA, Nur K et al (2021) A survey of automatic text summarization: progress, process and challenges. IEEE Access 9:156043–156070

  18. Nadeau D, Sekine S (2007) A survey of Named Entity Recognition and classification. Lingvist Investig 30(1):3–26

  19. Naman J (2022) NER dataset. https://www.kaggle.com/namanj27/ner-dataset. Accessed 28 Dec 2022

  20. Pariza S (2022) BBC news summary. https://www.kaggle.com/pariza/bbc-news-summary. Accessed 28 Dec

  21. Petkova D, Croft WB (2007) Proximity-Based document representation for named entity retrieval. In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, CIKM ’07, Association for Computing Machinery, New York, pp 731–740

  22. Poon H, Domingos P (2009) Unsupervised semantic parsing. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Singapore, pp 1–10. https://aclanthology.org/D09-1001

  23. Reddy S, Täckström O, Collins M et al (2016) Transforming dependency structures to logical forms for semantic parsing. Trans Assoc Comput Linguist 4:127–140

  24. Roychoudhury S, Bellarykar N, Kulkarni V (2016) A NLP based framework to support document verification-as-a-service. In: 2016 IEEE 20th International Enterprise Distributed Object Computing Conference (EDOC), pp 1–10

  25. Sampaio P, Santos C, Courtias J (2000) About the semantic verification of SMIL documents. In: 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532), vol. 3, pp 1675–1678

  26. Tjong Kim Sang EF, De Meulder F (2003) Introduction to the CoNLL-2003 shared task: language-independent Named Entity Recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pp 142–147

  27. Takata Y, Nakamura T, Seki H (2004) Accessibility verification of WWW documents by an automatic guideline verification tool. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004, p 10

  28. Techopedia (2022) What does spell checker mean? (2017). https://www.techopedia.com/definition/12396/spell-checker. Accessed 28 Dec

  29. Tolosana R, Vera-Rodriguez R, Ortega-Garcia J et al (2015) Preprocessing and feature selection for improved sensor interoperability in online biometric signature verification. IEEE Access 3:478–489

  30. Wang J-H (2011) Web-based verification on the representativeness of terms extracted from single short documents. In: 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, vol. 3, pp 114–117

  31. Wu C-H, Huang C-L, Hsu C-S, et al (2007) Speech retrieval using spoken keyword extraction and semantic verification. In: TENCON 2007–2007 IEEE Region 10 Conference, pp 1–4

  32. Zhang Z, Han X, Liu Z, et al (2019) ERNIE: enhanced language representation with informative entities. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, pp 1441–1451

Acknowledgements

Not applicable.

Funding

Not applicable.

Author information

Contributions

AT contributed to conceptualization, data collection, formal analysis, software development, validation, visualization, writing of the original draft, and writing (review and editing). MT contributed to conceptualization, data collection, formal analysis, software development, validation, visualization, writing of the original draft, and writing (review and editing).

Corresponding author

Correspondence to Ahmet Toprak.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval and consent to participate

Not applicable.

Human and animal ethics

Not applicable.

Consent for publication

Yes, the authors consent to the publication of this paper in The Journal of Supercomputing.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Toprak, A., Turan, M. Enhanced Named Entity Recognition algorithm for financial document verification. J Supercomput 79, 19431–19451 (2023). https://doi.org/10.1007/s11227-023-05371-4

  • DOI: https://doi.org/10.1007/s11227-023-05371-4
