Abstract
Many enterprise systems are document-intensive and require extensive manual verification. Manual verification is time-consuming and still leaves errors undetected, so a general automatic or semi-automatic document verification system would be useful. However, because of the nature of natural language, context is an important factor. In this research, financial documents, which have attracted considerable interest recently, were selected as the target context. An automatic document verification model based only on entities (those most commonly encountered in financial documents) was tested. A summary report was verified against its original documents by searching the original documents for matches to the entities in the summary. The success of the verification process was evaluated by comparison with named entity recognition algorithms from the literature. A Kaggle data set prepared for this purpose was used to match entities from the summary within the original documents. For financial documents only, the average document verification accuracy of the named entity recognition algorithms was 85.36%, whereas the proposed entity recognition algorithm reached 88.80%. The average document verification time was 2.43 s for the tested algorithms and 2.48 s for the developed algorithm. In conclusion, applying the BERT-base-cased classification model together with context-specific rule-based approaches improves the entity verification process at an insignificant time cost. Consequently, even with limited data and rules, there appears to be an opportunity to automate the document verification process with the support of both the BERT-base-cased classification model and rule-based approaches.
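The entity-matching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the BERT-base-cased classifier is omitted, and the regular-expression rules, the function names `extract_entities` and `verify_summary`, and the sample texts are all hypothetical, chosen only to show how summary entities might be checked against an original document.

```python
import re

# Hypothetical rule set for illustration only; the paper's actual rules and
# its BERT-base-cased classifier are not reproduced here.
ENTITY_PATTERNS = [
    # Monetary amounts such as "$4.2 billion" or "£1,250.50"
    r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:million|billion)\b)?",
    # Percentages such as "12%" or "3.5 %"
    r"\d+(?:\.\d+)?\s?%",
    # Multi-word capitalized names such as "Acme Holdings"
    r"[A-Z][a-zA-Z&]+(?:\s+[A-Z][a-zA-Z&]+)+",
]
ENTITY_RE = re.compile("|".join(ENTITY_PATTERNS))


def extract_entities(text):
    """Return the set of entity strings found by the rules, normalized
    to lower case with single spaces."""
    return {
        re.sub(r"\s+", " ", match.group(0)).strip().lower()
        for match in ENTITY_RE.finditer(text)
    }


def verify_summary(summary, original):
    """Return (score, missing): the share of summary entities that also
    occur among the original document's entities, and the unmatched ones."""
    summary_entities = extract_entities(summary)
    if not summary_entities:
        return 1.0, set()  # nothing to verify
    missing = summary_entities - extract_entities(original)
    score = 1 - len(missing) / len(summary_entities)
    return score, missing


# Made-up sample texts (assumption, not taken from the paper's data set).
original = (
    "Acme Holdings reported revenue of $4.2 billion, up 12% on the year. "
    "Chief executive Jane Doe said margins improved."
)
summary = "Acme Holdings posted $4.2 billion in revenue, a rise of 15%."

score, missing = verify_summary(summary, original)
print(f"score={score:.2f}, missing={sorted(missing)}")
```

In this sketch the summary's growth figure does not appear among the original's entities, so it is reported as unmatched and lowers the score. Exact matching after normalization is a deliberate simplification; a practical system would need to tolerate variants such as "4.2bn" or abbreviated company names.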
Data availability
Not applicable.
Code availability
The code will be available through a GitHub repository.
Acknowledgements
Not applicable.
Funding
Not applicable.
Author information
Contributions
AT contributed to conceptualization, data collection, formal analysis, software development, validation, visualization, writing the original draft, and review and editing. MT contributed to conceptualization, data collection, formal analysis, software development, validation, visualization, writing the original draft, and review and editing.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval and consent to participate
Not applicable.
Human and animal ethics
Not applicable.
Consent for publication
Yes, we consent to the publication of this paper in the Journal of Intelligent Information System.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Toprak, A., Turan, M. Enhanced Named Entity Recognition algorithm for financial document verification. J Supercomput 79, 19431–19451 (2023). https://doi.org/10.1007/s11227-023-05371-4
DOI: https://doi.org/10.1007/s11227-023-05371-4