Abstract
In this paper, we address the task of estimating the probability of misstatements in the annual financial reports of public companies. In particular, we improve the state of the art in financial misstatement detection by training a TabTransformer model with a gated multi-layer perceptron, which encodes and exploits relationships between financial features. We further calibrate a sample-dependent focal loss function to deal with the severe class imbalance in the data and to focus on positive examples that are hard to distinguish. We evaluate the proposed methodology in a realistic setting that preserves the essential characteristics of the task: (a) the imbalanced distribution of classes in the data, (b) the chronological order of the data, and (c) the systematic noise in the labels, caused by the delay in manually identifying misstatements. In this setting, the proposed method achieves state-of-the-art results compared to recent approaches in the literature. As an additional contribution, we release the dataset to facilitate further research in the field.
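The abstract does not give the exact form of the loss. As an illustration only, the sketch below assumes the sample-dependent focal loss of Mukhoti et al. (the "FLSD-53" schedule: focusing parameter γ = 5 when the true-class probability p_t is below 0.2, γ = 3 otherwise) combined with the α-balancing of Lin et al. It shows how the (1 - p_t)^γ factor suppresses the loss of well-classified reports so that hard misstatements dominate the gradient. The function names, the threshold, and the default α are illustrative assumptions, not the settings used in this paper, and a real model would compute this inside a deep learning framework rather than in plain Python.

```python
import math
from typing import Sequence


def sample_gamma(p_t: float, threshold: float = 0.2) -> float:
    """FLSD-53 schedule (Mukhoti et al., 2020), assumed here: a larger
    focusing parameter for samples whose true-class probability is low."""
    return 5.0 if p_t < threshold else 3.0


def focal_loss_sd(probs: Sequence[float], labels: Sequence[int],
                  alpha: float = 0.5, eps: float = 1e-7) -> float:
    """Mean alpha-balanced focal loss with a sample-dependent gamma.

    probs:  predicted probability that each report is misstated
    labels: 1 for a misstated report, 0 otherwise
    alpha:  weight of the positive class (hypothetical default; 0.5 means
            no class re-weighting, so tune it on validation data)
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)         # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        gamma = sample_gamma(p_t)
        # (1 - p_t)^gamma shrinks the loss of well-classified samples
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# A confidently detected misstatement vs. a missed one
easy = focal_loss_sd([0.9], [1])
hard = focal_loss_sd([0.1], [1])
```

With these settings the missed misstatement contributes a loss several orders of magnitude larger than the confidently detected one, which is the "focus on hard positive examples" behaviour described above; plain cross-entropy would separate the two cases far less sharply.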
Data availability
The dataset was compiled by joining publicly available data from [14] with information from the AuditAnalytics database, in order to add the restatement date of each misstatement and thereby simulate the realistic scenario of noisy labels during training. At the time of writing, the data from [14] were available on GitHub. Our AuditAnalytics license allows us to share data in published research, provided that raw data are not shared explicitly. The restatement dates were used to flip labels in the training sets and do not appear in the dataset. We therefore share the resulting data used in this study, but not any raw data from the AuditAnalytics database.
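The label-flipping step described above can be sketched as follows. The text only states that restatement dates were used to flip training labels; the cutoff rule assumed here (a misstatement counts as known only if it was disclosed no later than the last training year), the yearly granularity, the single test year, and the `Report` field names are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Report:
    """Hypothetical record layout, for illustration only."""
    firm_id: str
    fiscal_year: int
    misstated: int                   # ground-truth label (1 = misstatement)
    restatement_year: Optional[int]  # year the misstatement was disclosed, if any


def chronological_split(reports: List[Report],
                        train_end: int) -> Tuple[List[Report], List[Report]]:
    """Train on fiscal years <= train_end and test on train_end + 1.

    Training labels reflect what was known at the end of train_end: a
    misstatement disclosed after that year is not yet known, so its label
    is flipped to 0. Test labels keep the ground truth."""
    train, test = [], []
    for r in reports:
        if r.fiscal_year <= train_end:
            disclosed = (r.misstated == 1 and r.restatement_year is not None
                         and r.restatement_year <= train_end)
            train.append(replace(r, misstated=1 if disclosed else 0))
        elif r.fiscal_year == train_end + 1:
            test.append(r)
    return train, test


data = [
    Report("A", 2010, 1, 2011),   # disclosed within the training window: kept as 1
    Report("B", 2011, 1, 2014),   # disclosed after the cutoff: flipped to 0
    Report("C", 2012, 1, 2013),   # test year: ground truth kept
    Report("D", 2011, 0, None),
]
train, test = chronological_split(data, train_end=2011)
print([r.misstated for r in train])  # [1, 0, 0]
```

Using a frozen dataclass with `dataclasses.replace` leaves the original records untouched, so the same ground-truth data can be re-split for every test year in a rolling, chronologically ordered evaluation.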
Code availability
The implementation of the model will be made available on GitHub upon publication.
Notes
A comparison of financial databases is provided in [29].
References
1. Hennes KM, Leone AJ, Miller BP (2008) The importance of distinguishing errors from irregularities in restatement research: the case of restatements and CEO/CFO turnover. Account Rev 83(6):1487–1519
2. Kirkos E, Spathis C, Manolopoulos Y (2007) Data mining techniques for the detection of fraudulent financial statements. Expert Syst Appl 32(4):995–1003
3. Kotsiantis S, Koumanakos E, Tzelepis D, Tampakas V (2006) Forecasting fraudulent financial statements using data mining. Int J Comput Intell 3(2):104–110
4. Bai B, Yen J, Yang X (2008) False financial statements: characteristics of China's listed companies and CART detecting approach. Int J Inf Technol Decis Mak 7(2):339–359
5. Deng Q, Mei G (2009) Combining self-organizing map and k-means clustering for detecting fraudulent financial statements. In: 2009 IEEE International Conference on Granular Computing, Nanchang, China. IEEE, pp 126–131
6. Ravisankar P, Ravi V, Rao GR, Bose I (2011) Detection of financial statement fraud and feature selection using data mining techniques. Decis Support Syst 50(2):491–500
7. Feroz EH, Kwon TM, Pastena VS, Park K (2000) The efficacy of red flags in predicting the SEC's targets: an artificial neural networks approach. Intell Syst Account Finance Manag 9(3):145–157
8. Abbasi A, Albrecht C, Vance A, Hansen J (2012) MetaFraud: a meta-learning framework for detecting financial fraud. MIS Q 36(4):1293–1327
9. Bertomeu J, Cheynel E, Floyd E, Pan W (2021) Using machine learning to detect misstatements. Rev Account Stud 26:468–519
10. Perols J (2011) Financial statement fraud detection: an analysis of statistical and machine learning algorithms. Audit J Pract Theory 30(2):19–50
11. Sharma A, Panigrahi PK (2012) A review of financial accounting fraud detection based on data mining techniques. Int J Comput Appl 39(1):11
12. Zhang C, Cho S, Vasarhelyi M (2022) Explainable artificial intelligence (XAI) in auditing. Int J Account Inf Syst 46:100572
13. Dechow PM, Ge W, Larson CR, Sloan RG (2011) Predicting material accounting misstatements. Contemp Account Res 28(1):17–82
14. Bao Y, Ke B, Li B, Yu YJ, Zhang J (2020) Detecting accounting fraud in publicly traded US firms using a machine learning approach. J Account Res 58(1):199–235
15. Zavitsanos E, Mavroeidis D, Bougiatiotis K, Spyropoulou E, Loukas L, Paliouras G (2021) Financial misstatement detection: a realistic evaluation. In: 2nd ACM International Conference on AI in Finance (ICAIF '21), November 3–5, 2021, Virtual Event, USA. Association for Computing Machinery, pp 1–9
16. Puttarattanamanee M, Boongasame L, Thammarak K (2023) A comparative study of sentiment analysis methods for detecting fake reviews in e-commerce. High Tech Innov J 4(2):349–363
17. Hoogs B, Kiehl T, Lacomb C, Senturk D (2007) A genetic algorithm approach to detecting temporal patterns indicative of financial statement fraud. Intell Syst Account Financ Manag Int J 15(1–2):41–56
18. Kiehl TR, Hoogs BK, LaComb CA, Senturk D (2005) Evolving multi-variate time-series patterns for the discrimination of fraudulent financial filings. In: Genetic and Evolutionary Computation Conference, Washington, DC, USA. ACM, pp 1–8
19. Chai W, Hoogs BK, Verschueren BT (2006) Fuzzy ranking of financial statements for fraud detection. In: 2006 IEEE International Conference on Fuzzy Systems, Vancouver, BC. IEEE, pp 152–158
20. Liou F-M (2008) Fraudulent financial reporting detection and business failure prediction models: a comparison. Manag Audit J 23(7):650–662
21. Cecchini M, Aytug H, Koehler GJ, Pathak P (2010) Detecting management fraud in public companies. Manage Sci 56(7):1146–1160
22. Ata HA, Seyrek IH (2009) The use of data mining techniques in detecting fraudulent financial statements: an application on manufacturing firms. Suleyman Demirel University Journal of Faculty of Economics and Administrative Sciences 14(2):157–170
23. Lin C-C, Chiu A-A, Huang SY, Yen DC (2015) Detecting the financial statement fraud: the analysis of the differences between data mining techniques and experts' judgments. Knowl-Based Syst 89:459–470
24. Green BP, Choi JH (1997) Assessing the risk of management fraud through neural network technology. Audit J Pract Theory 16(1):14–28
25. Fissette M, Vries T (2017) Text mining to detect indications of fraud in annual reports worldwide. In: Benelearn 2017: Proceedings of the Twenty-Sixth Benelux Conference on Machine Learning, Eindhoven University of Technology, Eindhoven, the Netherlands, pp 69–71
26. Craja P, Kim A, Lessmann S (2020) Deep learning for detecting financial statement fraud. Decis Support Syst 139:113421
27. Hajek P, Henriques R (2017) Mining corporate annual reports for intelligent detection of financial statement fraud: a comparative study of machine learning methods. Knowl-Based Syst 128:139–152
28. Dutta I, Dutta S, Raahemi B (2017) Detecting financial restatements using data mining techniques. Expert Syst Appl 90:374–393
29. Karpoff J, Koester A, Lee D, Martin G (2012) A critical analysis of databases used in financial misconduct research (working paper). SSRN Electron J
30. Humpherys SL, Moffitt KC, Burns MB, Burgoon JK, Felix WF (2011) Identification of fraudulent financial statements using linguistic credibility analysis. Decis Support Syst 50(3):585–594
31. Glancy FH, Yadav SB (2011) A computational model for financial reporting fraud detection. Decis Support Syst 50(3):595–601
32. Spathis C, Doumpos M, Zopounidis C (2002) Detecting falsified financial statements: a comparative study using multicriteria analysis and multivariate statistical techniques. Eur Account Rev 11(3):509–535
33. Kaminski KA, Wetzel TS, Guan L (2004) Can financial ratios detect fraudulent financial reporting? Manag Audit J 19(1):15–28
34. Kim YJ, Baik B, Cho S (2016) Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Syst Appl 62:32–43
35. Dyck A, Morse A, Zingales L (2010) Who blows the whistle on corporate fraud? J Financ 65(6):2213–2253
36. Huang X, Khetan A, Cvitkovic M, Karnin Z (2020) TabTransformer: tabular data modeling using contextual embeddings. arXiv. https://doi.org/10.48550/ARXIV.2012.06678
37. Liu H, Dai Z, So D, Le QV (2021) Pay attention to MLPs. In: Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Vaughan JW (eds) Advances in Neural Information Processing Systems, vol 34. Curran Associates, Inc., Virtual-only Conference, pp 9204–9215. https://proceedings.neurips.cc/paper/2021/file/4cc05b35c2f937c5bd9e7d41d3686fff-Paper.pdf
38. Cholakov R, Kolev T (2022) The GatedTabTransformer. An enhanced deep learning architecture for tabular modeling. arXiv. https://doi.org/10.48550/ARXIV.2201.00199
39. Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42(2):318–327. https://doi.org/10.1109/TPAMI.2018.2858826
40. Mukhoti J, Kulharia V, Sanyal A, Golodetz S, Torr P, Dokania P (2020) Calibrating deep neural networks using focal loss. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds) Advances in Neural Information Processing Systems, vol 33. Curran Associates, Inc., Virtual-only Conference, pp 15288–15299. https://proceedings.neurips.cc/paper/2020/file/aeb7b30ef1d024a76f21a1d40e30c302-Paper.pdf
41. H, B, S, K (2020) Topics in financial filings and bankruptcy prediction with distributed representations of textual data. In: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Virtual Conference, Belgium. Springer, pp 306–322
42. Zavitsanos E, Mavroeidis D, Spyropoulou E, Fergadiotis M, Paliouras G (2024) Entrant: a large financial dataset for table understanding. Sci Data 11:876. https://doi.org/10.1038/s41597-024-03605-5
Acknowledgements
The authors would like to acknowledge the financial support of Qualco SA for this research. The opinions of the authors expressed herein do not necessarily state or reflect those of Qualco SA. Qualco SA had no influence on the design of the study, the collection and interpretation of the data, the writing, or the decision to submit the article for publication.
Author information
Contributions
Research and development of the method were performed by Elias Zavitsanos, and Dimitrios Kelesis contributed to several parts of the method. All authors wrote the manuscript, which was reviewed by Georgios Paliouras. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zavitsanos, E., Kelesis, D. & Paliouras, G. Calibrating TabTransformer for financial misstatement detection. Appl Intell 55, 3 (2025). https://doi.org/10.1007/s10489-024-05861-9
DOI: https://doi.org/10.1007/s10489-024-05861-9