Calibrating TabTransformer for financial misstatement detection

Applied Intelligence

Abstract

In this paper, we deal with the task of estimating the probability of misstatements in the annual financial reports of public companies. In particular, we improve the state of the art in financial misstatement detection by training a TabTransformer model with a gated multi-layer perceptron, which encodes and exploits relationships between financial features. We further calibrate a sample-dependent focal loss function to deal with the severe class imbalance in the data and to focus on positive examples that are hard to distinguish. We evaluate the proposed methodology in a realistic setting that preserves the essential characteristics of the task: (a) the imbalanced distribution of classes in the data, (b) the chronological order of the data, and (c) the systematic noise in the labels caused by the delay in manually identifying misstatements. In this setting, the proposed method achieves state-of-the-art results compared to recent approaches in the literature. As an additional contribution, we release the dataset to facilitate further research in the field.
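A sample-dependent focal loss down-weights reports the model already classifies well and concentrates the training signal on hard positives. The sketch below is an illustration under stated assumptions, not the authors' implementation. It uses the binary focal loss FL(p_t) = -α_t (1 - p_t)^γ log(p_t) of Lin et al. [39], where p_t is the probability assigned to the true class, and borrows the FLSD-53 schedule of Mukhoti et al. [40] for the per-sample γ (γ = 5 when p_t < 0.2, γ = 3 otherwise). The function names and the default α = 0.25 are hypothetical choices.

```python
import math
from typing import Sequence


def sample_gamma(p_t: float) -> float:
    """Per-sample focusing parameter (assumed FLSD-53 schedule from [40]):
    gamma = 5 for poorly classified samples (p_t < 0.2), otherwise 3."""
    return 5.0 if p_t < 0.2 else 3.0


def focal_loss(probs: Sequence[float], labels: Sequence[int], alpha: float = 0.25) -> float:
    """Mean binary focal loss with a sample-dependent gamma.

    probs:  predicted probability of the positive (misstatement) class
    labels: 1 for a misstated report, 0 otherwise
    alpha:  weight of the positive class (hypothetical default, as in [39])
    """
    eps = 1e-12  # guards log(0) for saturated predictions
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p          # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        gamma = sample_gamma(p_t)
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(probs)
```

For example, a misstated report scored at p = 0.1 contributes roughly 0.34 to the loss, whereas one scored at p = 0.9 contributes about 2.6e-5, so almost all of the gradient comes from the hard positive.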

Figs. 1–9 (not reproduced here)

Data availability

The dataset has been compiled by joining publicly available information from [14] with the restatement dates from the AuditAnalytics database, in order to simulate the realistic scenario of having noisy labels during training. At the time of writing, the data from [14] were available on GitHub. We are licensed by AuditAnalytics to share data in published research, but not to share raw data explicitly. The restatement dates have been used to flip labels in the training sets and do not appear in the dataset. We therefore share the resulting data used in this study, but not any raw data from the AuditAnalytics database.
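The label-flipping step can be illustrated with a minimal Python sketch. It assumes a hypothetical per-report record (the field names are not those of the released dataset), trains on all fiscal years before the test year, and treats the first day of the test year as the date on which training labels are observed. Both the split and the cutoff are illustrative choices, not necessarily the protocol used in the study. As a further simplifying assumption, the test year is scored against the labels known today.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Report:
    """One annual report (hypothetical schema, not the released dataset's columns)."""
    firm_id: str
    fiscal_year: int
    misstated: bool                   # label as known today
    restatement_date: Optional[date]  # when the misstatement became public, if ever


def chronological_split(reports: List[Report], test_year: int) -> Tuple[List[Report], List[Report]]:
    """Train on all fiscal years before test_year, test on test_year itself."""
    train = [r for r in reports if r.fiscal_year < test_year]
    test = [r for r in reports if r.fiscal_year == test_year]
    return train, test


def observed_labels(reports: List[Report], cutoff: date) -> List[int]:
    """Labels as they would have been observed on `cutoff`.

    A misstated report whose restatement was disclosed after `cutoff` is
    flipped to 0, reproducing the delay in identifying misstatements.
    """
    return [
        int(r.misstated and r.restatement_date is not None and r.restatement_date <= cutoff)
        for r in reports
    ]


# Usage with made-up reports: firm B's misstatement is only disclosed in 2012.
reports = [
    Report("A", 2005, True, date(2006, 3, 1)),
    Report("B", 2006, True, date(2012, 5, 1)),
    Report("C", 2006, False, None),
    Report("D", 2008, True, date(2009, 1, 15)),
]
train, test = chronological_split(reports, test_year=2008)
train_y = observed_labels(train, cutoff=date(2008, 1, 1))  # [1, 0, 0]: B is flipped
test_y = [int(r.misstated) for r in test]                 # [1]
```

Because firm B's restatement postdates the cutoff, it enters training as a negative even though it is misstated, which is exactly the systematic label noise the evaluation setting is meant to preserve.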

Code availability

The implementation of the model will be made available on GitHub upon publication.

Notes

  1. https://www.sec.gov/files/form10-k.pdf

  2. https://www.marketplace.spglobal.com/en/datasets/compustat-fundamentals-(8)

  3. https://sites.google.com/usc.edu/aaerdataset/buy-the-data?authuser=0

  4. https://www.auditanalytics.com

  5. https://www.sec.gov/edgar.shtml

  6. A comparison of financial databases is provided in [29].

  7. https://github.com/izavits/misstatement_data

References

  1. Hennes KM, Leone AJ, Miller BP (2008) The importance of distinguishing errors from irregularities in restatement research: The case of restatements and CEO/CFO turnover. Account Rev 83(6):1487–1519

  2. Kirkos E, Spathis C, Manolopoulos Y (2007) Data mining techniques for the detection of fraudulent financial statements. Expert Syst Appl 32(4):995–1003

  3. Kotsiantis S, Koumanakos E, Tzelepis D, Tampakas V (2006) Forecasting fraudulent financial statements using data mining. Int J Comput Intell 3(2):104–110

  4. Bai B, Yen J, Yang X (2008) False financial statements: characteristics of China’s listed companies and CART detecting approach. J Inform Technol Decision Making 7(02):339–359

  5. Deng Q, Mei G (2009) Combining self-organizing map and k-means clustering for detecting fraudulent financial statements. In: 2009 IEEE International Conference on Granular Computing, IEEE, Nanchang, China, pp 126–131

  6. Ravisankar P, Ravi V, Rao GR, Bose I (2011) Detection of financial statement fraud and feature selection using data mining techniques. Decis Support Syst 50(2):491–500

  7. Feroz EH, Kwon TM, Pastena VS, Park K (2000) The efficacy of red flags in predicting the SEC’s targets: an artificial neural networks approach. Intell Syst Account Finance Manag 9(3):145–157

  8. Abbasi A, Albrecht C, Vance A, Hansen J (2012) Metafraud: a meta-learning framework for detecting financial fraud. MIS Q 36(4):1293–1327

  9. Bertomeu J, Cheynel E, Floyd E, Pan W (2021) Using machine learning to detect misstatements. Rev Acc Stud 26:468–519

  10. Perols J (2011) Financial statement fraud detection: An analysis of statistical and machine learning algorithms. Audit A J Pract Theory 30(2):19–50

  11. Sharma A, Panigrahi PK (2012) A review of financial accounting fraud detection based on data mining techniques. Int J Comput Appl 39(1):11

  12. Zhang C, Cho S, Vasarhelyi M (2022) Explainable artificial intelligence (XAI) in auditing. Int J Account Inf Syst 46:100572

  13. Dechow PM, Ge W, Larson CR, Sloan RG (2011) Predicting material accounting misstatements. Contemp Account Res 28(1):17–82

  14. Bao Y, Ke B, Li B, Yu YJ, Zhang J (2020) Detecting accounting fraud in publicly traded US firms using a machine learning approach. J Account Res 58(1):199–235

  15. Zavitsanos E, Mavroeidis D, Bougiatiotis K, Spyropoulou E, Loukas L, Paliouras G (2021) Financial misstatement detection: a realistic evaluation. In: 2nd ACM International Conference on AI in Finance (ICAIF ’21), Association for Computing Machinery, Virtual Event, USA, November 3–5, 2021, pp 1–9

  16. Puttarattanamanee M, Boongasame L, Thammarak K (2023) A comparative study of sentiment analysis methods for detecting fake reviews in e-commerce. High Tech Innov J 4(2):349–363

  17. Hoogs B, Kiehl T, Lacomb C, Senturk D (2007) A genetic algorithm approach to detecting temporal patterns indicative of financial statement fraud. Intell Syst Account Financ Manag Int J 15(1–2):41–56

  18. Kiehl TR, Hoogs BK, LaComb CA, Senturk D (2005) Evolving multi-variate time-series patterns for the discrimination of fraudulent financial filings. In: Genetic and Evolutionary Computation Conference, ACM, Washington, DC, USA, pp 1–8

  19. Chai W, Hoogs BK, Verschueren BT (2006) Fuzzy ranking of financial statements for fraud detection. In: 2006 IEEE International Conference on Fuzzy Systems, IEEE, Vancouver, BC, pp 152–158

  20. Liou F-M (2008) Fraudulent financial reporting detection and business failure prediction models: a comparison. Manag Audit J 23(7):650–662

  21. Cecchini M, Aytug H, Koehler GJ, Pathak P (2010) Detecting management fraud in public companies. Manage Sci 56(7):1146–1160

  22. Ata HA, Seyrek IH (2009) The use of data mining techniques in detecting fraudulent financial statements: An application on manufacturing firms. Suleyman Demirel University Journal of Faculty of Economics & Administrative Sciences 14(2):157–170

  23. Lin C-C, Chiu A-A, Huang SY, Yen DC (2015) Detecting the financial statement fraud: The analysis of the differences between data mining techniques and experts’ judgments. Knowl-Based Syst 89:459–470

  24. Green BP, Choi JH (1997) Assessing the risk of management fraud through neural network technology. Auditing 16(1):14–28

  25. Fissette M, Vries T (2017) Text mining to detect indications of fraud in annual reports worldwide. In: Benelearn 2017: Proceedings of the Twenty-Sixth Benelux Conference on Machine Learning, Eindhoven University of Technology, Eindhoven, The Netherlands, pp 69–71

  26. Craja P, Kim A, Lessmann S (2020) Deep learning for detecting financial statement fraud. Decis Support Syst 139:113421

  27. Hajek P, Henriques R (2017) Mining corporate annual reports for intelligent detection of financial statement fraud: a comparative study of machine learning methods. Knowl-Based Syst 128:139–152

  28. Dutta I, Dutta S, Raahemi B (2017) Detecting financial restatements using data mining techniques. Expert Syst Appl 90:374–393

  29. Karpoff J, Koester A, Lee D, Martin G (2012) A critical analysis of databases used in financial misconduct research (working paper). SSRN Electron J

  30. Humpherys SL, Moffitt KC, Burns MB, Burgoon JK, Felix WF (2011) Identification of fraudulent financial statements using linguistic credibility analysis. Decis Support Syst 50(3):585–594

  31. Glancy FH, Yadav SB (2011) A computational model for financial reporting fraud detection. Decis Support Syst 50(3):595–601

  32. Spathis C, Doumpos M, Zopounidis C (2002) Detecting falsified financial statements: a comparative study using multicriteria analysis and multivariate statistical techniques. Eur Account Rev 11(3):509–535

  33. Kaminski KA, Wetzel TS, Guan L (2004) Can financial ratios detect fraudulent financial reporting? Manag Audit J 19(1):15–28

  34. Kim YJ, Baik B, Cho S (2016) Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Syst Appl 62:32–43

  35. Dyck A, Morse A, Zingales L (2010) Who blows the whistle on corporate fraud? J Financ 65(6):2213–2253

  36. Huang X, Khetan A, Cvitkovic M, Karnin Z (2020) TabTransformer: Tabular Data Modeling Using Contextual Embeddings. arXiv. https://doi.org/10.48550/ARXIV.2012.06678

  37. Liu H, Dai Z, So D, Le QV (2021) Pay attention to MLPs. In: Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Vaughan JW (eds) Advances in Neural Information Processing Systems, vol 34, pp 9204–9215. Curran Associates, Inc., Virtual-only Conference. https://proceedings.neurips.cc/paper/2021/file/4cc05b35c2f937c5bd9e7d41d3686fff-Paper.pdf

  38. Cholakov R, Kolev T (2022) The GatedTabTransformer. An enhanced deep learning architecture for tabular modeling. arXiv. https://doi.org/10.48550/ARXIV.2201.00199

  39. Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42(2):318–327. https://doi.org/10.1109/TPAMI.2018.2858826

  40. Mukhoti J, Kulharia V, Sanyal A, Golodetz S, Torr P, Dokania P (2020) Calibrating deep neural networks using focal loss. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds) Advances in Neural Information Processing Systems, vol 33, pp 15288–15299. Curran Associates, Inc., Virtual-only Conference. https://proceedings.neurips.cc/paper/2020/file/aeb7b30ef1d024a76f21a1d40e30c302-Paper.pdf

  41. H, B, S, K (2020) Topics in financial filings and bankruptcy prediction with distributed representations of textual data. In: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Springer, Virtual Conference, Belgium, pp 306–322

  42. Zavitsanos E, Mavroeidis D, Spyropoulou E, Fergadiotis M, Paliouras G (2024) Entrant: A large financial dataset for table understanding. Sci Data 11:876. https://doi.org/10.1038/s41597-024-03605-5

Acknowledgements

The authors would like to acknowledge the financial support of Qualco SA for this research. The opinions of the authors expressed herein do not necessarily state or reflect those of Qualco SA. Qualco SA had no influence in the design of the study, the collection and interpretation of the data, the writing, and the decision to submit the article for publication.

Author information

Contributions

Research and development of the method was performed by Elias Zavitsanos. Contributions to several parts of the method were made by Dimitrios Kelesis. Manuscript written by all authors. Review by Georgios Paliouras. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Elias Zavitsanos.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zavitsanos, E., Kelesis, D. & Paliouras, G. Calibrating TabTransformer for financial misstatement detection. Appl Intell 55, 3 (2025). https://doi.org/10.1007/s10489-024-05861-9

  • DOI: https://doi.org/10.1007/s10489-024-05861-9

Keywords