Calibrating TabTransformer for financial misstatement detection

Zavitsanos, Elias; Kelesis, Dimitrios; Paliouras, Georgios

doi:10.1007/s10489-024-05861-9

Published: 18 November 2024

Volume 55, article number 3, (2025)
Applied Intelligence

Elias Zavitsanos ORCID: orcid.org/0000-0002-2417-3307¹,
Dimitrios Kelesis¹ &
Georgios Paliouras¹

Abstract

In this paper, we address the task of estimating the probability of misstatements in the annual financial reports of public companies. In particular, we improve the state of the art for financial misstatement detection by training a TabTransformer model with a gated multi-layer perceptron, which encodes and exploits relationships between financial features. We further calibrate a sample-dependent focal loss function to deal with the severe class imbalance in the data and to focus on positive examples that are hard to distinguish. We evaluate the proposed methodology in a realistic setting that preserves the essential characteristics of the task: (a) the imbalanced distribution of classes in the data, (b) the chronological order of the data, and (c) the systematic noise in the labels, due to the delay in manually identifying misstatements. The proposed method achieves state-of-the-art results in this setting, compared to recent approaches in the literature. As an additional contribution, we release the dataset to facilitate further research in the field.
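The abstract does not spell out the loss itself, so the following is only a minimal sketch of a binary focal loss with a sample-dependent focusing parameter. The threshold of 0.2 and the two gamma values (5 and 3) are assumptions borrowed from the sample-dependent schedule described by Mukhoti et al. (2020) in the reference list; they are not necessarily the settings used in this article, and the function names are hypothetical.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Plain binary cross-entropy for one example, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, eps))


def focal_loss(p, y, gamma_hard=5.0, gamma_easy=3.0, threshold=0.2, eps=1e-12):
    """Binary focal loss for one example with a sample-dependent gamma.

    p: predicted probability of the positive (misstatement) class.
    y: true label, 1 (misstated) or 0 (not misstated).
    The gamma values and the threshold are illustrative assumptions.
    """
    # Probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)
    # Poorly predicted examples (low p_t) get the larger focusing parameter.
    gamma = gamma_hard if p_t < threshold else gamma_easy
    # The factor (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

Relative to cross-entropy, the modulating factor leaves most of the loss of a badly predicted positive in place while nearly removing the loss of a confidently correct one, which is how the loss concentrates training on hard examples under class imbalance.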

Data availability

The dataset has been compiled by joining publicly available information from [14] with information from the AuditAnalytics database to incorporate the restatement date, in order to simulate the realistic scenario of having noisy labels during training. At the time of writing, the data from [14] were available on GitHub. We are licensed by AuditAnalytics to share data in published research, without explicitly sharing raw data. The restatement dates have been used to flip labels in the training sets, and these dates do not appear in the dataset. We therefore share the resulting data used in this study, but not any raw data from the AuditAnalytics database.
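The label-flipping step can be illustrated with a short sketch. It assumes that a misstated report counts as positive in a training set only if its restatement date falls on or before the training cutoff, and is otherwise flipped to negative because the misstatement was not yet known. The record fields, the function name, and the cutoff rule are hypothetical and are not taken from the released dataset.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class FirmYear:
    firm_id: str
    fiscal_year: int
    misstated: int                    # 1 if the report was eventually found misstated
    restatement_date: Optional[date]  # when the misstatement became known, if ever


def observed_labels(records: List[FirmYear], cutoff: date) -> List[int]:
    """Return the labels visible at `cutoff` (assumed flipping rule).

    A true positive stays positive only if its restatement was announced
    on or before the cutoff; otherwise it is flipped to 0 (label noise).
    """
    labels = []
    for r in records:
        known = (
            r.misstated == 1
            and r.restatement_date is not None
            and r.restatement_date <= cutoff
        )
        labels.append(1 if known else 0)
    return labels


if __name__ == "__main__":
    records = [
        FirmYear("A", 2008, 1, date(2009, 6, 1)),   # already restated
        FirmYear("B", 2009, 1, date(2012, 3, 15)),  # restated after the cutoff
        FirmYear("C", 2009, 0, None),               # never misstated
    ]
    print(observed_labels(records, cutoff=date(2010, 12, 31)))
```

Under this rule the noise is one-sided: only positives can be mislabeled, and recent positives are the most likely to be affected, which matches the delay in identifying misstatements described in the abstract.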

Code availability

Implementation of the model will be available on GitHub upon publication.

Notes

https://www.sec.gov/files/form10-k.pdf
https://www.marketplace.spglobal.com/en/datasets/compustat-fundamentals-(8)
https://sites.google.com/usc.edu/aaerdataset/buy-the-data?authuser=0
https://www.auditanalytics.com
https://www.sec.gov/edgar.shtml
A comparison of financial databases is provided in [29].
https://github.com/izavits/misstatement_data

References

[1] Hennes KM, Leone AJ, Miller BP (2008) The importance of distinguishing errors from irregularities in restatement research: The case of restatements and CEO/CFO turnover. Account Rev 83(6):1487–1519
[2] Kirkos E, Spathis C, Manolopoulos Y (2007) Data mining techniques for the detection of fraudulent financial statements. Expert Syst Appl 32(4):995–1003
[3] Kotsiantis S, Koumanakos E, Tzelepis D, Tampakas V (2006) Forecasting fraudulent financial statements using data mining. Int J Comput Intell 3(2):104–110
[4] Bai B, Yen J, Yang X (2008) False financial statements: characteristics of China’s listed companies and CART detecting approach. J Inform Technol Decision Making 7(02):339–359
[5] Deng Q, Mei G (2009) Combining self-organizing map and k-means clustering for detecting fraudulent financial statements. In: 2009 IEEE International conference on granular computing, IEEE, Nanchang, China, pp 126–131
[6] Ravisankar P, Ravi V, Rao GR, Bose I (2011) Detection of financial statement fraud and feature selection using data mining techniques. Decis Support Syst 50(2):491–500
[7] Feroz EH, Kwon TM, Pastena VS, Park K (2000) The efficacy of red flags in predicting the SEC’s targets: an artificial neural networks approach. Intell Syst Account Financ Manag 9(3):145–157
[8] Abbasi A, Albrecht C, Vance A, Hansen J (2012) Metafraud: a meta-learning framework for detecting financial fraud. MIS Q 36(4):1293–1327
[9] Bertomeu J, Cheynel E, Floyd E, Pan W (2021) Using machine learning to detect misstatements. Rev Acc Stud 26:468–519
[10] Perols J (2011) Financial statement fraud detection: An analysis of statistical and machine learning algorithms. Audit A J Pract Theory 30(2):19–50
[11] Sharma A, Panigrahi PK (2012) A review of financial accounting fraud detection based on data mining techniques. Int J Comput Appl 39(1):11
[12] Zhang C, Cho S, Vasarhelyi M (2022) Explainable artificial intelligence (XAI) in auditing. Int J Account Inf Syst 46:100572
[13] Dechow PM, Ge W, Larson CR, Sloan RG (2011) Predicting material accounting misstatements. Contemp Account Res 28(1):17–82
[14] Bao Y, Ke B, Li B, Yu YJ, Zhang J (2020) Detecting accounting fraud in publicly traded US firms using a machine learning approach. J Account Res 58(1):199–235
[15] Zavitsanos E, Mavroeidis D, Bougiatiotis K, Spyropoulou E, Loukas L, Paliouras G (2021) Financial misstatement detection: a realistic evaluation. In: 2nd ACM International conference on AI in finance (ICAIF ’21), pp 1–9. Association for Computing Machinery, November 3–5, 2021, Virtual Event, USA
[16] Puttarattanamanee M, Boongasame L, Thammarak K (2023) A comparative study of sentiment analysis methods for detecting fake reviews in e-commerce. High Tech Innov J 4(2):349–363
[17] Hoogs B, Kiehl T, Lacomb C, Senturk D (2007) A genetic algorithm approach to detecting temporal patterns indicative of financial statement fraud. Intell Syst Account Financ Manag Int J 15(1–2):41–56
[18] Kiehl TR, Hoogs BK, LaComb CA, Senturk D (2005) Evolving multi-variate time-series patterns for the discrimination of fraudulent financial filings. In: Genetic and evolutionary computation conference, ACM, Washington, DC, USA. Citeseer, pp 1–8
[19] Chai W, Hoogs BK, Verschueren BT (2006) Fuzzy ranking of financial statements for fraud detection. In: 2006 IEEE International conference on fuzzy systems, IEEE, Vancouver, BC, pp 152–158
[20] Liou F-M (2008) Fraudulent financial reporting detection and business failure prediction models: a comparison. Manag Audit J 23(7):650–662
[21] Cecchini M, Aytug H, Koehler GJ, Pathak P (2010) Detecting management fraud in public companies. Manage Sci 56(7):1146–1160
[22] Ata HA, Seyrek IH (2009) The use of data mining techniques in detecting fraudulent financial statements: An application on manufacturing firms. Suleyman Demirel University Journal of Faculty of Economics & Administrative Sciences 14(2):157–170
[23] Lin C-C, Chiu A-A, Huang SY, Yen DC (2015) Detecting the financial statement fraud: The analysis of the differences between data mining techniques and experts’ judgments. Knowl-Based Syst 89:459–470
[24] Green BP, Choi JH (1997) Assessing the risk of management fraud through neural network technology. Auditing 16(1):14–28
[25] Fissette M, Vries T (2017) Text mining to detect indications of fraud in annual reports worldwide. In: Benelearn 2017: Proceedings of the twenty-sixth Benelux conference on machine learning, Technische Universiteit Eindhoven, Eindhoven (the Netherlands), pp 69–71
[26] Craja P, Kim A, Lessmann S (2020) Deep learning for detecting financial statement fraud. Decis Support Syst 139:113421
[27] Hajek P, Henriques R (2017) Mining corporate annual reports for intelligent detection of financial statement fraud - a comparative study of machine learning methods. Knowl-Based Syst 128:139–152
[28] Dutta I, Dutta S, Raahemi B (2017) Detecting financial restatements using data mining techniques. Expert Syst Appl 90:374–393
[29] Karpoff J, Koester A, Lee D, Martin G (2012) A critical analysis of databases used in financial misconduct research (working paper). SSRN Electron J
[30] Humpherys SL, Moffitt KC, Burns MB, Burgoon JK, Felix WF (2011) Identification of fraudulent financial statements using linguistic credibility analysis. Decis Support Syst 50(3):585–594
[31] Glancy FH, Yadav SB (2011) A computational model for financial reporting fraud detection. Decis Support Syst 50(3):595–601
[32] Spathis C, Doumpos M, Zopounidis C (2002) Detecting falsified financial statements: a comparative study using multicriteria analysis and multivariate statistical techniques. Eur Account Rev 11(3):509–535
[33] Kaminski KA, Wetzel TS, Guan L (2004) Can financial ratios detect fraudulent financial reporting? Manag Audit J 19(1):15–28
[34] Kim YJ, Baik B, Cho S (2016) Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Syst Appl 62:32–43
[35] Dyck A, Morse A, Zingales L (2010) Who blows the whistle on corporate fraud? J Financ 65(6):2213–2253
[36] Huang X, Khetan A, Cvitkovic M, Karnin Z (2020) TabTransformer: Tabular Data Modeling Using Contextual Embeddings. arXiv. https://doi.org/10.48550/ARXIV.2012.06678
[37] Liu H, Dai Z, So D, Le QV (2021) Pay attention to MLPs. In: Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Vaughan JW (eds) Advances in Neural Information Processing Systems, vol 34, pp 9204–9215. Curran Associates, Inc., Virtual-only Conference. https://proceedings.neurips.cc/paper/2021/file/4cc05b35c2f937c5bd9e7d41d3686fff-Paper.pdf
[38] Cholakov R, Kolev T (2022) The GatedTabTransformer. An enhanced deep learning architecture for tabular modeling. arXiv. https://doi.org/10.48550/ARXIV.2201.00199
[39] Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42(2):318–327. https://doi.org/10.1109/TPAMI.2018.2858826
[40] Mukhoti J, Kulharia V, Sanyal A, Golodetz S, Torr P, Dokania P (2020) Calibrating deep neural networks using focal loss. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds) Advances in Neural Information Processing Systems, vol 33, pp 15288–15299. Curran Associates, Inc., Virtual-only Conference. https://proceedings.neurips.cc/paper/2020/file/aeb7b30ef1d024a76f21a1d40e30c302-Paper.pdf
[41] H, B, S, K (2020) Topics in financial filings and bankruptcy prediction with distributed representations of textual data. In: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Springer, Virtual Conference, Belgium, pp 306–322
[42] Zavitsanos E, Mavroeidis D, Spyropoulou E, Fergadiotis M, Paliouras G (2024) Entrant: A large financial dataset for table understanding. Nature Sci Data 11:876. https://doi.org/10.1038/s41597-024-03605-5

Acknowledgements

The authors would like to acknowledge the financial support of Qualco SA for this research. The opinions of the authors expressed herein do not necessarily state or reflect those of Qualco SA. Qualco SA had no influence in the design of the study, the collection and interpretation of the data, the writing, and the decision to submit the article for publication.

Author information

Authors and Affiliations

Institute of Informatics and Telecommunications, NCSR “Demokritos”, Patriarhou Gregoriou and Neapoleos St., Aghia Paraskevi, 15341, Attica, Greece
Elias Zavitsanos, Dimitrios Kelesis & Georgios Paliouras

Contributions

Research and development of the method was performed by Elias Zavitsanos. Contributions to several parts of the method were made by Dimitrios Kelesis. Manuscript written by all authors. Review by Georgios Paliouras. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Elias Zavitsanos.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zavitsanos, E., Kelesis, D. & Paliouras, G. Calibrating TabTransformer for financial misstatement detection. Appl Intell 55, 3 (2025). https://doi.org/10.1007/s10489-024-05861-9

Accepted: 06 November 2024
Published: 18 November 2024
DOI: https://doi.org/10.1007/s10489-024-05861-9
