Abstract
Governmental administrative domains can potentially benefit from a wide variety of currently available big data analysis methods. Tax administration is one such area: it requires massive data processing to identify hidden patterns and trends that may indicate tax evasion. Supervised methods can be effective in these cases, but the lack of labeled data limits their practical application in real-world scenarios. An alternative is the use of unsupervised methods, which do not depend on labeled data and can be beneficial in such cases. In this sense, unsupervised methods are considered feasible as decision support tools in tax evasion risk management systems. This paper proposes an unsupervised approach to identify signs of tax evasion by detecting possible tax underreporting. The proposed strategy is evaluated on a data set of individual income tax statistics of the United States. The results are considered useful for decision-making and for preventive actions on cases reported as suspicious.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Herrera-Semenets, V., Bustio-Martínez, L., González-Ordiano, J.Á., van den Berg, J. (2025). Tax Underreporting Detection Using an Unsupervised Learning Approach. In: Martínez-Villaseñor, L., Ochoa-Ruiz, G. (eds) Advances in Soft Computing. MICAI 2024. Lecture Notes in Computer Science(), vol 15247. Springer, Cham. https://doi.org/10.1007/978-3-031-75543-9_2
DOI: https://doi.org/10.1007/978-3-031-75543-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-75542-2
Online ISBN: 978-3-031-75543-9
eBook Packages: Computer Science, Computer Science (R0)