Tax Underreporting Detection Using an Unsupervised Learning Approach

Herrera-Semenets, Vitali; Bustio-Martínez, Lázaro; González-Ordiano, Jorge Ángel; van den Berg, Jan

doi:10.1007/978-3-031-75543-9_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 15247)

Included in the following conference series: Mexican International Conference on Artificial Intelligence

Abstract

Governmental administrative domains can benefit from a wide variety of currently available big data analysis methods. Tax administration is one such area: it requires massive data processing to identify hidden patterns and trends of possible tax evasion. Supervised methods can be effective in these cases, but the lack of labeled data limits their practical application in real-world scenarios. Unsupervised methods are an alternative with potential benefits in certain cases, and they are considered feasible as a decision support tool in tax evasion risk management systems. This paper proposes an unsupervised approach to identify signs of tax evasion by detecting possible tax underreporting. The proposed strategy is evaluated on a data set of individual income tax statistics of the United States. The results are considered useful for decision-making and for preventive action on cases reported as suspicious.

Author information

Authors and Affiliations

Advanced Technologies Application Center (CENATAV), La Habana, Cuba
Vitali Herrera-Semenets
Departamento de Estudios en Ingeniería para la Innovación, Universidad Iberoamericana Ciudad de México, Mexico City, Mexico
Lázaro Bustio-Martínez
Instituto de Investigación Aplicada y Tecnología, Universidad Iberoamericana Ciudad de México, Mexico City, Mexico
Jorge Ángel González-Ordiano
Intelligent Systems Department, Delft University of Technology, Delft, Netherlands
Jan van den Berg

Corresponding author

Correspondence to Vitali Herrera-Semenets.

Editor information

Editors and Affiliations

Universidad Panamericana, Mexico City, Distrito Federal, Mexico
Lourdes Martínez-Villaseñor
Instituto Tecnológico y de Estudios Superiores de Monterrey, Jalisco, Mexico
Gilberto Ochoa-Ruiz

Cite this paper

Herrera-Semenets, V., Bustio-Martínez, L., González-Ordiano, J.Á., van den Berg, J. (2025). Tax Underreporting Detection Using an Unsupervised Learning Approach. In: Martínez-Villaseñor, L., Ochoa-Ruiz, G. (eds) Advances in Soft Computing. MICAI 2024. Lecture Notes in Computer Science, vol 15247. Springer, Cham. https://doi.org/10.1007/978-3-031-75543-9_2

DOI: https://doi.org/10.1007/978-3-031-75543-9_2
Published: 17 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-75542-2
Online ISBN: 978-3-031-75543-9
eBook Packages: Computer Science, Computer Science (R0)
