research-article

Automated Machine Learning with Genetic Programming on Real Dataset of Tax Avoidance Classification Problem

Authors:
Suraya Masrom, Universiti Teknologi MARA, Perak Branch, Malaysia
Rahayu Abdul Rahman, Universiti Teknologi MARA, Perak Branch, Tapah Campus, Malaysia
Norhayati Baharun, Universiti Teknologi MARA, Perak Branch, Malaysia
Abdullah Sani Abd Rahman, University Teknologi PETRONAS, Perak, Malaysia

ICEIT 2020: Proceedings of the 2020 9th International Conference on Educational and Information Technology, February 2020, Pages 139–143
https://doi.org/10.1145/3383923.3383942

Published: 23 April 2020

ABSTRACT

Real application datasets often pose a stumbling block for machine learning algorithms trying to produce good results on prediction or classification problems. Imbalanced data is the major reason for this problem, often compounded by missing values, small data size, and highly skewed data distributions. This paper presents an empirical study of an Automated Machine Learning (AML) approach based on Genetic Programming (GP), named AML TPOT. It is a very recent AML tool developed as an open-source Python library and has been reported as a promising model by the few researchers who have tested the algorithm. Nevertheless, most work on AML TPOT has been conducted on common or benchmark datasets for machine learning testing. This paper instead focuses on a real and deviant dataset on tax avoidance among Government-Linked Companies in Malaysia. It compares the performance of the AML on this dataset under different GP parameter settings. The paper thus provides fundamental knowledge on the experimental design and findings that will be useful for future improvement of GP-based AML.

Index Terms

Computing methodologies → Machine learning → Machine learning approaches → Bio-inspired approaches → Genetic programming

Published in

ICEIT 2020: Proceedings of the 2020 9th International Conference on Educational and Information Technology
February 2020
268 pages
ISBN:9781450375085
DOI:10.1145/3383923

Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Classification
Genetic Programming
Machine Learning
Tax Avoidance
Qualifiers
- research-article
- Research
- Refereed limited