ABSTRACT
Real application datasets are often a stumbling block for machine learning algorithms trying to produce good results on prediction or classification problems. Imbalanced data is the major reason for this problem, together with missing values, small data size and highly skewed data distributions. This paper presents an empirical study that used Automated Machine Learning (AML) based on Genetic Programming (GP), named AML TPOT. This is a very recent AML tool, developed as an open-source Python library and reported as a promising model by the few researchers who have tested the algorithm. Nevertheless, most work on AML TPOT has been conducted on common or benchmark datasets for machine learning testing. In this paper, the focus is on a real and deviant dataset, collected on tax avoidance among Government-Linked Companies in Malaysia. A comparison of AML performance on the dataset under different GP parameter settings is provided. Thus, this paper provides fundamental knowledge on the experimental design and findings that will be useful for future improvement of GP-based AML.
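As a rough illustration of what comparing different GP parameter settings can look like, the sketch below builds a small grid of settings and passes each one to TPOT. All numeric values, the function names `gp_parameter_grid` and `evaluate_setting`, and the choice of `balanced_accuracy` as the score are assumptions made for illustration and are not taken from the paper. The keyword names follow the classic TPOT `TPOTClassifier` interface; newer TPOT releases may name these parameters differently.

```python
from itertools import product

# Hypothetical GP settings for illustration only; not the values used in the paper.
GENERATIONS = [5, 10]
POPULATION_SIZES = [20, 50]
MUTATION_RATES = [0.9, 0.5]
CROSSOVER_RATES = [0.1, 0.5]


def gp_parameter_grid():
    """Return every GP setting whose mutation and crossover rates sum to at most 1."""
    grid = []
    for gens, pop, mut, cross in product(
        GENERATIONS, POPULATION_SIZES, MUTATION_RATES, CROSSOVER_RATES
    ):
        # Classic TPOT rejects settings where the two rates sum to more than 1.0.
        if mut + cross <= 1.0 + 1e-9:
            grid.append(
                {
                    "generations": gens,
                    "population_size": pop,
                    "mutation_rate": mut,
                    "crossover_rate": cross,
                }
            )
    return grid


def evaluate_setting(params, X_train, y_train, X_test, y_test):
    """Fit TPOT with one GP setting and return its score on held-out data.

    Assumes the classic TPOT package is installed; it is imported here so that
    the parameter grid above can be built without it.
    """
    from tpot import TPOTClassifier

    model = TPOTClassifier(
        scoring="balanced_accuracy",  # assumed choice, suited to imbalanced classes
        cv=5,
        random_state=42,
        verbosity=0,
        **params,
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)
```

Calling `evaluate_setting` once per entry of `gp_parameter_grid()` on the same train/test split gives one score per GP setting, which can then be tabulated for comparison.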