Abstract
Evolutionary induction of decision trees is an emerging alternative to greedy top-down approaches. Its growing popularity results from good prediction performance and less complex output trees. However, a major drawback of evolutionary algorithms is the tree induction time, especially for large-scale data. In this paper, we design and implement a graphics processing unit (GPU)-based parallelization of evolutionary induction of decision trees. We apply the Compute Unified Device Architecture (CUDA) programming model, which supports general-purpose computation on a GPU (GPGPU). The selection and genetic operators are performed sequentially on a CPU, while the evaluation of the individuals in the population is parallelized. A data-parallel approach is applied: the dataset is divided into parts that are spread over the GPU cores, and each core processes its assigned chunk of the data. Finally, the results from all GPU cores are merged, and the sought tree metrics are sent to the CPU. The computational performance of the proposed approach is validated experimentally on artificial and real-life datasets. A comparison with the traditional CPU version shows that evolutionary induction of decision trees supported by GPGPU can be accelerated significantly (up to 800 times) and allows much larger datasets to be processed.
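
The data-parallel evaluation outlined in the abstract can be illustrated with a minimal CPU-only sketch. It is not the authors' CUDA implementation: the chunks are processed in a plain Python loop rather than by GPU cores, and all names (`split_into_chunks`, `count_chunk`, `merge_counts`, `misclassified`, `toy_tree`) as well as the toy data are hypothetical. The abstract speaks only of "sought tree metrics"; using the number of misclassified samples as that metric is an assumption made here for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


def split_into_chunks(data: List[Sample], n_chunks: int) -> List[List[Sample]]:
    """Split the dataset into n_chunks nearly equal contiguous parts."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def count_chunk(tree: Callable[[Sequence[float]], int],
                chunk: List[Sample]) -> Counter:
    """Per-chunk work: count (leaf id, class label) pairs for one chunk."""
    counts = Counter()
    for features, label in chunk:
        counts[(tree(features), label)] += 1
    return counts


def merge_counts(partials: List[Counter]) -> Counter:
    """Merge step: sum the partial counts from all chunks."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return total


def misclassified(counts: Counter) -> int:
    """Number of errors when every leaf predicts its majority class."""
    per_leaf = {}
    for (leaf, _label), n in counts.items():
        per_leaf.setdefault(leaf, []).append(n)
    return sum(sum(ns) - max(ns) for ns in per_leaf.values())


def toy_tree(features: Sequence[float]) -> int:
    """Hypothetical one-split tree: returns the id of the reached leaf."""
    return 0 if features[0] < 0.5 else 1


# Hypothetical toy dataset with one feature and two classes.
data = [((0.1,), 0), ((0.2,), 0), ((0.4,), 1), ((0.6,), 1),
        ((0.7,), 1), ((0.9,), 0), ((0.3,), 0)]

# Each chunk would be handled by a separate GPU core; here it is a loop.
partials = [count_chunk(toy_tree, chunk) for chunk in split_into_chunks(data, 3)]
errors = misclassified(merge_counts(partials))
```

Because the per-chunk counts are simply summed, the merged result does not depend on how the data are divided, which is what makes this kind of evaluation straightforward to distribute.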











Acknowledgments
This work was supported by Grants W/WI/2/2014 (first author) and S/WI/2/2013 (third author) from Bialystok University of Technology, funded by the Ministry of Science and Higher Education, as well as by the Polish National Science Center through a Grant allocated on the basis of decision 2013/09/N/ST6/04083 (second author).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Jurczuk, K., Czajkowski, M. & Kretowski, M. Evolutionary induction of a decision tree for large-scale data: a GPU-based approach. Soft Comput 21, 7363–7379 (2017). https://doi.org/10.1007/s00500-016-2280-1
DOI: https://doi.org/10.1007/s00500-016-2280-1