What Are the Limits of Evolutionary Induction of Decision Trees?

Jurczuk, Krzysztof; Reska, Daniel; Kretowski, Marek

doi:10.1007/978-3-319-99259-4_37

Conference paper
First Online: 21 August 2018

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11102)

Abstract

In a typical assessment of machine learning or data mining techniques, accuracy and interpretability are usually the most important criteria. However, when the analyst faces real contemporary big data problems, scalability and efficiency become crucial factors. Support for parallel and distributed processing is often an indispensable component of operational solutions.

In this paper, we investigate the applicability of evolutionary induction of decision trees to large-scale data. We focus on the existing Global Decision Tree system, which searches for both the tree structure and the tests in the internal nodes in a single run of an evolutionary algorithm. Evolved individuals are not encoded, so specialized genetic operators and application schemes are used. As in most evolutionary data mining systems, every fitness evaluation requires processing the whole training dataset. For high-dimensional datasets this operation is very time-consuming. To overcome this deficiency, two acceleration solutions based on recent and promising approaches (NVIDIA CUDA and Apache Spark) are presented. In both, the fitness calculations are delegated to the accelerating back end (the GPU or the Spark cluster), while the core evolution remains unchanged. In the experimental part, among other things, we identify which dataset dimensions can be efficiently processed within a fixed time interval.
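Because every fitness evaluation scans the whole training set, it decomposes naturally into a data-parallel map-reduce: each worker routes its chunk of objects through the tree and counts objects per (leaf, class), and the partial counts are then summed. The sketch below illustrates only that decomposition, under stated assumptions. The `Node` layout, the binary univariate tests `x[attribute] <= threshold`, and a fitness equal to training accuracy minus a hypothetical size penalty `alpha * (leaves - 1)` are illustrative choices, not the paper's exact fitness function or its CUDA/Spark code. A thread pool stands in for GPU blocks or Spark partitions; in CPython it shows the structure rather than a real speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional, Sequence, Tuple

# One training object: (attribute vector, class label). Hypothetical layout.
Example = Tuple[Sequence[float], str]


@dataclass
class Node:
    """Internal node tests x[attribute] <= threshold; a leaf has attribute=None."""
    attribute: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_id: int = -1  # only meaningful for leaves


def route(tree: Node, x: Sequence[float]) -> int:
    """Pass one object down the tree and return the id of the leaf it reaches."""
    node = tree
    while node.attribute is not None:
        node = node.left if x[node.attribute] <= node.threshold else node.right
    return node.leaf_id


def count_leaves(tree: Node) -> int:
    if tree.attribute is None:
        return 1
    return count_leaves(tree.left) + count_leaves(tree.right)


def map_chunk(tree: Node, chunk: List[Example]) -> Counter:
    """Map step: per-(leaf, class) object counts for one data chunk."""
    counts: Counter = Counter()
    for x, y in chunk:
        counts[(route(tree, x), y)] += 1
    return counts


def fitness(tree: Node, chunks: List[List[Example]], alpha: float = 0.0) -> float:
    """Reduce the partial counts, then score the tree (higher is better)."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda c: map_chunk(tree, c), chunks))
    totals = reduce(lambda a, b: a + b, partials, Counter())
    # Each leaf predicts its majority class, so the number of correctly
    # classified objects is the sum over leaves of the largest class count.
    majority: dict = {}
    for (leaf, _cls), n in totals.items():
        majority[leaf] = max(majority.get(leaf, 0), n)
    accuracy = sum(majority.values()) / sum(totals.values())
    # Hypothetical size penalty favouring smaller trees (an assumption).
    return accuracy - alpha * (count_leaves(tree) - 1)
```

For example, a one-split tree (`x[0] <= 0.5`) applied to `[([0.1], "a"), ([0.2], "a"), ([0.9], "b"), ([0.7], "a")]` classifies three of the four objects correctly, so `fitness(tree, chunks)` is 0.75. Because the counts are additive, the result does not depend on how the data are split into chunks, which is what allows only the evaluation to be offloaded while the evolutionary loop stays untouched.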

Notes

1. A candidate threshold for a given attribute is defined as the midpoint between two successive objects, in the sequence sorted by increasing value of the attribute, that belong to different classes.
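The rule in Note 1 can be sketched as follows. The treatment of tied attribute values is an assumption of this sketch, since the note does not address it: equal values are grouped, and a midpoint between two neighbouring distinct values is accepted whenever the two groups together contain more than one class, which makes the result independent of the order of tied objects. The function name `candidate_thresholds` is hypothetical.

```python
from itertools import groupby
from typing import Hashable, List, Sequence


def candidate_thresholds(values: Sequence[float], labels: Sequence[Hashable]) -> List[float]:
    """Midpoints between successive (sorted) attribute values whose objects differ in class."""
    # Sort objects by attribute value only, so labels never need to be comparable.
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    # Group tied values; keep the set of classes observed at each distinct value.
    groups = [(v, {c for _, c in grp}) for v, grp in groupby(pairs, key=lambda p: p[0])]
    thresholds = []
    for (v1, classes1), (v2, classes2) in zip(groups, groups[1:]):
        # Some successive pair across this boundary has different classes
        # exactly when the two neighbouring groups jointly hold >1 class.
        if len(classes1 | classes2) > 1:
            thresholds.append((v1 + v2) / 2)
    return thresholds
```

For instance, `candidate_thresholds([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])` yields the single threshold `2.5`, because only the pair (2.0, 3.0) changes class. Restricting the search to such boundary points keeps the number of tests the evolutionary operators must consider far smaller than one threshold per distinct value.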

Acknowledgments

This work was supported by grant S/WI/2/18 from Bialystok University of Technology (BUT), funded by the Polish Ministry of Science and Higher Education.

Author information

Authors and Affiliations

Faculty of Computer Science, Bialystok University of Technology, Wiejska 45a, 15-351, Bialystok, Poland
Krzysztof Jurczuk, Daniel Reska & Marek Kretowski

Corresponding author

Correspondence to Krzysztof Jurczuk.

Editor information

Editors and Affiliations

Inria Saclay, Palaiseau, France
Anne Auger
University of Coimbra, Coimbra, Portugal
Carlos M. Fonseca
University of Coimbra, Coimbra, Portugal
Nuno Lourenço
University of Coimbra, Coimbra, Portugal
Penousal Machado
University of Coimbra, Coimbra, Portugal
Luís Paquete
Colorado State University, Fort Collins, Colorado, USA
Darrell Whitley

About this paper

Cite this paper

Jurczuk, K., Reska, D., Kretowski, M. (2018). What Are the Limits of Evolutionary Induction of Decision Trees? In: Auger, A., Fonseca, C., Lourenço, N., Machado, P., Paquete, L., Whitley, D. (eds) Parallel Problem Solving from Nature – PPSN XV. PPSN 2018. Lecture Notes in Computer Science, vol 11102. Springer, Cham. https://doi.org/10.1007/978-3-319-99259-4_37

DOI: https://doi.org/10.1007/978-3-319-99259-4_37
Published: 21 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99258-7
Online ISBN: 978-3-319-99259-4
eBook Packages: Computer Science, Computer Science (R0)