Abstract
State-of-the-art decision tree algorithms are top-down induction heuristics that greedily partition the attribute space by iteratively choosing the best split on a single attribute. Despite their attractive runtime, simple examples such as the XOR problem show that these heuristics often fail to find the best classification rules when there are strong interactions between two or more attributes. In this context, we present a branch-and-bound based decision tree algorithm that identifies optimal bivariate axis-aligned splits with respect to a given impurity measure. In contrast to a univariate split, which can be found in linear time, an optimal cross-split must consider every combination of values for every possible pair of attributes, leading to a combinatorial optimization problem that is quadratic in the number of values and attributes. To cope with this complexity, we use a branch-and-bound procedure, a well-known technique from combinatorial optimization, to divide the solution space into several subsets and detect the optimal cross-splits quickly. These cross-splits can either be used directly to construct quaternary decision trees, or they can be used to select only the better of the two individual splits. In the latter case, the outcome is a binary decision tree with a certain foresight for correlated attributes. We test both variants on various datasets from the UCI Machine Learning Repository and show that cross-splits consistently produce smaller decision trees than state-of-the-art methods with comparable accuracy. In some cases, our algorithm produces considerably more accurate trees, owing to its ability to draw more elaborate decisions than univariate induction algorithms.
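To illustrate the search space the abstract describes, the following sketch exhaustively enumerates cross-splits: for every pair of attributes and every pair of thresholds, the data is partitioned into four quadrants and scored by weighted Gini impurity. This is a brute-force baseline, not the authors' branch-and-bound procedure; the function names and the use of Gini as the impurity measure are illustrative assumptions. On XOR-like data, no univariate split reduces impurity, but a cross-split separates the classes perfectly.

```python
from itertools import combinations

def gini(labels):
    """Gini impurity of a list of class labels (an illustrative choice
    of impurity measure; the paper allows any given impurity measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_cross_split(X, y):
    """Brute-force search over every attribute pair (i, j) and every
    threshold pair (s, t); returns the minimum weighted impurity of the
    four quadrants and the split achieving it. Quadratic in the number
    of values and attributes -- the complexity the branch-and-bound
    procedure is designed to avoid."""
    n, d = len(X), len(X[0])
    best = (float("inf"), None)
    for i, j in combinations(range(d), 2):
        for s in sorted({row[i] for row in X}):
            for t in sorted({row[j] for row in X}):
                quadrants = [[], [], [], []]
                for row, label in zip(X, y):
                    q = (row[i] <= s) * 2 + (row[j] <= t)
                    quadrants[q].append(label)
                score = sum(len(q) / n * gini(q) for q in quadrants)
                if score < best[0]:
                    best = (score, (i, j, s, t))
    return best

# XOR-like data: every univariate split leaves both classes mixed,
# but the cross-split at (s=0, t=0) yields four pure quadrants.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
score, split = best_cross_split(X, y)
```

The four pure quadrants give a weighted impurity of 0, which is exactly the kind of attribute interaction a greedy univariate heuristic cannot detect in a single step.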
Bollwein, F., Dahmen, M. & Westphal, S. A branch & bound algorithm to determine optimal cross-splits for decision tree induction. Ann Math Artif Intell 88, 291–311 (2020). https://doi.org/10.1007/s10472-019-09684-0