Classification trees with soft splits optimized for ranking

Original Paper · Computational Statistics

Abstract

We consider the softening of splits in classification trees generated from multivariate numerical data. This methodology improves the quality of the ranking of test cases, as measured by the AUC. Several ways to determine the softening parameters are introduced and compared, including the softening algorithm used in the standard methods C4.5 and C5.0. In the first part of the paper, a few softening settings determined only from the ranges of the training data in the tree branches are explored. The trees softened with these settings are used to study the effect of combining the Laplace correction with soft splits. In the second part we introduce methods that maximize the classifier's performance on the training set over the domain of the softening parameters. The Nelder–Mead non-linear optimization algorithm is used and various target functions are considered. The target function evaluating the AUC on the training set is compared with functions that sum, over the training cases, some transformation of the score error. Several data sets from the UCI repository are used in the experiments.
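To make the pipeline described in the abstract concrete, below is a minimal sketch, not the paper's actual method: a fixed depth-2 tree with Laplace-corrected leaf probabilities is softened with logistic branch memberships, and Nelder–Mead searches over the softening widths with the training-set AUC as the target function. The toy data, the logistic membership function, the fixed thresholds, and all identifiers (soft_scores, neg_train_auc) are illustrative assumptions; the abstract does not specify the paper's actual softening parameterization.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy 2-D training data with an XOR-like class structure that a
# depth-2 tree with thresholds at zero fits well (15% label noise).
X = rng.normal(size=(500, 2))
y = ((X[:, 0] * X[:, 1] > 0) ^ (rng.random(500) < 0.15)).astype(int)

T0 = T1 = T2 = 0.0  # split thresholds of the (hard) induced tree

def laplace(mask):
    """Laplace-corrected positive-class frequency of a hard leaf."""
    return (y[mask].sum() + 1.0) / (mask.sum() + 2.0)

# Leaf probabilities come from the hard partition of the training set.
left, right = X[:, 0] < T0, X[:, 0] >= T0
p_ll = laplace(left & (X[:, 1] < T1))
p_lr = laplace(left & (X[:, 1] >= T1))
p_rl = laplace(right & (X[:, 1] < T2))
p_rr = laplace(right & (X[:, 1] >= T2))

def soft_scores(log_widths, X):
    """Score cases by sending each one down BOTH branches of every
    split; a leaf contributes with the product of the logistic branch
    memberships along its path."""
    s0, s1, s2 = np.exp(log_widths)      # widths of the soft regions
    w0 = expit((X[:, 0] - T0) / s0)      # membership of the right branch
    w1 = expit((X[:, 1] - T1) / s1)
    w2 = expit((X[:, 1] - T2) / s2)
    return ((1 - w0) * ((1 - w1) * p_ll + w1 * p_lr)
            + w0 * ((1 - w2) * p_rl + w2 * p_rr))

def neg_train_auc(log_widths):
    # Target function: AUC of the softened tree on the training set.
    return -roc_auc_score(y, soft_scores(log_widths, X))

# Nelder-Mead search over the (log-)widths of the three soft splits.
res = minimize(neg_train_auc, x0=np.zeros(3), method="Nelder-Mead")
print("softening widths:", np.exp(res.x))
print("training AUC:    ", -res.fun)
```

A hard tree assigns the same score to every case in a leaf, so its training AUC suffers from ties; softening breaks those ties smoothly, which is one intuition for why soft splits can improve ranking.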

References

  • Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth and Brooks, Monterey

  • Carter C, Catlett J (1987) Assessing credit card applications using machine learning. IEEE Expert 2(3):71–79

  • Chen M, Ludwig SA (2013) Fuzzy decision tree using soft discretization and a genetic algorithm based feature selection method. In: 2013 World congress on nature and biologically inspired computing (NaBIC). IEEE, pp 238–244

  • Clémençon S, Depecker M, Vayatis N (2013) Ranking forests. J Mach Learn Res 14(1):39–73

  • Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874

  • Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29–36

  • Hüllermeier E, Vanderlooy S (2009) Why fuzzy decision trees are good rankers. Trans Fuzzy Syst 17(6):1233–1244

  • Janikow CZ, Kawa K (2005) Fuzzy decision tree FID. In: Proceedings of NAFIPS, pp 379–384

  • Jordan MI, Jacobs RA (1994) Hierarchical mixtures of experts and the EM algorithm. Neural Comput 6(2):181–214

  • Kumar GK, Viswanath P, Rao AA (2016) Ensemble of randomized soft decision trees for robust classification. Sādhanā 41(3):273–282

  • Leisch F, Dimitriadou E (2009) mlbench: Machine Learning Benchmark Problems. R package version 1.1-6

  • Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22

  • Lichman M (2013) UCI machine learning repository. University of California, School of Information and Computer Sciences, Irvine. http://archive.ics.uci.edu/ml. Accessed 3 Feb 2016

  • Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313

  • Norouzi M, Collins MD, Johnson M, Fleet DJ, Kohli P (2015) Efficient non-greedy optimization of decision trees. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R (eds) Advances in neural information processing systems. MIT Press, Cambridge, pp 1729–1737

  • Olaru C, Wehenkel L (2003) A complete fuzzy decision tree technique. Fuzzy Sets Syst 138(2):221–254

  • Otero FE, Freitas AA, Johnson CG (2012) Inducing decision trees with an ant colony optimization algorithm. Appl Soft Comput 12(11):3615–3626

  • Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco

  • Sofeikov KI, Tyukin IY, Gorban AN, Mirkes EM, Prokhorov DV, Romanenko IV (2014) Learning optimization for decision tree classification of non-categorical data with information gain impurity criterion. In: 2014 International joint conference on neural networks (IJCNN). IEEE, pp 3548–3555

  • Suárez A, Lutsko JF (1999) Globally optimal fuzzy decision trees for classification and regression. IEEE Trans Pattern Anal Mach Intell 21:1297–1311

  • Yıldız OT, İrsoy O, Alpaydın E (2016) Bagging soft decision trees. In: Holzinger A (ed) Machine learning for health informatics: state-of-the-art and future challenges. Springer, Cham, pp 25–36

Author information

Correspondence to Jakub Dvořák.

Additional information

This research was supported by SVV Project Number 260 333.

About this article

Cite this article

Dvořák, J. Classification trees with soft splits optimized for ranking. Comput Stat 34, 763–786 (2019). https://doi.org/10.1007/s00180-019-00867-1
