Abstract
The decision tree is a well-known data-driven classification model whose core component is the split criterion. Although a great many split criteria have been proposed, almost all of them focus on the global class distribution of the training data, ignoring the local class imbalance that commonly arises during decision tree induction even over balanced or roughly balanced binary-class data sets. In this study, this problem is investigated in detail and an adaptive approach based on multiple existing split criteria is proposed. In the proposed scheme, the local class imbalance ratio serves as a weight factor that balances the importance of these split criteria when determining the optimal splitting point at each internal node. To evaluate its effectiveness, the proposed method is applied to twenty roughly balanced real-world binary-class data sets. Experimental results show that it not only outperforms the compared methods overall, but also improves the prediction accuracy for each class.
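The weighting idea described in the abstract can be sketched roughly as follows. This is an illustrative sketch only: the two criteria blended here (Gini impurity and Shannon entropy) and the exact form of the weight are assumptions for illustration, not the paper's actual formulation, which may combine different criteria in a different way.

```python
import numpy as np

def gini(p):
    # Gini impurity for a binary node with positive-class probability p.
    return 2.0 * p * (1.0 - p)

def entropy(p):
    # Shannon entropy in bits; 0*log(0) is treated as 0.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def local_imbalance_ratio(labels):
    # Minority/majority class ratio at this node: 1.0 = perfectly balanced,
    # values near 0 indicate strong local imbalance.
    counts = np.bincount(labels, minlength=2)
    if counts.max() == 0:
        return 1.0
    return counts.min() / counts.max()

def adaptive_impurity(labels):
    # Blend two split criteria using the local imbalance ratio as the
    # weight factor (hypothetical combination, for illustration only):
    # the weight shifts between the criteria as the node becomes more
    # or less locally imbalanced.
    labels = np.asarray(labels)
    p = labels.mean() if labels.size else 0.0
    w = local_imbalance_ratio(labels)  # weight factor in [0, 1]
    return w * gini(float(p)) + (1.0 - w) * entropy(float(p))
```

In a tree inducer, `adaptive_impurity` would replace a single fixed criterion when scoring candidate splits: a perfectly balanced node (`w = 1.0`) is scored purely by one criterion, while increasingly imbalanced nodes shift weight toward the other.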
Cite this article
Yan, J., Zhang, Z. & Dong, H. AdaDT: An adaptive decision tree for addressing local class imbalance based on multiple split criteria. Appl Intell 51, 4744–4761 (2021). https://doi.org/10.1007/s10489-020-02061-z