Abstract
Random forest (RF) is an ensemble learning method that is considered a reference due to its excellent performance. Several improvements of RF have been published. One type of improvement replaces the univariate decision trees with multivariate decision trees built through a local optimization process (oblique RF). Another type provides additional diversity for the univariate decision trees by means of imprecise probabilities (random credal random forest, RCRF). The aim of this work is to compare these improvements of the RF algorithm experimentally. It is shown that RF with additional diversity and imprecise probabilities achieves better results than RF with multivariate decision trees.
Notes
Normally, the value used for \(m\) is \(\lfloor \log _2(\text{number of features}) \rfloor + 1\), i.e., the integer part of \(\log _2\) of the number of features, plus 1.
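As an illustration, here is a minimal Python sketch of this default value (the function name `rf_subspace_size` is ours, introduced only for this example):

```python
import math

def rf_subspace_size(n_features: int) -> int:
    # Default number of features m examined at each split:
    # the integer part of log2(n_features), plus 1.
    return int(math.log2(n_features)) + 1

# For example, with 100 features: int(log2(100)) + 1 = 6 + 1 = 7
print(rf_subspace_size(100))  # 7
```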
References
Abellán J (2006) Uncertainty measures on probability intervals from the imprecise Dirichlet model. Int J Gen Syst 35(5):509–528. https://doi.org/10.1080/03081070600687643
Abellán J, Masegosa A (2008) Requirements for total uncertainty measures in Dempster–Shafer theory of evidence. Int J Gen Syst 37(6):733–747. https://doi.org/10.1080/03081070802082486
Abellán J, Masegosa AR (2012) Bagging schemes on the presence of class noise in classification. Expert Syst Appl 39(8):6827–6837. https://doi.org/10.1016/j.eswa.2012.01.013
Abellán J, Moral S (2003) Building classification trees using the total uncertainty criterion. Int J Intell Syst 18(12):1215–1225. https://doi.org/10.1002/int.10143
Abellán J, Mantas CJ, Castellano JG (2018a) Adaptative CC4.5: credal C4.5 with a rough class noise estimator. Expert Syst Appl 92:363–379. https://doi.org/10.1016/j.eswa.2017.09.057
Abellán J, Mantas CJ, Castellano JG, Moral S (2018b) Increasing diversity in random forest learning algorithm via imprecise probabilities. Expert Syst Appl 97:228–243. https://doi.org/10.1016/j.eswa.2017.12.029
Alcalá-Fdez J, Sánchez L, García S, del Jesus M, Ventura S, Garrell J, Herrera F (2009) KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput 13(3):307–318. https://doi.org/10.1007/s00500-008-0323-y
Breiman L (2000) Randomizing outputs to increase prediction accuracy. Mach Learn 40(3):229–242. https://doi.org/10.1023/A:1007682208299
Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324
Brown G, Wyatt J, Harris R, Yao X (2005) Diversity creation methods: a survey and categorisation. Inf Fusion 6(1):5–20
Chen F-H, Howard H (2016) An alternative model for the analysis of detecting electronic industries earnings management using stepwise regression, random forest, and decision tree. Soft Comput 20(5):1945–1960. https://doi.org/10.1007/s00500-015-1616-6
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Dietterich TG (2000a) Ensemble methods in machine learning. In: Proceedings of the first international workshop on multiple classifier systems. Springer, London, UK, pp 1–15
Dietterich TG (2000b) An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach Learn 40(2):139–157. https://doi.org/10.1023/A:1007607513941
Fan S-KS, Su C-J, Nien H-T, Tsai P-F, Cheng C-Y (2017) Using machine learning and big data approaches to predict travel time based on historical and real-time data from Taiwan electronic toll collection. Soft Comput. https://doi.org/10.1007/s00500-017-2610-y
Frenay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst 25(5):845–869
Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32:675–701
Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat 11(1):86–92. https://doi.org/10.1214/aoms/1177731944
Hoerl AE, Kennard RW (2000) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 42(1):80–86. https://doi.org/10.2307/1271436
Klir GJ (2005) Uncertainty and information: foundations of generalized information theory. Wiley, New York. https://doi.org/10.1002/0471755575
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Mantas CJ, Abellán J (2014a) Analysis and extension of decision trees based on imprecise probabilities: application on noisy data. Expert Syst Appl 41(5):2514–2525. https://doi.org/10.1016/j.eswa.2013.09.050
Mantas CJ, Abellán J (2014b) Credal-C4.5: decision tree based on imprecise probabilities to classify noisy data. Expert Syst Appl 41(10):4625–4637. https://doi.org/10.1016/j.eswa.2014.01.017
Mantas CJ, Abellán J, Castellano JG (2016) Analysis of credal-C4.5 for classification in noisy domains. Expert Syst Appl 61:314–326. https://doi.org/10.1016/j.eswa.2016.05.035
Marquardt DW, Snee RD (1975) Ridge regression in practice. Am Stat 29(1):3–20
Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA (2011) On oblique random forests. In: Proceedings of the 2011 European conference on machine learning and knowledge discovery in databases, Part II. Springer, pp 453–469
Mistry P, Neagu D, Trundle PR, Vessey JD (2016) Using random forest and decision tree models for a new vehicle prediction approach in computational toxicology. Soft Comput 20(8):2967–2979. https://doi.org/10.1007/s00500-015-1925-9
Nemenyi P (1963) Distribution-free multiple comparisons (Doctoral dissertation). Princeton University, Princeton
Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106. https://doi.org/10.1023/A:1022643204877
R Core Team (2013) R: a language and environment for statistical computing [computer software manual], Vienna, Austria. http://www.R-project.org/
Ren Y, Zhang L, Suganthan PN (2016) Ensemble classification and regression-recent developments, applications and future directions. IEEE Comput Intell Mag 11(1):41–53. https://doi.org/10.1109/MCI.2015.2471235
Rokach L (2010) Ensemble-based classifiers. Artif Intell Rev 33(1–2):1–39. https://doi.org/10.1007/s10462-009-9124-7
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Walley P (1996) Inferences from multinomial data: learning about a bag of marbles (with discussion). J R Stat Soc Ser B 58(1):3–57. https://doi.org/10.2307/2346164
Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83. https://doi.org/10.2307/3001968
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann Publishers Inc., San Francisco
Xu Y, Zhang Q, Wang L (2018) Metric forests based on gaussian mixture model for visual image classification. Soft Comput 22(2):499–509. https://doi.org/10.1007/s00500-016-2350-4
Zhang L, Suganthan P (2014) Random forests with ensemble of feature spaces. Pattern Recognit 47:3429–3437
Zhang L, Suganthan PN (2015) Oblique decision tree ensemble via multisurface proximal support vector machine. IEEE Trans Cybern 45(10):2165–2176. https://doi.org/10.1109/TCYB.2014.2366468
Zhang L, Suganthan PN (2017) Benchmarking ensemble classifiers with novel co-trained kernel ridge regression and random vector functional link ensembles [research frontier]. IEEE Comput Intell Mag 12(4):61–72. https://doi.org/10.1109/MCI.2017.2742867
Zhang L, Ren Y, Suganthan PN (2014) Towards generating random forests via extremely randomized trees. In: IJCNN, IEEE, pp 2645–2652
Zhang L, Varadarajan J, Suganthan PN, Ahuja N, Moulin P (2017) Robust visual tracking using oblique random forests. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5825–5834
Acknowledgements
This work has been supported by the Spanish “Ministerio de Economía y Competitividad” and by “Fondo Europeo de Desarrollo Regional” (FEDER) under Project TEC2015-69496-R.
Ethics declarations
Conflict of interest
Carlos J. Mantas, Javier G. Castellano, Serafín Moral-García and Joaquín Abellán declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Tables about accuracy results
Tables 6, 7, 8, 9 and 10 show the accuracy results obtained by the ensemble methods when classifying the data sets with different levels of added noise.
Tables 11, 12, 13, 14 and 15 show the p values of the Nemenyi test for each pairwise comparison when the methods are applied to data sets with different percentages of added noise. In all cases, the Nemenyi procedure rejects the hypotheses with a corresponding p value \(\le 0.01\). When there is a significant difference, the best algorithm is highlighted in bold.
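As a hedged illustration of this statistical workflow (not the authors' actual scripts), the following Python sketch runs a Friedman test over synthetic accuracy values and then the Nemenyi post-hoc test on all classifier pairs, flagging the p values \(\le 0.01\); it assumes scipy and the third-party scikit-posthocs package are installed:

```python
# Sketch of the Friedman + Nemenyi workflow described above; the accuracy
# matrix is synthetic, not the paper's data.
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp  # third-party: pip install scikit-posthocs

rng = np.random.default_rng(0)
# Rows = data sets, columns = classifiers (e.g., RF, oblique RF, RCRF).
accuracies = rng.uniform(0.70, 0.95, size=(20, 3))

# Friedman test: are there differences among the classifiers at all?
stat, p = friedmanchisquare(*accuracies.T)
print(f"Friedman test: chi2 = {stat:.3f}, p = {p:.4f}")

# Nemenyi post-hoc test on every pair of classifiers; reject the null
# hypothesis for pairs whose p value is <= 0.01, as in the appendix tables.
pairwise_p = sp.posthoc_nemenyi_friedman(accuracies)
print(pairwise_p <= 0.01)
```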
About this article
Cite this article
Mantas, C.J., Castellano, J.G., Moral-García, S. et al. A comparison of random forest based algorithms: random credal random forest versus oblique random forest. Soft Comput 23, 10739–10754 (2019). https://doi.org/10.1007/s00500-018-3628-5