Learning random forests for ranking

  • Research Article
  • Published:
Frontiers of Computer Science in China

Abstract

The random forests (RF) algorithm, which combines the predictions of an ensemble of random trees, has achieved significant improvements in classification accuracy. In many real-world applications, however, ranking is required in order to make optimal decisions, so this paper focuses on the ranking performance of RF. Our experimental results on all 36 UC Irvine Machine Learning Repository (UCI) data sets published on the main web site of the Weka platform show that RF does not perform well in ranking; its ranking performance is roughly the same as that of a single C4.4 tree. This raises the question of whether improvements to RF can scale up its ranking performance. To answer this question, we present an improved random forests (IRF) algorithm. Instead of the information gain measure and the maximum-likelihood estimate, IRF uses the average gain measure and the similarity-weighted estimate. Our experiments show that IRF significantly outperforms all the other algorithms used for comparison in terms of ranking, while maintaining the high classification accuracy that characterizes RF.
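
Because the abstract only names the two ingredients of IRF, the sketch below is illustrative rather than the paper's implementation. It shows how a random forest's averaged class-membership probabilities serve as ranking scores evaluated with AUC (the standard metric in probability-based ranking), with scikit-learn's RandomForestClassifier standing in for the Weka implementation used in the experiments. The two helper functions, average_gain and similarity_weighted_estimate, are assumed forms of the measures named in the abstract, not the paper's exact definitions.

import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# A synthetic two-class problem; the paper instead uses 36 UCI data sets.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Ranking with a standard RF: the averaged per-tree probability of the
# positive class is the ranking score, and AUC measures ranking quality.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
scores = rf.predict_proba(X_te)[:, 1]
print("RF ranking AUC:", roc_auc_score(y_te, scores))


def average_gain(y_col, x_col):
    """Assumed form of the average gain measure: the information gain of a
    discrete attribute divided by its number of distinct values."""
    def entropy(labels):
        p = np.array(list(Counter(labels).values()), dtype=float)
        p /= p.sum()
        return -(p * np.log2(p)).sum()

    n, gain = len(y_col), entropy(y_col)
    for v in np.unique(x_col):
        mask = x_col == v
        gain -= mask.sum() / n * entropy(y_col[mask])
    return gain / len(np.unique(x_col))


def similarity_weighted_estimate(leaf_X, leaf_y, test_x, n_classes, m=1.0):
    """Assumed similarity-weighted class-probability estimate at a leaf: each
    training instance in the leaf votes with a weight equal to the fraction of
    attribute values it shares with the test instance (plus a Laplace-style
    smoothing term m), instead of the plain maximum-likelihood frequencies.
    Class labels in leaf_y are taken to be integer indices."""
    counts = np.full(n_classes, m, dtype=float)
    for xi, yi in zip(leaf_X, leaf_y):
        counts[yi] += np.mean(xi == test_x)  # attribute-value overlap in [0, 1]
    return counts / counts.sum()

In IRF, a measure like average_gain would drive attribute selection inside each random tree and a leaf estimate like similarity_weighted_estimate would replace the frequency-based estimate when scoring test instances; both are shown here only to make the abstract's terms concrete.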

Author information

Corresponding author

Correspondence to Liangxiao Jiang.

About this article

Cite this article

Jiang, L. Learning random forests for ranking. Front. Comput. Sci. China 5, 79–86 (2011). https://doi.org/10.1007/s11704-010-0388-5
