Abstract
The TreeRank algorithm was recently proposed in [1] and [2] as a scoring method based on recursive partitioning of the input space. This tree-induction algorithm builds orderings by recursively optimizing the Receiver Operating Characteristic (ROC) curve through a one-step optimization procedure called LeafRank. One aim of this paper is an in-depth analysis of the empirical performance of variants of the TreeRank/LeafRank method. Numerical experiments on both artificial and real data sets are provided. Further experiments using resampling and randomization, in the spirit of bagging and random forests [3, 4], are developed, and we show how they increase both stability and accuracy in bipartite ranking. Moreover, an empirical comparison with other efficient scoring algorithms, such as RankBoost and RankSVM, is presented on UCI benchmark data sets.
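To illustrate the two ideas the abstract combines, the sketch below pairs a one-step empirical AUC optimization over axis-aligned splits (a crude stand-in for the LeafRank step, not the authors' actual procedure) with bootstrap aggregation of the resulting scores, in the spirit of bagging. All function names (`auc`, `best_stump`, `bagged_scores`) are illustrative assumptions, not part of the published method.

```python
import random

def auc(scores, labels):
    # Empirical AUC: fraction of positive/negative pairs ranked
    # correctly, with ties counted as 0.5.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_stump(X, y):
    # One-step AUC maximization over axis-aligned splits: a toy
    # stand-in for the LeafRank subroutine. Returns the best AUC,
    # the split feature/threshold, and whether the split is flipped
    # (i.e., the low side scores higher).
    best_a, best_j, best_t, best_flip = 0.0, 0, float("-inf"), False
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            scores = [1.0 if x[j] > t else 0.0 for x in X]
            a = auc(scores, y)
            flip = a < 0.5
            a = max(a, 1.0 - a)  # orient the split toward positives
            if a > best_a:
                best_a, best_j, best_t, best_flip = a, j, t, flip
    return best_a, best_j, best_t, best_flip

def bagged_scores(X, y, X_test, n_rounds=25, seed=0):
    # Average stump scores over bootstrap resamples (bagging-style
    # aggregation of scoring rules).
    rng = random.Random(seed)
    agg = [0.0] * len(X_test)
    for _ in range(n_rounds):
        # Resample until both classes are present in the bootstrap.
        while True:
            idx = [rng.randrange(len(X)) for _ in range(len(X))]
            yb = [y[i] for i in idx]
            if 0 < sum(yb) < len(yb):
                break
        Xb = [X[i] for i in idx]
        _, j, t, flip = best_stump(Xb, yb)
        for k, x in enumerate(X_test):
            ind = 1.0 if x[j] > t else 0.0
            agg[k] += 1.0 - ind if flip else ind
    return [s / n_rounds for s in agg]
```

On a separable toy set, each bootstrapped stump differs slightly in its threshold, and averaging them yields a smoother, more stable piecewise-constant score, which is the intuition behind the resampling experiments described above.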
References
Clémençon S, Vayatis N (2009) Tree-based ranking methods. IEEE Trans Inf Theory 55(9):4316–4336
Clémençon S, Depecker M, Vayatis N (2011) Adaptive partitioning schemes for bipartite ranking. Mach Learn 83(1):31–69
Clémençon S, Depecker M, Vayatis N (2009) Bagging ranking trees. In: Proceedings of ICMLA, international conference on machine learning and applications
Clémençon S, Vayatis N (2010) Ranking forests (to be published)
Freund Y, Iyer R, Schapire RE, Singer Y (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969
Hastie T, Tibshirani R (1990) Generalized additive models. Chapman & Hall, Boca Raton
Zhu J, Hastie T (2005) Kernel logistic regression and the import vector machine. J Comput Graph Stat 14(1):185–205
Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28(2):337–407
Joachims T (2002) Optimizing search engines using clickthrough data. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 133–142
Pahikkala T, Tsivtsivadze E, Airola A, Boberg J, Salakoski T (2007) Learning to rank with pairwise regularized least-squares. In: Proceedings of SIGIR 2007 workshop on learning to rank for information retrieval, pp 27–33
Burges C, Shaked T, Renshaw E, Lazier A, Deeds M, Hamilton N, Hullender G (2005) Learning to rank using gradient descent. In: Proceedings of ICML, 22nd international conference on machine learning, pp 89–96
Dodd L, Pepe M (2003) Partial AUC estimation and regression. Biometrics 59(3):614–623
Clémençon S, Vayatis N (2007) Ranking the Best Instances. J Mach Learn Res 8:2671–2699
Clémençon S, Vayatis N (2008) Empirical performance maximization for linear rank statistics. In: Proceedings of NIPS’08, conference on neural information processing systems, pp 305–312
Rudin C (2009) The P-norm push: a simple convex ranking algorithm that concentrates at the top of the list. J Mach Learn Res 10:2233–2271
Robertson S, Zaragoza H (2007) On rank-based effectiveness measures and optimization. Inf Retr 10(3):321–339
Bartlett P, Jordan M, McAuliffe J (2006) Convexity, classification, and risk bounds. J Am Stat Assoc 101(473):138–156
Bartlett P, Tewari A (2007) Sparseness vs estimating conditional probabilities: some asymptotic results. J Mach Learn Res 8:775–790
Mease D, Wyner A (2008) Evidence contrary to the statistical view of boosting. J Mach Learn Res 9:131–156
Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, Berlin
Clémençon S, Vayatis N (2010) Overlaying classifiers: a practical approach for optimal scoring. Constr Approx 32(3):619–648
Boucheron S, Bousquet O, Lugosi G (2005) Theory of classification: a survey of recent advances. ESAIM Probab Stat 9:323–375
Hanley JA, McNeil BJ (1982) The meaning and use of the area under a ROC curve. Radiology 143:29–36
Clémençon S, Lugosi G, Vayatis N (2008) Ranking and empirical risk minimization of U-statistics. Ann Stat 36:844–874
Ailon N, Mohri M (2010) Preference-based learning to rank. Mach Learn 80(2):189–211
Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth and Brooks, Monterey
Bach FR, Heckerman D, Eric H (2006) Considering cost asymmetry in learning classifiers. J Mach Learn Res 7:1713–1741
Acknowledgments
We warmly thank Cynthia Rudin who kindly provided the code for the P-norm Push algorithm.
Cite this article
Clémençon, S., Depecker, M. & Vayatis, N. An empirical comparison of learning algorithms for nonparametric scoring: the TreeRank algorithm and other methods. Pattern Anal Applic 16, 475–496 (2013). https://doi.org/10.1007/s10044-012-0299-1