Abstract
In applications related to information retrieval, the goal is not only to build a classifier that decides whether a document x from a list χ is relevant, but also to learn a scoring function s : χ → ℝ that ranks all possible documents by relevance. Here we show how the bipartite ranking problem reduces to binary classification with dependent data when accuracy is measured by the AUC criterion. Because the natural estimate of the risk takes the form of a U-statistic, we study the consistency of methods based on empirical risk minimization using the theory of U-processes. Taking advantage of this specific form, we prove that fast rates of convergence may be achieved under general noise assumptions.
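The link between the AUC criterion and a pairwise risk of U-statistic form can be illustrated numerically. The sketch below is not taken from the chapter: the function names, the 0/1 label encoding (1 = relevant), and the convention that tied scores count as ranking errors are all assumptions made for illustration.

```python
from itertools import permutations


def empirical_ranking_risk(scores, labels):
    """Pairwise ranking risk, averaged over all ordered pairs (i, j), i != j.

    An ordered pair counts as an error when the two labels differ and the
    item with the larger label does not get the strictly larger score.
    Counting ties as errors is an assumption of this sketch.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two observations")
    errors = 0
    for i, j in permutations(range(n), 2):
        if labels[i] > labels[j] and scores[i] <= scores[j]:
            errors += 1
        elif labels[i] < labels[j] and scores[i] >= scores[j]:
            errors += 1
    # Averaging a kernel over all ordered pairs is what makes this a U-statistic.
    return errors / (n * (n - 1))


def empirical_auc(scores, labels):
    """Fraction of (relevant, irrelevant) pairs ranked in the right order."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    correct = sum(1 for sp in pos for sn in neg if sp > sn)
    return correct / (len(pos) * len(neg))


def auc_from_risk(risk, n, n_pos, n_neg):
    """Recover the empirical AUC from the pairwise risk by rescaling."""
    return 1.0 - risk * n * (n - 1) / (2 * n_pos * n_neg)


# Hypothetical toy data: five documents, three of them relevant.
scores = [0.9, 0.8, 0.3, 0.6, 0.1]
labels = [1, 1, 1, 0, 0]
risk = empirical_ranking_risk(scores, labels)
auc = empirical_auc(scores, labels)
```

Under these conventions the two quantities differ only by an affine rescaling that depends on the class counts, so minimizing the pairwise risk over a class of scoring functions is the same as maximizing the empirical AUC. The pairs share observations, which is why the summands are dependent even when the data are independent.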
Copyright information
© 2006 Springer Berlin · Heidelberg
About this paper
Cite this paper
Clémençon, S., Lugosi, G., Vayatis, N. (2006). From Ranking to Classification: A Statistical View. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_25
DOI: https://doi.org/10.1007/3-540-31314-1_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4
eBook Packages: Mathematics and Statistics (R0)