From Ranking to Classification: A Statistical View

Clémençon, Stéphan; Lugosi, Gábor; Vayatis, Nicolas

doi:10.1007/3-540-31314-1_25

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

In applications related to information retrieval, the goal is not only to build a classifier that decides whether a document x in a list χ is relevant, but also to learn a scoring function s : χ → ℝ that ranks all possible documents by relevance. Here we show how the bipartite ranking problem boils down to binary classification with dependent data when accuracy is measured by the AUC criterion. Since the natural estimate of the risk takes the form of a U-statistic, the consistency of methods based on empirical risk minimization is studied using the theory of U-processes. Taking advantage of this specific form, we prove that fast rates of convergence may be achieved under general noise assumptions.
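As a rough illustration of the U-statistic form mentioned in the abstract, the sketch below computes a pairwise empirical ranking risk for a scoring function and compares it with the empirical AUC. It is only a minimal example under stated assumptions, not the chapter's own procedure: labels are taken in {-1, +1}, scores are assumed to have no ties, and the function names `empirical_ranking_risk` and `empirical_auc` as well as the toy data are hypothetical.

```python
from itertools import permutations


def empirical_ranking_risk(scores, labels):
    """Fraction of ordered pairs (i, j), i != j, that the scores rank
    in the wrong order relative to the labels (a U-statistic of order 2).
    Assumption: labels are in {-1, +1}."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two observations")
    discordant = 0
    for i, j in permutations(range(n), 2):
        # A pair is misranked when label order and score order disagree.
        if (labels[i] - labels[j]) * (scores[i] - scores[j]) < 0:
            discordant += 1
    return discordant / (n * (n - 1))


def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive
    instance receives the strictly higher score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    correct = sum(1 for sp in pos for sn in neg if sp > sn)
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: five documents, two of them relevant (+1).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, -1, 1, -1, -1]

risk = empirical_ranking_risk(scores, labels)
auc = empirical_auc(scores, labels)

# Without ties, the two quantities are linked by
# risk = 2 * n_pos * n_neg * (1 - auc) / (n * (n - 1)).
n, n_pos, n_neg = len(scores), labels.count(1), labels.count(-1)
identity_rhs = 2 * n_pos * n_neg * (1 - auc) / (n * (n - 1))
```

On this toy sample only one positive/negative pair is misordered, so minimizing the pairwise risk and maximizing the empirical AUC amount to the same thing. The summands share observations across pairs, which is the dependence that the abstract refers to.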

Author information

Authors and Affiliations

MODAL’X, Université Paris X, 92001, Nanterre, France
Stéphan Clémençon
Departament d’Economia i Empresa, Universitat Pompeu Fabra, 08005, Barcelona, Spain
Gábor Lugosi
Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris VI et Paris VII, 75013, Paris, France
Stéphan Clémençon & Nicolas Vayatis

Editor information

Editors and Affiliations

Institut für Technische und Betriebliche Informationssysteme, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Myra Spiliopoulou
Institut für Wissens- und Sprachverarbeitung, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Rudolf Kruse, Christian Borgelt & Andreas Nürnberger
Institut für Entscheidungstheorie und Unternehmensforschung, Universität Karlsruhe (TH), 76128, Karlsruhe
Wolfgang Gaul

About this paper

Cite this paper

Clémençon, S., Lugosi, G., Vayatis, N. (2006). From Ranking to Classification: A Statistical View. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_25

DOI: https://doi.org/10.1007/3-540-31314-1_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4
eBook Packages: Mathematics and Statistics (R0)