
Ranking-based evaluation of regression models

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

We suggest the use of ranking-based evaluation measures for regression models, as a complement to the commonly used residual-based evaluation. We argue that in some cases, such as the case study we present, ranking can be the main underlying goal in building a regression model, and ranking performance is then the correct evaluation metric. However, even when ranking is not the contextually correct performance metric, the measures we explore still have significant advantages: they are robust against extreme outliers in the evaluation set, and they are interpretable. The two measures we consider correspond closely to non-parametric correlation coefficients commonly used in data analysis (Spearman's ρ and Kendall's τ), and both have interesting graphical representations which, like ROC curves, offer various useful views of model performance in addition to a one-number summary in the area under the curve. An interesting extension we explore is evaluating models on their performance in "partially" ranking the data, which we argue can better represent the utility of the model in many cases. We illustrate our methods on a case study of evaluating IT Wallet size estimation models for IBM's customers.
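The contrast the abstract draws between residual-based and ranking-based evaluation can be sketched in a few lines of code. The example below is illustrative only and is not taken from the paper: the function names (`spearman_rho`, `kendall_tau`, `rmse`) and the toy data, including the extreme outlier, are our own assumptions, and the rank-correlation implementations ignore ties for brevity.

```python
import math

def ranks(xs):
    """Rank values 1..n by sort order (no tie handling; illustrative sketch)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(y, pred):
    """Spearman's rho via the classical sum-of-squared-rank-differences formula."""
    ry, rp = ranks(y), ranks(pred)
    n = len(y)
    d2 = sum((a - b) ** 2 for a, b in zip(ry, rp))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(y, pred):
    """Kendall's tau: (concordant - discordant) pairs over all pairs."""
    n = len(y)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (y[i] < y[j]) == (pred[i] < pred[j]):
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def rmse(y, pred):
    """Residual-based metric: root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y))

# Hypothetical held-out targets with one extreme outlier, and two models:
y_true  = [1.0, 2.0, 3.0, 4.0, 1000.0]
model_a = [1.1, 2.2, 2.9, 4.1, 9.0]    # orders every point correctly, misses the outlier's scale
model_b = [4.0, 3.0, 2.0, 1.0, 999.0]  # tiny residuals, but inverts the first four points

print(rmse(y_true, model_a), spearman_rho(y_true, model_a), kendall_tau(y_true, model_a))
print(rmse(y_true, model_b), spearman_rho(y_true, model_b), kendall_tau(y_true, model_b))
```

RMSE, dominated by the single outlier, strongly prefers model B, while both rank correlations identify that model A orders the data perfectly (ρ = τ = 1) and model B does not, illustrating the robustness and interpretability advantages the abstract claims.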



Author information

Correspondence to Saharon Rosset.

Additional information

Saharon Rosset is a Research Staff Member in the Data Analytics Research Group at IBM's T. J. Watson Research Center. He received his B.S. in Mathematics and M.Sc. in Statistics from Tel Aviv University in Israel, and his Ph.D. in Statistics from Stanford University in 2003. In his research, he aspires to develop practically useful predictive modeling methodologies and tools, and to apply them to problems in business and scientific domains. His current major projects include work on customer wallet estimation and analysis of genetic data.

Claudia Perlich received an M.Sc. in Computer Science from the University of Colorado at Boulder, a Diploma in Computer Science from Technische Universität Darmstadt, and her Ph.D. in Information Systems from the Stern School of Business, New York University. Her Ph.D. thesis concentrated on probability estimation in multi-relational domains that capture information about multiple entity types and the relationships between them. Her dissertation was recognized as an additional winner of the International SAP Doctoral Support Award Competition. Claudia joined the Data Analytics Research Group at IBM's T. J. Watson Research Center as a Research Staff Member in October 2004. Her research interests are in statistical machine learning for complex real-world domains and business applications.

Bianca Zadrozny is currently an associate professor in the Computer Science Department of Federal Fluminense University in Brazil. Her research interests are in the areas of applied machine learning and data mining. She received her B.Sc. in Computer Engineering from the Pontifical Catholic University in Rio de Janeiro, Brazil, and her M.Sc. and Ph.D. in Computer Science from the University of California at San Diego. She has also worked as a Research Staff Member in the Data Analytics Research Group at IBM's T. J. Watson Research Center.


About this article

Cite this article

Rosset, S., Perlich, C. & Zadrozny, B. Ranking-based evaluation of regression models. Knowl Inf Syst 12, 331–353 (2007). https://doi.org/10.1007/s10115-006-0037-3

