ABSTRACT
In this paper, we propose a novel top-k learning-to-rank framework comprising a labeling strategy, a ranking model, and evaluation measures. The motivation comes from the difficulty of obtaining reliable relevance judgments from human assessors when applying learning to rank in real search systems. The traditional absolute relevance judgment method is difficult both in specifying gradations and in human assessing, resulting in a high level of disagreement among judgments. Pairwise preference judgment, a good alternative, is often criticized for increasing the judgment complexity from O(n) to O(n log n). Since users mainly care about top-ranked search results, we propose a novel top-k labeling strategy that uses pairwise preference judgments to produce an ordering of the top k of n documents (i.e., the top-k ground truth) in a manner similar to HeapSort. As a result, the judgment complexity is reduced to O(n log k). With the top-k ground truth, traditional ranking models (e.g., pairwise or listwise models) and evaluation measures (e.g., NDCG) no longer fit the data. We therefore introduce a new ranking model, FocusedRank, which fully captures the characteristics of the top-k ground truth. We also extend the widely used evaluation measures NDCG and ERR to be applicable to the top-k ground truth, referred to as κ-NDCG and κ-ERR, respectively. Finally, we conduct extensive experiments on benchmark data collections to demonstrate the efficiency and effectiveness of our top-k labeling strategy and ranking models.
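To make the complexity argument concrete, below is a minimal sketch of a HeapSort-style top-k selection driven purely by pairwise preference judgments. It is an illustration under our own assumptions rather than the paper's exact labeling procedure: the hypothetical oracle prefer(a, b) stands in for a human assessor judging that document a should rank above document b, and ties are assumed away.

```python
import heapq
from functools import cmp_to_key

def top_k_preference_labeling(docs, prefer, k):
    """Select and order the top-k documents from `docs` using only
    pairwise preference judgments (a sketch of a HeapSort-like
    top-k labeling strategy; `prefer` is a hypothetical oracle).

    A min-heap of size k is maintained over the n documents, so each
    document costs O(log k) comparisons and the total judgment cost
    is O(n log k), versus O(n log n) for fully sorting all documents.
    """
    # heapq is a min-heap, so order wrapped items such that the
    # *least preferred* of the current top-k sits at the root.
    key = cmp_to_key(lambda a, b: -1 if prefer(b, a) else 1)

    heap = [key(d) for d in docs[:k]]
    heapq.heapify(heap)
    for d in docs[k:]:
        # One judgment against the current worst of the top-k; replace
        # the root (an O(log k) sift) only if d is preferred over it.
        if prefer(d, heap[0].obj):
            heapq.heapreplace(heap, key(d))

    # Order the k survivors from most to least preferred: O(k log k).
    return [w.obj for w in sorted(heap, reverse=True)]

# Example: rank integers where "preferred" means larger.
docs = [3, 9, 1, 7, 5, 8, 2]
print(top_k_preference_labeling(docs, lambda a, b: a > b, k=3))  # [9, 8, 7]
```

Keeping the heap at size k, rather than heapifying all n documents and extracting k, is what yields the O(n log k) bound: each of the n documents triggers at most one root comparison plus an O(log k) sift, and ordering the k survivors adds only O(k log k).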