Research article
DOI: 10.1145/2348283.2348384

Top-k learning to rank: labeling, ranking and evaluation

Published: 12 August 2012

ABSTRACT

In this paper, we propose a novel top-k learning to rank framework, which involves a labeling strategy, a ranking model and evaluation measures. The motivation comes from the difficulty of obtaining reliable relevance judgments from human assessors when applying learning to rank in real search systems. The traditional absolute relevance judgment method is difficult in both gradation specification and human assessment, resulting in a high level of disagreement on judgments. Pairwise preference judgment, a good alternative, is often criticized for increasing the complexity of judgment from O(n) to O(n log n). Considering the fact that users mainly care about top-ranked search results, we propose a novel top-k labeling strategy which adopts pairwise preference judgment to generate the ordered top k items from n documents (i.e., the top-k ground-truth) in a manner similar to that of HeapSort. As a result, the complexity of judgment is reduced to O(n log k). With the top-k ground-truth, traditional ranking models (e.g., pairwise or listwise models) and evaluation measures (e.g., NDCG) no longer fit the data set. Therefore, we introduce a new ranking model, namely FocusedRank, which fully captures the characteristics of the top-k ground-truth. We also extend the widely used evaluation measures NDCG and ERR to be applicable to the top-k ground-truth, referred to as κ-NDCG and κ-ERR, respectively. Finally, we conduct extensive experiments on benchmark data collections to demonstrate the efficiency and effectiveness of our top-k labeling strategy and ranking models.
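
The O(n log k) judgment cost follows from maintaining a preference heap of at most k documents. The Python sketch below is an illustration only, not the authors' implementation: it shows how an ordered top-k list can be produced from n documents using nothing but pairwise comparisons, heap-style. The function name top_k_by_pairwise_judgments and the comparator prefer(a, b), which stands in for a human assessor's preference judgment, are hypothetical names introduced for this example.

```python
# Illustrative sketch only -- not the authors' implementation. It builds an
# ordered top-k list from n documents using only pairwise preference
# judgments, heap-style, with O(n log k) comparisons in total. The comparator
# `prefer(a, b)` is a hypothetical stand-in for a human assessor and must
# return True iff document `a` is preferred over document `b`.

def top_k_by_pairwise_judgments(docs, k, prefer):
    """Return the k most preferred documents, most preferred first."""
    heap = []  # min-heap by preference: the root is the *least* preferred

    def sift_up(i):
        while i > 0:
            parent = (i - 1) // 2
            if prefer(heap[parent], heap[i]):   # parent too strong for a min-heap
                heap[parent], heap[i] = heap[i], heap[parent]
                i = parent
            else:
                break

    def sift_down(i):
        n = len(heap)
        while True:
            left, right, least = 2 * i + 1, 2 * i + 2, i
            if left < n and prefer(heap[least], heap[left]):
                least = left
            if right < n and prefer(heap[least], heap[right]):
                least = right
            if least == i:
                break
            heap[i], heap[least] = heap[least], heap[i]
            i = least

    for d in docs:                              # n documents ...
        if len(heap) < k:
            heap.append(d)
            sift_up(len(heap) - 1)              # ... O(log k) judgments each
        elif prefer(d, heap[0]):                # better than the worst of the top k?
            heap[0] = d
            sift_down(0)

    # Heapsort-style extraction: O(k log k) extra judgments to order the k winners.
    ordered = []
    while heap:
        heap[0], heap[-1] = heap[-1], heap[0]
        ordered.append(heap.pop())              # least preferred of the remainder
        sift_down(0)
    return list(reversed(ordered))


# Toy usage: preferences induced by numeric relevance scores.
scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.7, "d5": 0.1}
print(top_k_by_pairwise_judgments(list(scores), 3,
                                  lambda a, b: scores[a] > scores[b]))
# -> ['d2', 'd4', 'd3']
```

Each of the n documents triggers at most O(log k) comparisons against a heap that never exceeds k elements, and ordering the surviving k items adds only O(k log k) more, consistent with the O(n log k) bound stated in the abstract.

For context, κ-NDCG and κ-ERR are described as extensions of the standard graded-relevance measures. The snippet below gives only the standard NDCG@k and ERR definitions that are being extended; the κ-variants themselves are defined in the paper and are not reproduced here.

```python
import math

def ndcg_at_k(rels, k):
    """Standard NDCG@k over a ranked list of graded relevance labels."""
    def dcg(labels):
        return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(labels[:k]))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

def err(rels, g_max=4):
    """Standard Expected Reciprocal Rank for graded relevance labels."""
    score, p_continue = 0.0, 1.0
    for rank, g in enumerate(rels, start=1):
        p_stop = (2 ** g - 1) / 2 ** g_max   # probability the user is satisfied here
        score += p_continue * p_stop / rank
        p_continue *= 1 - p_stop
    return score
```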

Published in

SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
August 2012
1236 pages
ISBN: 9781450314725
DOI: 10.1145/2348283

Copyright © 2012 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%
