ABSTRACT
Increasing attention has recently been focused on directly optimizing ranking measures and on inducing sparsity in learned models. However, few attempts have been made to combine the two in learning to rank. In this paper, we consider sparse algorithms that directly optimize Normalized Discounted Cumulative Gain (NDCG), a widely used ranking measure. We begin by establishing a reduction framework under which ranking, as measured by NDCG, is reduced to importance-weighted pairwise classification. We then provide a sound theoretical guarantee for this reduction, bounding the realized NDCG regret in terms of a properly weighted pairwise classification regret; this implies that good performance transfers robustly from pairwise classification to ranking. The resulting pairwise loss function makes it feasible to incorporate sparsity into ranking models and to derive a gradient with a performance guarantee. To achieve sparsity, we devise a novel algorithm, RSRank, which performs L1 regularization via truncated gradient descent. Finally, experimental results on a benchmark collection confirm the significant advantage of RSRank over several baseline methods.
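To make the two ingredients of the abstract concrete, the sketch below shows (a) NDCG@k as standardly defined (gain 2^rel - 1, log2 position discount), and (b) online learning on importance-weighted preference pairs with the truncated-gradient L1 step of Langford, Li and Zhang. This is a minimal illustration under assumed choices, not the paper's exact RSRank procedure: the linear model, the logistic surrogate on score differences, the function names, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

def ndcg_at_k(relevance, k=10):
    """NDCG@k for graded relevance labels given in ranked order."""
    rel = np.asarray(relevance, dtype=float)[:k]
    dcg = ((2.0 ** rel - 1.0) / np.log2(np.arange(2, rel.size + 2))).sum()
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]
    idcg = ((2.0 ** ideal - 1.0) / np.log2(np.arange(2, ideal.size + 2))).sum()
    return dcg / idcg if idcg > 0 else 0.0

def truncate(w, alpha, theta):
    """Truncation operator of Langford et al.: shrink coordinates whose
    magnitude is at most theta toward zero by alpha; leave the rest alone."""
    shrunk = np.sign(w) * np.maximum(np.abs(w) - alpha, 0.0)
    return np.where(np.abs(w) <= theta, shrunk, w)

def sgd_truncated_gradient(pairs, dim, eta=0.1, gravity=1e-4,
                           theta=np.inf, K=10, epochs=1):
    """Online pairwise learning with truncated-gradient L1 regularization.
    `pairs` is a list of (x_pos, x_neg, weight) importance-weighted
    preference pairs; the surrogate is weighted logistic loss on the
    score difference of a linear model (an illustrative assumption)."""
    w = np.zeros(dim)
    t = 0
    for _ in range(epochs):
        for x_pos, x_neg, imp in pairs:
            t += 1
            d = x_pos - x_neg                          # pairwise difference
            grad = -imp * d / (1.0 + np.exp(w @ d))    # grad of imp*log(1+e^{-w.d})
            w -= eta * grad                            # plain SGD step
            if t % K == 0:                             # every K steps, truncate
                w = truncate(w, K * eta * gravity, theta)
    return w
```

With gravity set to zero the update reduces to plain SGD on the weighted pairwise loss; increasing gravity drives more coordinates exactly to zero, trading a little pairwise accuracy (and hence, by the reduction bound, NDCG) for a sparser model.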