ABSTRACT
Learning to rank has been intensively studied and has shown great value in many fields, such as web search, question answering, and recommender systems. This paper focuses on listwise document ranking, where all documents associated with the same query in the training data are used as the input. We propose a novel ranking method, referred to as WassRank, under which the problem of listwise document ranking boils down to learning the ranking function that achieves the minimum Wasserstein distance. Specifically, given the query-level predictions and the ground-truth labels, we first map them into two probability vectors. Analogous to the optimal transport problem, we view each probability vector as a pile of relevance mass, with peaks indicating higher relevance. The listwise ranking loss is formulated as the minimum cost (the Wasserstein distance) of transporting (or reshaping) the pile of predicted relevance mass so that it matches the pile of ground-truth relevance mass. The smaller the Wasserstein distance, the closer the prediction is to the ground truth. To better capture the inherent relevance-based order information among documents with different relevance labels, and to lower the variance of predictions for documents with the same relevance label, a ranking-specific cost matrix is imposed. To validate the effectiveness of WassRank, we conduct a series of experiments on two benchmark collections. The experimental results demonstrate that, compared with four non-trivial listwise ranking methods (i.e., LambdaRank, ListNet, ListMLE and ApxNDCG), WassRank achieves substantially improved performance in terms of nDCG and ERR across different rank positions. Specifically, the maximum improvements of WassRank over LambdaRank, ListNet, ListMLE and ApxNDCG in terms of nDCG@1 are 15%, 5%, 7% and 5%, respectively.
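The loss described above can be illustrated with a minimal sketch. The abstract does not specify how scores are mapped to probability vectors, what the ranking-specific cost matrix looks like, or how the Wasserstein distance is computed, so everything concrete here is an assumption made for illustration: a softmax mapping, a hypothetical cost matrix equal to the absolute gap between relevance labels, and an entropy-regularized Sinkhorn iteration (with hypothetical parameters `reg` and `n_iters`) as the approximate solver.

```python
import math


def softmax(xs):
    """Map real-valued scores to a probability vector (assumed mapping)."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_gap_cost(labels):
    """Hypothetical cost matrix: moving relevance mass between two
    documents costs the absolute difference of their labels."""
    return [[abs(a - b) for b in labels] for a in labels]


def sinkhorn_cost(p, q, cost, reg=0.1, n_iters=500):
    """Approximate the optimal transport cost between p and q with
    entropy-regularized Sinkhorn iterations (an assumed solver)."""
    n = len(p)
    # Gibbs kernel derived from the cost matrix
    K = [[math.exp(-cost[i][j] / reg) for j in range(n)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals
        u = [p[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(n)]
    # Transport plan T[i][j] = u[i] * K[i][j] * v[j]; return <T, cost>
    return sum(
        u[i] * K[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(n)
    )


def wasserstein_ranking_loss(scores, labels, reg=0.1, n_iters=500):
    """Listwise loss for one query: cost of reshaping the predicted
    relevance mass into the ground-truth relevance mass."""
    p = softmax(scores)                      # predicted relevance mass
    q = softmax([float(l) for l in labels])  # ground-truth relevance mass
    return sinkhorn_cost(p, q, label_gap_cost(labels), reg, n_iters)


if __name__ == "__main__":
    labels = [2, 1, 0]
    print(wasserstein_ranking_loss([2.0, 1.0, 0.0], labels))  # scores agree with labels
    print(wasserstein_ranking_loss([0.0, 1.0, 2.0], labels))  # scores reverse the labels
```

With these assumptions, scores that agree with the labels give a loss close to zero, and reversed scores give a clearly larger one, which matches the statement that a smaller distance means a prediction closer to the ground truth. In training, the same quantity would be computed in a framework with automatic differentiation so that gradients flow back to the ranking function.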
Index Terms
- WassRank: Listwise Document Ranking Using Optimal Transport Theory