research-article

WassRank: Listwise Document Ranking Using Optimal Transport Theory

Authors:

Joemon M. Jose,

Long Chen

WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining

Pages 24 - 32

https://doi.org/10.1145/3289600.3291006

Published: 30 January 2019

Abstract

Learning to rank has been intensively studied and has shown great value in many fields, such as web search, question answering and recommender systems. This paper focuses on listwise document ranking, where all documents associated with the same query in the training data are used as the input. We propose a novel ranking method, referred to as WassRank, under which the problem of listwise document ranking boils down to learning the ranking function that achieves the minimum Wasserstein distance. Specifically, given the query-level predictions and the ground-truth labels, we first map them into two probability vectors. Analogous to the optimal transport problem, we view each probability vector as a pile of relevance mass, with peaks indicating higher relevance. The listwise ranking loss is formulated as the minimum cost (the Wasserstein distance) of transporting (or reshaping) the pile of predicted relevance mass so that it matches the pile of ground-truth relevance mass. The smaller the Wasserstein distance, the closer the prediction is to the ground truth. To better capture the inherent relevance-based order information among documents with different relevance labels, and to lower the variance of predictions for documents with the same relevance label, a ranking-specific cost matrix is imposed. To validate the effectiveness of WassRank, we conduct a series of experiments on two benchmark collections. The experimental results demonstrate that, compared with four non-trivial listwise ranking methods (i.e., LambdaRank, ListNet, ListMLE and ApxNDCG), WassRank achieves substantially improved performance in terms of nDCG and ERR across different rank positions. Specifically, the maximum improvements of WassRank over LambdaRank, ListNet, ListMLE and ApxNDCG in terms of nDCG@1 are 15%, 5%, 7% and 5%, respectively.
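To make the transport view of the loss concrete, the following is a minimal Python sketch of a Wasserstein-style listwise loss for a single query. It is an illustration under stated assumptions, not the paper's implementation: the softmax mapping to probability vectors, the absolute label-gap cost matrix, and the entropic (Sinkhorn) approximation with its `reg` and `n_iters` settings are all choices made here for the example, and every function name is hypothetical.

```python
import math


def softmax(values):
    """Map real-valued scores to a probability vector (assumed mapping)."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def label_gap_cost(labels):
    """Hypothetical ground cost: |label_i - label_j| between documents i and j.

    Moving mass between documents with the same label is free; moving it
    across a larger relevance gap costs more.
    """
    return [[float(abs(a - b)) for b in labels] for a in labels]


def sinkhorn_cost(p, q, cost, reg=0.5, n_iters=200):
    """Approximate the transport cost between p and q with Sinkhorn iterations.

    This is the entropy-regularised approximation, not the exact
    Wasserstein distance; reg and n_iters are illustrative defaults.
    """
    n = len(p)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(n)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [p[i] / sum(kernel[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [q[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(n)]
    # Transport plan entry (i, j) is u[i] * kernel[i][j] * v[j].
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(n))


def listwise_transport_loss(scores, labels, reg=0.5):
    """Cost of reshaping the predicted relevance mass into the ground-truth mass."""
    predicted_mass = softmax(scores)
    target_mass = softmax([float(label) for label in labels])
    return sinkhorn_cost(predicted_mass, target_mass, label_gap_cost(labels), reg=reg)


labels = [2, 1, 0]
aligned = listwise_transport_loss([2.0, 1.0, 0.0], labels)  # scores agree with labels
flipped = listwise_transport_loss([0.0, 1.0, 2.0], labels)  # scores reverse the order
print(aligned, flipped)
```

In this sketch the aligned scores yield a much smaller loss than the flipped ones, which mirrors the statement that a smaller distance means a prediction closer to the ground truth. The aligned loss is not exactly zero because the entropic term smooths the transport plan; a smaller `reg` tightens the approximation at the price of slower convergence.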

Index Terms

WassRank: Listwise Document Ranking Using Optimal Transport Theory
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Learning to rank

Published In

WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining

January 2019

874 pages

ISBN: 9781450359405

DOI: 10.1145/3289600

General Chairs: J. Shane Culpepper (RMIT University), Alistair Moffat (The University of Melbourne)

Program Chairs: Paul N. Bennett (Microsoft), Kristina Lerman (University of Southern California)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

The Japan Society for the Promotion of Science (JSPS)

Conference

WSDM '19

February 11 - 15, 2019

Melbourne VIC, Australia

Acceptance Rates

WSDM '19 Paper Acceptance Rate 84 of 511 submissions, 16%;

Overall Acceptance Rate 498 of 2,863 submissions, 17%
