WassRank: Listwise Document Ranking Using Optimal Transport Theory

Published: 30 January 2019

Abstract

Learning to rank has been intensively studied and has shown great value in many fields, such as web search, question answering, and recommender systems. This paper focuses on listwise document ranking, where all documents associated with the same query in the training data are used as the input. We propose a novel ranking method, referred to as WassRank, under which the problem of listwise document ranking reduces to learning the ranking function that achieves the minimum Wasserstein distance. Specifically, given the query-level predictions and the ground-truth labels, we first map them into two probability vectors. Analogous to the optimal transport problem, we view each probability vector as a pile of relevance mass with peaks indicating higher relevance. The listwise ranking loss is formulated as the minimum cost (the Wasserstein distance) of transporting (or reshaping) the pile of predicted relevance mass so that it matches the pile of ground-truth relevance mass. The smaller the Wasserstein distance, the closer the prediction is to the ground truth. To better capture the inherent relevance-based order information among documents with different relevance labels, and to lower the variance of predictions for documents with the same relevance label, a ranking-specific cost matrix is imposed. To validate the effectiveness of WassRank, we conduct a series of experiments on two benchmark collections. The experimental results demonstrate that, compared with four non-trivial listwise ranking methods (i.e., LambdaRank, ListNet, ListMLE and ApxNDCG), WassRank achieves substantially improved performance in terms of nDCG and ERR across different rank positions. Specifically, the maximum improvements of WassRank over LambdaRank, ListNet, ListMLE and ApxNDCG in terms of nDCG@1 are 15%, 5%, 7%, and 5%, respectively.
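The transport view of the loss described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how scores and labels are mapped to probability vectors, nor the form of the ranking-specific cost matrix. The sketch below assumes a softmax mapping, a hypothetical squared-label-gap cost (label_gap_cost), and an entropy-regularised Sinkhorn solver (sinkhorn_cost) as a stand-in for the exact Wasserstein distance. All function names and parameter values are illustrative.

```python
import numpy as np


def softmax(x):
    """Map raw scores (or labels) to a probability vector. Assumed mapping."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def label_gap_cost(labels):
    """Hypothetical cost matrix: squared difference of relevance labels.

    Moving mass between documents with the same label is free; moving it
    across a large label gap is expensive.
    """
    labels = np.asarray(labels, dtype=float)
    return (labels[:, None] - labels[None, :]) ** 2


def sinkhorn_cost(a, b, cost, eps=0.1, n_iters=500):
    """Approximate transport cost from a to b via Sinkhorn iterations."""
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)  # rescale to match column marginals
        u = a / (K @ v)    # rescale to match row marginals
    plan = u[:, None] * K * v[None, :]  # transport plan
    return float((plan * cost).sum()), plan


# One query with three documents and graded relevance labels.
labels = [2, 0, 1]
target = softmax(labels)
cost = label_gap_cost(labels)

good_scores = [2.0, 0.0, 1.0]  # agrees with the label order
bad_scores = [0.0, 2.0, 1.0]   # swaps the best and worst documents

good_loss, _ = sinkhorn_cost(softmax(good_scores), target, cost)
bad_loss, _ = sinkhorn_cost(softmax(bad_scores), target, cost)
print("loss (good ranking):", good_loss)
print("loss (bad ranking): ", bad_loss)
```

Under these assumptions, the predictions that agree with the labels need almost no mass to be moved, so their loss is close to zero, while the swapped predictions must ship mass across the largest label gap and incur a larger loss.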

    Published In

    WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining
    January 2019
    874 pages
    ISBN: 9781450359405
    DOI: 10.1145/3289600
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. learning to rank
    2. optimal transport
    3. Wasserstein distance

    Qualifiers

    • Research-article

    Funding Sources

    • The Japan Society for the Promotion of Science (JSPS)

    Conference

    WSDM '19

    Acceptance Rates

    WSDM '19 Paper Acceptance Rate 84 of 511 submissions, 16%;
    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 48
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 27 Feb 2025

    Cited By

    • (2024) COTER: Conditional Optimal Transport meets Table Retrieval. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 911-919. DOI: 10.1145/3616855.3635796. Online publication date: 4-Mar-2024
    • (2024) Triplet-branch network with contrastive prior-knowledge embedding for disease grading. Artificial Intelligence in Medicine, Vol. 149, 102801. DOI: 10.1016/j.artmed.2024.102801. Online publication date: Mar-2024
    • (2024) ConClue: Conditional Clue Extraction for Multiple Choice Question Answering. Document Analysis and Recognition - ICDAR 2024, 183-198. DOI: 10.1007/978-3-031-70552-6_11. Online publication date: 11-Sep-2024
    • (2024) An In-Depth Comparison of Neural and Probabilistic Tree Models for Learning-to-rank. Advances in Information Retrieval, 468-476. DOI: 10.1007/978-3-031-56063-7_39. Online publication date: 23-Mar-2024
    • (2023) An Empirical Perspective on Learning-to-rank. Proceedings of the 2023 9th International Conference on Computing and Artificial Intelligence, 419-424. DOI: 10.1145/3594315.3594351. Online publication date: 17-Mar-2023
    • (2023) Neural Reranking-Based Collaborative Filtering by Leveraging Listwise Relative Ranking Information. IEEE Transactions on Systems, Man, and Cybernetics: Systems, Vol. 53, 2, 882-896. DOI: 10.1109/TSMC.2022.3188869. Online publication date: Feb-2023
    • (2023) An in-depth study on adversarial learning-to-rank. Information Retrieval Journal, Vol. 26, 1. DOI: 10.1007/s10791-023-09419-0. Online publication date: 28-Feb-2023
    • (2022) An Attention-Based Interactive Learning-to-Rank Model for Document Retrieval. IEEE Transactions on Systems, Man, and Cybernetics: Systems, Vol. 52, 9, 5770-5782. DOI: 10.1109/TSMC.2021.3129839. Online publication date: Sep-2022
    • (2022) ListMAP: Listwise learning to rank as maximum a posteriori estimation. Information Processing & Management, Vol. 59, 4, 102962. DOI: 10.1016/j.ipm.2022.102962. Online publication date: Jul-2022
    • (2021) When Creative AI Meets Conversational AI. Journal of Natural Language Processing, Vol. 28, 3, 881-887. DOI: 10.5715/jnlp.28.881. Online publication date: 2021
