DOI: 10.1145/3289600.3291006
research-article

WassRank: Listwise Document Ranking Using Optimal Transport Theory

Published: 30 January 2019

ABSTRACT

Learning to rank has been intensively studied and has shown great value in many fields, such as web search, question answering, and recommender systems. This paper focuses on listwise document ranking, where all documents associated with the same query in the training data are used as the input. We propose a novel ranking method, referred to as WassRank, under which the problem of listwise document ranking boils down to learning the ranking function that achieves the minimum Wasserstein distance. Specifically, given the query-level predictions and the ground-truth labels, we first map them into two probability vectors. Analogous to the optimal transport problem, we view each probability vector as a pile of relevance mass with peaks indicating higher relevance. The listwise ranking loss is formulated as the minimum cost (the Wasserstein distance) of transporting (or reshaping) the pile of predicted relevance mass so that it matches the pile of ground-truth relevance mass. The smaller the Wasserstein distance, the closer the prediction is to the ground truth. To better capture the inherent relevance-based order information among documents with different relevance labels and to lower the variance of predictions for documents with the same relevance label, a ranking-specific cost matrix is imposed. To validate the effectiveness of WassRank, we conduct a series of experiments on two benchmark collections. The experimental results demonstrate that, compared with four non-trivial listwise ranking methods (i.e., LambdaRank, ListNet, ListMLE and ApxNDCG), WassRank achieves substantially improved performance in terms of nDCG and ERR across different rank positions. Specifically, the maximum improvements of WassRank over LambdaRank, ListNet, ListMLE and ApxNDCG in terms of nDCG@1 are 15%, 5%, 7%, and 5%, respectively.
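
The pipeline described in the abstract (map scores and labels to probability vectors, then measure the cost of transporting one pile of relevance mass onto the other) can be illustrated with a small NumPy sketch. This is not the paper's implementation: the softmax mapping, the cost matrix built from absolute label differences, the entropic regularization strength `reg`, and the iteration count are all assumptions chosen for illustration, and the transport cost is approximated with Sinkhorn iterations rather than solved exactly.

```python
import numpy as np

def softmax(x):
    """Map a vector of scores or labels to a probability vector."""
    z = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return z / z.sum()

def sinkhorn_distance(a, b, cost, reg=0.1, n_iters=200):
    """Approximate the optimal transport cost between histograms a and b.

    Uses entropy-regularized Sinkhorn iterations. `reg` and `n_iters`
    are illustrative defaults, not values taken from the paper.
    """
    K = np.exp(-cost / reg)          # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale to match the column marginal b
        u = a / (K @ v)              # rescale to match the row marginal a
    plan = u[:, None] * K * v[None, :]   # approximate transport plan
    return float(np.sum(plan * cost)), plan

# Graded relevance labels for four documents of one query (toy data).
labels = np.array([2.0, 0.0, 1.0, 0.0])

# Hypothetical cost matrix: moving mass between documents with the same
# label is free, and the cost grows with the label gap. The paper's
# ranking-specific cost matrix may be defined differently.
cost = np.abs(labels[:, None] - labels[None, :])

target = softmax(labels)
good_scores = np.array([1.9, 0.1, 1.0, 0.0])   # agrees with the label order
bad_scores = np.array([0.0, 2.0, 0.1, 1.0])    # ranks an irrelevant doc first

loss_good, plan_good = sinkhorn_distance(softmax(good_scores), target, cost)
loss_bad, plan_bad = sinkhorn_distance(softmax(bad_scores), target, cost)
print(f"loss (good ranking): {loss_good:.4f}")
print(f"loss (bad ranking):  {loss_bad:.4f}")
```

Under these assumptions, the scores that agree with the label order yield a smaller transport cost than the scores that place an irrelevant document first, which matches the abstract's statement that a smaller Wasserstein distance means a prediction closer to the ground truth.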


Published in

WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining
January 2019
874 pages
ISBN: 9781450359405
DOI: 10.1145/3289600

Copyright © 2019 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

WSDM '19 paper acceptance rate: 84 of 511 submissions (16%). Overall acceptance rate: 498 of 2,863 submissions (17%).
