DOI: 10.1145/3366423.3380305
research-article

Leveraging Passage-level Cumulative Gain for Document Ranking

Published: 20 April 2020

Abstract

Document ranking is one of the most studied yet most challenging problems in information retrieval (IR) research. Many existing document ranking models capture relevance signals at the whole-document level. More recently, a growing body of research has approached the problem through fine-grained document modeling, and several works have leveraged passage-level relevance signals in ranking models. However, most of these works focus on context-independent passage-level relevance signals and ignore context information, which may lead to inaccurate estimates of passage-level relevance. In this paper, we investigate how information gain accumulates across passages as users read a document sequentially. We propose the context-aware Passage-level Cumulative Gain (PCG), which aggregates the relevance scores of passages and avoids the need to formally split a document into independent passages. Next, we incorporate the patterns of PCG into a BERT-based sequential model, the Passage-level Cumulative Gain Model (PCGM), to predict the PCG sequence. Finally, we apply PCGM to the document ranking task. Experimental results on two public ad hoc retrieval benchmark datasets show that PCGM outperforms most existing ranking models and indicate the effectiveness of PCG signals. We believe that this work contributes to improving ranking performance and to providing more explainability for document ranking.
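
To make the contrast between context-independent passage scores and a cumulative, context-aware gain more concrete, the toy sketch below may help. It is not the PCGM described in the abstract, which is a BERT-based sequential model. It only illustrates the general idea under a strong simplifying assumption: that each passage can be reduced to a set of information "nuggets", and that gain is the number of distinct relevant nuggets read so far. The function names `independent_gains` and `cumulative_gains` and the nugget representation are hypothetical and chosen purely for illustration.

```python
from typing import List, Set


def independent_gains(passages: List[Set[str]], relevant: Set[str]) -> List[int]:
    """Score each passage in isolation: count the relevant nuggets it contains.

    Assumption (not from the paper): a passage is a set of nugget identifiers.
    """
    return [len(passage & relevant) for passage in passages]


def cumulative_gains(passages: List[Set[str]], relevant: Set[str]) -> List[int]:
    """Score the document prefix ending at each passage.

    The gain after passage i is the number of distinct relevant nuggets seen
    in passages 0..i, so a passage that only repeats earlier information
    adds nothing, and the sequence never decreases.
    """
    seen: Set[str] = set()
    gains: List[int] = []
    for passage in passages:
        seen |= passage & relevant  # keep only new, relevant nuggets
        gains.append(len(seen))
    return gains


# Toy document: the second passage repeats nugget "b" from the first.
passages = [{"a", "b"}, {"b"}, {"c", "x"}]
relevant = {"a", "b", "c"}

print(independent_gains(passages, relevant))  # [2, 1, 1]
print(cumulative_gains(passages, relevant))   # [2, 2, 3]
```

In this toy setting the second passage looks useful when scored on its own, yet contributes no additional gain once the first passage has been read. That is the kind of context effect a context-independent passage score cannot express.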

      Published In

      WWW '20: Proceedings of The Web Conference 2020
      April 2020
      3143 pages
      ISBN:9781450370233
      DOI:10.1145/3366423
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Passage-level cumulative gain
      2. document ranking
      3. neural network

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      WWW '20: The Web Conference 2020
      April 20 - 24, 2020
      Taipei, Taiwan

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Cited By

      • (2025) A Systematic Review of Fairness, Accountability, Transparency, and Ethics in Information Retrieval. ACM Computing Surveys 57:6 (1-29). DOI: 10.1145/3637211. Online publication date: 10-Feb-2025
      • (2024) Passage-aware Search Result Diversification. ACM Transactions on Information Systems 42:5 (1-29). DOI: 10.1145/3653672. Online publication date: 13-May-2024
      • (2024) Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges. ACM Computing Surveys 56:7 (1-33). DOI: 10.1145/3648471. Online publication date: 14-Feb-2024
      • (2024) A multi-dimensional semantic pseudo-relevance feedback framework for information retrieval. Scientific Reports 14:1. DOI: 10.1038/s41598-024-82871-0. Online publication date: 30-Dec-2024
      • (2024) Event is more valuable than you think. Information Processing and Management: an International Journal 61:4. DOI: 10.1016/j.ipm.2024.103729. Online publication date: 18-Jul-2024
      • (2024) Transformer based contextual text representation framework for intelligent information retrieval. Expert Systems with Applications 238 (121629). DOI: 10.1016/j.eswa.2023.121629. Online publication date: Mar-2024
      • (2023) A Purely Entity-Based Semantic Search Approach for Document Retrieval. Applied Sciences 13:18 (10285). DOI: 10.3390/app131810285. Online publication date: 14-Sep-2023
      • (2023) Boosting legal case retrieval by query content selection with large language models. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (176-184). DOI: 10.1145/3624918.3625328. Online publication date: 26-Nov-2023
      • (2023) Extractive Explanations for Interpretable Text Ranking. ACM Transactions on Information Systems 41:4 (1-31). DOI: 10.1145/3576924. Online publication date: 23-Mar-2023
      • (2023) A Passage-Level Reading Behavior Model for Mobile Search. Proceedings of the ACM Web Conference 2023 (3236-3246). DOI: 10.1145/3543507.3583343. Online publication date: 30-Apr-2023
