research-article

Leveraging Passage-level Cumulative Gain for Document Ranking

Authors:

Shaoping Ma

WWW '20: Proceedings of The Web Conference 2020

Pages 2421 - 2431

https://doi.org/10.1145/3366423.3380305

Published: 20 April 2020

Abstract

Document ranking is one of the most studied yet most challenging problems in information retrieval (IR) research. Many existing document ranking models capture relevance signals at the whole-document level. More recently, a growing body of research has approached the problem through fine-grained document modeling, and several works have leveraged passage-level relevance signals in ranking models. However, most of these works focus on context-independent passage-level relevance signals and ignore context information, which may lead to inaccurate estimates of passage-level relevance. In this paper, we investigate how information gain accumulates across passages as users read a document sequentially. We propose the context-aware Passage-level Cumulative Gain (PCG), which aggregates the relevance scores of passages and avoids the need to formally split a document into independent passages. Next, we incorporate the patterns of PCG into a BERT-based sequential model, the Passage-level Cumulative Gain Model (PCGM), to predict the PCG sequence. Finally, we apply PCGM to the document ranking task. Experimental results on two public ad hoc retrieval benchmark datasets show that PCGM outperforms most existing ranking models and indicate the effectiveness of PCG signals. We believe that this work contributes to improving ranking performance and to providing more explainability for document ranking.
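
To make the idea of gain accumulating over sequentially read passages concrete, the following is a minimal sketch, not the paper's PCGM. It assumes that each passage already has a non-negative gain value, that gains accumulate as a capped running sum, and that the last value of the sequence serves as the document score. The function names, the cap `max_gain`, and the aggregation rule are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from typing import Dict, List, Sequence


def cumulative_gain(passage_gains: Sequence[float], max_gain: float = 4.0) -> List[float]:
    """Return a non-decreasing sequence of accumulated gain, one value per passage.

    Assumption (not from the paper): gain accumulates as a running sum of
    non-negative per-passage gains, capped at max_gain.
    """
    total = 0.0
    sequence = []
    for gain in passage_gains:
        # Negative inputs are treated as zero so the sequence never decreases.
        total = min(max_gain, total + max(0.0, gain))
        sequence.append(total)
    return sequence


def document_score(pcg_sequence: Sequence[float]) -> float:
    """Use the gain reached after the final passage as the document score (assumption)."""
    return pcg_sequence[-1] if pcg_sequence else 0.0


def rank_documents(docs: Dict[str, Sequence[float]]) -> List[str]:
    """Order document ids by descending final cumulative gain."""
    return sorted(
        docs,
        key=lambda doc_id: document_score(cumulative_gain(docs[doc_id])),
        reverse=True,
    )
```

In this sketch a document whose relevant passages appear late still reaches the same final score as one whose relevant passages appear early; only the shape of the intermediate sequence differs, which is the kind of pattern a sequential model could be trained to predict.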

Index Terms

Leveraging Passage-level Cumulative Gain for Document Ranking
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
    2. Retrieval models and ranking

Index terms have been assigned to the content through auto-classification.

Published In

WWW '20: Proceedings of The Web Conference 2020

April 2020

3143 pages

ISBN:9781450370233

DOI:10.1145/3366423

Editors:
Yennun Huang (Academia Sinica, Taiwan),
Irwin King (The Chinese University of Hong Kong, Hong Kong),
Tie-Yan Liu (Microsoft Research Asia, China),
Maarten van Steen (University of Twente, Netherlands)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '20

Sponsor:

SIGWEB

WWW '20: The Web Conference 2020

April 20 - 24, 2020

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
