Leveraging Document-Level and Query-Level Passage Cumulative Gain for Document Ranking

Wu, Zhi-Jing; Liu, Yi-Qun; Mao, Jia-Xin; Zhang, Min; Ma, Shao-Ping

doi:10.1007/s11390-022-2031-y

Regular Paper
Published: 30 July 2022

Volume 37, pages 814–838 (2022)
Journal of Computer Science and Technology

Abstract

Document ranking is one of the most studied yet most challenging problems in information retrieval (IR). A growing number of studies have begun to address this problem through fine-grained document modeling. However, most of them focus on context-independent passage-level relevance signals and ignore context information. In this paper, we investigate how information gain accumulates across passages and propose the context-aware Passage Cumulative Gain (PCG). The fine-grained PCG avoids the need to split documents into independent passages. We investigate PCG patterns at the document level (DPCG) and the query level (QPCG). Based on these patterns, we propose a BERT-based sequential model called the Passage-level Cumulative Gain Model (PCGM) and show that PCGM can effectively predict PCG sequences. Finally, we apply PCGM to the document ranking task using two approaches. The first leverages DPCG sequences to estimate the gain of an individual document; experimental results on two public ad hoc retrieval datasets show that PCGM outperforms most existing ranking models. The second considers cross-document effects and leverages QPCG sequences to estimate marginal relevance; experimental results show that the predicted results are highly consistent with users' preferences. We believe that this work contributes to improving ranking performance and to providing more explainability for document ranking.
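The abstract does not state how a PCG sequence is turned into a ranking score, so the following is only a rough illustration under stated assumptions, not the authors' method. It assumes that a predicted PCG sequence holds the cumulative gain after each passage (so its last entry is the total gain of the document), that a DPCG-based score is simply that last entry, and that the query-level PCG keeps accumulating across a ranked list so that a document's marginal relevance is the increase it adds. The function names `document_score`, `rank_documents`, and `marginal_relevance` are hypothetical.

```python
from typing import Dict, List, Sequence


def document_score(dpcg: Sequence[float]) -> float:
    """Score one document from its (predicted) document-level PCG sequence.

    Assumption (hypothetical): dpcg[k] is the gain accumulated after reading
    passages 0..k, so the final entry is the total gain of the document.
    """
    if not dpcg:
        return 0.0  # an empty document contributes no gain
    return float(dpcg[-1])


def rank_documents(dpcg_by_doc: Dict[str, Sequence[float]]) -> List[str]:
    """Order document ids by their accumulated gain, highest first."""
    return sorted(
        dpcg_by_doc,
        key=lambda doc_id: document_score(dpcg_by_doc[doc_id]),
        reverse=True,
    )


def marginal_relevance(qpcg_per_doc: Sequence[Sequence[float]]) -> List[float]:
    """Marginal gain of each document in a ranked list.

    Assumption (hypothetical): the query-level PCG keeps accumulating across
    the ranked documents, so a document's marginal gain is the cumulative
    value at its last passage minus the value reached at the end of the
    previous document.
    """
    gains: List[float] = []
    previous = 0.0
    for seq in qpcg_per_doc:
        current = float(seq[-1]) if seq else previous  # empty doc adds nothing
        gains.append(current - previous)
        previous = current
    return gains
```

For example, `rank_documents({"a": [0.2, 0.4], "b": [1.0, 3.0], "c": [2.0]})` returns `['b', 'c', 'a']`, and `marginal_relevance([[1.0, 2.0], [2.5, 3.0], [3.0]])` returns `[2.0, 1.0, 0.0]`: the third document adds no gain once the first two have been read, which is the kind of cross-document effect a context-independent score cannot express.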

Similar content being viewed by others

Scientific paper recommendation systems: a literature review of recent publications (Article, open access, 05 October 2022)

A survey on neural topic models: methods, applications, and challenges (Article, open access, 25 January 2024)

Research-paper recommender systems: a literature survey (Article, 26 July 2015)


Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Zhi-Jing Wu, Yi-Qun Liu, Min Zhang & Shao-Ping Ma
Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, 100084, China
Zhi-Jing Wu, Yi-Qun Liu, Min Zhang & Shao-Ping Ma
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, 100084, China
Jia-Xin Mao

Corresponding author

Correspondence to Yi-Qun Liu.

Supplementary Information

ESM 1 (PDF, 314 KB)

About this article

Cite this article

Wu, ZJ., Liu, YQ., Mao, JX. et al. Leveraging Document-Level and Query-Level Passage Cumulative Gain for Document Ranking. J. Comput. Sci. Technol. 37, 814–838 (2022). https://doi.org/10.1007/s11390-022-2031-y

Received: 19 November 2021
Accepted: 29 June 2022
Published: 30 July 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s11390-022-2031-y
