
Ranking Documents by Answer-Passage Quality

Authors: Ruey-Cheng Chen, W. Bruce Croft, Mark Sanderson

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval

Pages 335 - 344

https://doi.org/10.1145/3209978.3210028

Published: 27 June 2018

Abstract

Evidence derived from passages that closely represent likely answers to a posed query can be useful input to the ranking process. Based on a novel use of Community Question Answering data, we present an approach for the creation of such passages. A general framework for extracting answer passages and estimating their quality is proposed, and this evidence is integrated into ranking models. Our experiments on two web collections show that such quality estimates from answer passages provide a strong indication of document relevance and compare favorably to previous passage-based methods. Combining such evidence can significantly improve over a set of state-of-the-art ranking models, including Quality-Biased Ranking, External Expansion, and a combination of both. A final ranking model that incorporates all quality estimates achieves further improvements on both collections.

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval

June 2018

1509 pages

ISBN: 9781450356572

DOI: 10.1145/3209978

General Chairs: Kevyn Collins-Thompson (University of Michigan, United States), Qiaozhu Mei (University of Michigan, United States)

Program Chairs: Brian Davison (Lehigh University, United States), Yiqun Liu (Tsinghua University, China), Emine Yilmaz (University College London, United Kingdom)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Indonesia Endowment Fund for Education
Australian Research Council

Conference

SIGIR '18: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval

Sponsor: SIGIR

July 8 - 12, 2018

Ann Arbor, MI, USA

Acceptance Rates

SIGIR '18 Paper Acceptance Rate 86 of 409 submissions, 21%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

Total Citations: 7

Total Downloads: 455

Downloads (Last 12 months): 11

Downloads (Last 6 weeks): 0

Reflects downloads up to 20 Jan 2025

Cited By

Banjar A, Shaheen A, Amjad T, Alharbey R, Daud A (2024). Users’ satisfaction based ranking for Yahoo Answers. Multimedia Tools and Applications 83:28 (71265-71284). Online publication date: 7-Feb-2024.
https://doi.org/10.1007/s11042-024-18433-3
Roy P (2023). Predicting answer acceptability for question-answering system. International Journal on Digital Libraries 25:4 (555-568). Online publication date: 5-May-2023.
https://doi.org/10.1007/s00799-023-00357-2
Amancio L, Dorneles C, Dalip D (2021). Recency and quality-based ranking question in CQAs. Information Processing and Management: an International Journal 58:4. Online publication date: 1-Jul-2021.
https://dl.acm.org/doi/10.1016/j.ipm.2021.102552
Roul R, Sahoo J (2020). A novel approach for ranking web documents based on query-optimized personalized pagerank. International Journal of Data Science and Analytics 11:1 (37-55). Online publication date: 18-Aug-2020.
https://doi.org/10.1007/s41060-020-00232-2
Sheetrit E, Shtok A, Kurland O (2020). A passage-based approach to learning to rank documents. Information Retrieval Journal 23:2 (159-186). Online publication date: 6-Mar-2020.
https://doi.org/10.1007/s10791-020-09369-x
Xie W, Wong L, Lee L, Au O, Hao T (2020). A Semantic Expansion-Based Joint Model for Answer Ranking in Chinese Question Answering Systems. Information Retrieval Technology (22-33). Online publication date: 27-Feb-2020.
https://doi.org/10.1007/978-3-030-42835-8_3
Lin D, Tang J, Pang K, Li S, Wang T (2019). Selecting Paragraphs to Answer Questions for Multi-passage Machine Reading Comprehension. Information Retrieval (121-132). Online publication date: 18-Sep-2019.
https://doi.org/10.1007/978-3-030-31624-2_10
