DOI: 10.1145/3209978.3210028
research-article

Ranking Documents by Answer-Passage Quality

Published: 27 June 2018

Abstract

Evidence derived from passages that closely represent likely answers to a posed query can be useful input to the ranking process. Based on a novel use of Community Question Answering data, we present an approach for the creation of such passages. A general framework for extracting answer passages and estimating their quality is proposed, and this evidence is integrated into ranking models. Our experiments on two web collections show that such quality estimates from answer passages provide a strong indication of document relevance and compare favorably to previous passage-based methods. Combining such evidence can significantly improve over a set of state-of-the-art ranking models, including Quality-Biased Ranking, External Expansion, and a combination of both. A final ranking model that incorporates all quality estimates achieves further improvements on both collections.
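
The abstract does not state how passage-quality evidence is integrated into a ranking model. As a minimal sketch only, one generic way to combine two sources of evidence is linear interpolation of normalized scores. The function name `rerank`, the `weight` parameter, and the min-max normalization are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize scores to [0, 1]; a constant input maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 0.0 for doc, s in scores.items()}


def rerank(
    base_scores: Dict[str, float],
    passage_quality: Dict[str, float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Hypothetical re-ranker: interpolate a base retrieval score with an
    answer-passage quality estimate (assumed combination, not the paper's)."""
    base = normalize(base_scores)
    # Documents without a quality estimate are treated as having quality 0.0.
    quality = normalize({doc: passage_quality.get(doc, 0.0) for doc in base_scores})
    combined = {
        doc: (1.0 - weight) * base[doc] + weight * quality[doc] for doc in base
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

With `weight=0.0` the base ranking is returned unchanged, so the parameter controls how much the passage-quality evidence is allowed to reorder documents.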

    Published In

    SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
    June 2018
    1509 pages
    ISBN:9781450356572
    DOI:10.1145/3209978
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. answer passages
    2. document ranking
    3. quality estimation

    Qualifiers

    • Research-article

    Funding Sources

    • Indonesia Endowment Fund for Education
    • Australian Research Council

    Conference

    SIGIR '18

    Acceptance Rates

    SIGIR '18 Paper Acceptance Rate 86 of 409 submissions, 21%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 11
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 20 Jan 2025

    Citations

    Cited By

    • (2024) Users’ satisfaction based ranking for Yahoo Answers. Multimedia Tools and Applications 83:28 (71265-71284). DOI: 10.1007/s11042-024-18433-3. Online publication date: 7-Feb-2024
    • (2023) Predicting answer acceptability for question-answering system. International Journal on Digital Libraries 25:4 (555-568). DOI: 10.1007/s00799-023-00357-2. Online publication date: 5-May-2023
    • (2021) Recency and quality-based ranking question in CQAs. Information Processing and Management: an International Journal 58:4. DOI: 10.1016/j.ipm.2021.102552. Online publication date: 1-Jul-2021
    • (2020) A novel approach for ranking web documents based on query-optimized personalized pagerank. International Journal of Data Science and Analytics 11:1 (37-55). DOI: 10.1007/s41060-020-00232-2. Online publication date: 18-Aug-2020
    • (2020) A passage-based approach to learning to rank documents. Information Retrieval Journal 23:2 (159-186). DOI: 10.1007/s10791-020-09369-x. Online publication date: 6-Mar-2020
    • (2020) A Semantic Expansion-Based Joint Model for Answer Ranking in Chinese Question Answering Systems. Information Retrieval Technology (22-33). DOI: 10.1007/978-3-030-42835-8_3. Online publication date: 27-Feb-2020
    • (2019) Selecting Paragraphs to Answer Questions for Multi-passage Machine Reading Comprehension. Information Retrieval (121-132). DOI: 10.1007/978-3-030-31624-2_10. Online publication date: 18-Sep-2019
