
Automatically Extracting High-Quality Negative Examples for Answer Selection in Question Answering

Published: 07 August 2017

Abstract

We propose a heuristic called "one answer per document" for automatically extracting high-quality negative examples for answer selection in question answering. Starting with a collection of question-answer pairs from the popular TrecQA dataset, we identify the original documents from which the answers were drawn. Sentences from these source documents that contain query terms (aside from the answers) are selected as negative examples. Training on the original data plus these negative examples yields improvements in effectiveness by a margin that is comparable to successive recent publications on this dataset. Our technique is completely unsupervised, which means that the gains come essentially for free. We confirm that the improvements can be directly attributed to our heuristic, as other approaches to extracting comparable amounts of training data are not effective. Beyond the empirical validation of this heuristic, we also share our improved TrecQA dataset with the community to support further work in answer selection.
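The extraction procedure described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the function name, the whitespace tokenization, and the term-overlap threshold `min_overlap` are all hypothetical simplifications of "sentences that contain query terms (aside from the answers)."

```python
def extract_negatives(question, answers, doc_sentences, min_overlap=2):
    """Sketch of the 'one answer per document' heuristic: sentences from
    an answer's source document that share terms with the question, but
    are not themselves known answers, become negative examples."""
    q_terms = set(question.lower().split())
    answer_set = set(answers)
    negatives = []
    for sent in doc_sentences:
        if sent in answer_set:
            continue  # never label a known answer sentence as negative
        overlap = q_terms & set(sent.lower().split())
        if len(overlap) >= min_overlap:  # threshold is an assumption
            negatives.append(sent)
    return negatives

# Toy usage with an invented question and source document:
question = "When was the Eiffel Tower built?"
answers = ["The Eiffel Tower was completed in 1889."]
doc = [
    "The Eiffel Tower was completed in 1889.",
    "The Eiffel Tower stands on the Champ de Mars in Paris.",
    "Gustave Eiffel's company built the structure.",
]
negs = extract_negatives(question, answers, doc)
```

Because the candidate sentences come from the same document as a known correct answer, they tend to be topically close to the question, which is what makes them harder (and thus higher-quality) negatives than randomly sampled sentences.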


Cited By

  • (2019) "Unsupervised and weakly supervised approaches for answer selection tasks with scarce annotations." Open Computer Science 9(1), 136–144. DOI: 10.1515/comp-2019-0008. Online publication date: 30-Jul-2019.
  • (2019) "A question-entailment approach to question answering." BMC Bioinformatics 20(1). DOI: 10.1186/s12859-019-3119-4. Online publication date: 22-Oct-2019.
  • (2018) "CAN." Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 815–824. DOI: 10.1145/3209978.3210019. Online publication date: 27-Jun-2018.


    Published In

    SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
    August 2017
    1476 pages
    ISBN:9781450350228
    DOI:10.1145/3077136

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. deep learning
    2. distant supervision
    3. question answering
    4. trec

    Qualifiers

    • Short-paper


    Acceptance Rates

    SIGIR '17 Paper Acceptance Rate: 78 of 362 submissions (22%)
    Overall Acceptance Rate: 792 of 3,983 submissions (20%)

