
Automatically Extracting High-Quality Negative Examples for Answer Selection in Question Answering

Published: 07 August 2017

Abstract

We propose a heuristic called "one answer per document" for automatically extracting high-quality negative examples for answer selection in question answering. Starting with a collection of question-answer pairs from the popular TrecQA dataset, we identify the original documents from which the answers were drawn. Sentences from these source documents that contain query terms (aside from the answers) are selected as negative examples. Training on the original data plus these negative examples yields improvements in effectiveness by a margin that is comparable to successive recent publications on this dataset. Our technique is completely unsupervised, which means that the gains come essentially for free. We confirm that the improvements can be directly attributed to our heuristic, as other approaches to extracting comparable amounts of training data are not effective. Beyond the empirical validation of this heuristic, we also share our improved TrecQA dataset with the community to support further work in answer selection.
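The extraction procedure described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the function name, the whitespace tokenization, and the term-overlap threshold `min_overlap` are all hypothetical simplifications of "sentences that contain query terms (aside from the answers)."

```python
def extract_negatives(question, answers, doc_sentences, min_overlap=2):
    """Sketch of the 'one answer per document' heuristic: sentences from
    an answer's source document that share terms with the question, but
    are not themselves known answers, become negative examples."""
    q_terms = set(question.lower().split())
    answer_set = set(answers)
    negatives = []
    for sent in doc_sentences:
        if sent in answer_set:
            continue  # never label a known answer sentence as negative
        overlap = q_terms & set(sent.lower().split())
        if len(overlap) >= min_overlap:  # threshold is an assumption
            negatives.append(sent)
    return negatives

# Toy usage with an invented question and source document:
question = "When was the Eiffel Tower built?"
answers = ["The Eiffel Tower was completed in 1889."]
doc = [
    "The Eiffel Tower was completed in 1889.",
    "The Eiffel Tower stands on the Champ de Mars in Paris.",
    "Gustave Eiffel's company built the structure.",
]
negs = extract_negatives(question, answers, doc)
```

Because the candidate sentences come from the same document as a known correct answer, they tend to be topically close to the question, which is what makes them harder (and thus higher-quality) negatives than randomly sampled sentences.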


Cited By

  • (2019) "Unsupervised and weakly supervised approaches for answer selection tasks with scarce annotations." Open Computer Science 9(1), 136–144. DOI: 10.1515/comp-2019-0008. Online publication date: 30-Jul-2019.
  • (2019) "A question-entailment approach to question answering." BMC Bioinformatics 20(1). DOI: 10.1186/s12859-019-3119-4. Online publication date: 22-Oct-2019.
  • (2018) "CAN." Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 815–824. DOI: 10.1145/3209978.3210019. Online publication date: 27-Jun-2018.


    Published In

    SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
    August 2017
    1476 pages
    ISBN:9781450350228
    DOI:10.1145/3077136

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. deep learning
    2. distant supervision
    3. question answering
    4. trec

    Qualifiers

    • Short-paper


    Acceptance Rates

    SIGIR '17 Paper Acceptance Rate: 78 of 362 submissions (22%)
    Overall Acceptance Rate: 792 of 3,983 submissions (20%)

