ABSTRACT
Nugget-based evaluations, such as those deployed in the TREC Temporal Summarization and Question Answering tracks, require human assessors to determine whether a nugget is present in a given piece of text. This process, known as nugget annotation, is labor-intensive. In this paper, we present two active learning techniques that prioritize the order in which candidate nugget/sentence pairs are presented to an assessor, based on the likelihood that the sentence contains the nugget. Our approach builds on the recognition that nugget annotation is similar to high-recall retrieval, allowing us to adapt proven solutions from that setting. Simulation experiments with four existing TREC test collections show that our techniques yield far more matches for a given level of effort than the baselines typically deployed in previous nugget-based evaluations.
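To make the prioritization idea concrete, the loop below is a minimal sketch of a continuous-active-learning-style protocol for nugget annotation. All names, the term-overlap scoring model, and the retraining step are illustrative assumptions for exposition, not the techniques evaluated in the paper: candidates are re-ranked by a likelihood score after every human judgment, so effort concentrates on the sentences most likely to contain the nugget.

```python
# Hedged sketch: a continuous-active-learning loop for nugget annotation.
# The scoring model (term overlap with confirmed matches) is a stand-in
# assumption; any classifier that can be retrained incrementally would do.
from collections import Counter

def tokenize(text):
    return text.lower().split()

def score(sentence, positive_terms):
    # Likelihood proxy: weighted overlap between the candidate sentence
    # and terms seen in the nugget plus confirmed matching sentences.
    toks = tokenize(sentence)
    return sum(positive_terms[t] for t in toks) / (len(toks) or 1)

def cal_loop(nugget, sentences, assess, budget):
    """Present candidate sentences to the assessor in priority order,
    updating the (toy) model after each judgment."""
    positive_terms = Counter(tokenize(nugget))  # seed with the nugget text
    remaining = list(sentences)
    matches = []
    for _ in range(min(budget, len(remaining))):
        # Re-rank the remaining candidates and show the top one.
        remaining.sort(key=lambda s: score(s, positive_terms), reverse=True)
        candidate = remaining.pop(0)
        if assess(candidate):  # human judgment (simulated in experiments)
            matches.append(candidate)
            positive_terms.update(tokenize(candidate))  # retrain on feedback
    return matches
```

A fixed-order baseline would instead iterate over `sentences` as given; the simulation experiments in the paper compare exactly this kind of effort/match trade-off, with `assess` played by recorded assessor judgments rather than a live human.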