ABSTRACT
Comparative effectiveness reviews (CERs), a central methodology of comparative effectiveness research, are increasingly used to inform healthcare decisions. During these systematic reviews of the scientific literature, the reviewers (MD-methodologists) must screen several thousand citations for eligibility according to a pre-specified protocol. While previous research has demonstrated the theoretical potential of machine learning to reduce the workload in CERs, practical obstacles to deploying such a system remain. In this article, we describe work on an end-to-end, interactive machine learning system for assisting reviewers with the tedious task of citation screening for CERs. Specifically, we present ABSTRACKR, our open-source annotation tool. In addition to allowing reviewers to designate citations as 'relevant' or 'irrelevant' to the review at hand, ABSTRACKR allows reviewers to communicate other information useful to the classification model, such as terms suggestive of a citation's relevance (or irrelevance). The tool also records the time taken to screen each citation; a time-series analysis of these timing data yielded an annotator model, from which we found that both the order in which citations are screened and the length of each citation affect annotation time. We propose a strategy that integrates labeled terms and timing data into the Active Learning (AL) framework, in which an algorithm selects the citations the reviewer is to label. We demonstrate empirically that this additional information can improve the performance of the semi-automated citation screening system.
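To make the proposed strategy concrete, the following is a minimal sketch, not the authors' released implementation, of how an annotator model might feed into AL query selection. It assumes a simple linear time model over the two factors named above (citation length and screening order) and a benefit-per-cost selection rule; all function and variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# (1) Annotator model: predict screening time (seconds) from the two
# factors identified in the time-series analysis -- citation length
# (e.g., word count) and the order in which the citation is screened.
def fit_annotator_model(lengths, orders, times):
    X = np.column_stack([lengths, orders])
    return LinearRegression().fit(X, times)

# (2) Cost-sensitive query selection: rather than picking the citation
# the classifier is most uncertain about, pick the one with the best
# uncertainty-per-predicted-second ratio.
def select_next(clf, time_model, pool_X, pool_lengths, next_order):
    # Distance to the decision boundary; a small margin means high uncertainty.
    margins = np.abs(clf.decision_function(pool_X))
    uncertainty = 1.0 / (margins + 1e-8)

    # Predicted annotation time for each pool citation if screened next.
    orders = np.full(len(pool_lengths), next_order)
    pred_times = time_model.predict(np.column_stack([pool_lengths, orders]))
    pred_times = np.clip(pred_times, 1.0, None)  # guard against tiny or negative predictions

    return int(np.argmax(uncertainty / pred_times))
```

Dividing estimated informativeness by estimated cost is the standard return-on-investment formulation of cost-sensitive AL; the labeled terms supplied by reviewers would additionally be incorporated into the classifier itself (e.g., as weighted features), which this sketch omits.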