ABSTRACT
Comparative effectiveness reviews (CERs), a central methodology of comparative effectiveness research, are increasingly used to inform healthcare decisions. During these systematic reviews of the scientific literature, the reviewers (MD-methodologists) must screen several thousand citations for eligibility according to a pre-specified protocol. While previous research has demonstrated the theoretical potential of machine learning to reduce the workload in CERs, practical obstacles to deploying such a system remain. In this article, we describe work on an end-to-end, interactive machine learning system for assisting reviewers with the tedious task of citation screening for CERs. Specifically, we present ABSTRACKR, our open-source annotation tool. In addition to allowing reviewers to designate citations as 'relevant' or 'irrelevant' to the review at hand, ABSTRACKR allows reviewers to communicate other information useful to the classification model, such as terms suggestive of a citation's relevance (or irrelevance). The tool also records the time taken to screen each citation; a time-series analysis of these timing data yielded an annotator model, from which we found that both the order in which citations are screened and the length of each citation affect annotation time. We propose a strategy that integrates labeled terms and timing data into the Active Learning (AL) framework, in which an algorithm selects the citations the reviewer is to label. We demonstrate empirically that this additional information can improve the performance of the semi-automated citation screening system.
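To make the proposed strategy concrete, the following is a minimal sketch, not the authors' released implementation, of how an annotator model might feed into AL query selection. It assumes a simple linear time model over the two factors named above (citation length and screening order) and a benefit-per-cost selection rule; all function and variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# (1) Annotator model: predict screening time (seconds) from the two
# factors identified in the time-series analysis -- citation length
# (e.g., word count) and the order in which the citation is screened.
def fit_annotator_model(lengths, orders, times):
    X = np.column_stack([lengths, orders])
    return LinearRegression().fit(X, times)

# (2) Cost-sensitive query selection: rather than picking the citation
# the classifier is most uncertain about, pick the one with the best
# uncertainty-per-predicted-second ratio.
def select_next(clf, time_model, pool_X, pool_lengths, next_order):
    # Distance to the decision boundary; a small margin means high uncertainty.
    margins = np.abs(clf.decision_function(pool_X))
    uncertainty = 1.0 / (margins + 1e-8)

    # Predicted annotation time for each pool citation if screened next.
    orders = np.full(len(pool_lengths), next_order)
    pred_times = time_model.predict(np.column_stack([pool_lengths, orders]))
    pred_times = np.clip(pred_times, 1.0, None)  # guard against tiny or negative predictions

    return int(np.argmax(uncertainty / pred_times))
```

Dividing estimated informativeness by estimated cost is the standard return-on-investment formulation of cost-sensitive AL; the labeled terms supplied by reviewers would additionally be incorporated into the classifier itself (e.g., as weighted features), which this sketch omits.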