ABSTRACT
High-recall retrieval, i.e., finding all or nearly all relevant documents, is critical to applications such as electronic discovery, systematic review, and the construction of test collections for information retrieval tasks. The effectiveness of current methods for high-recall information retrieval is limited by their reliance on human input, either to generate queries or to assess the relevance of documents. Past research has shown that humans can assess the relevance of documents faster, and with little loss in accuracy, by judging shorter document surrogates, e.g., extractive summaries, in place of full documents. To test the hypothesis that short document surrogates can reduce assessment time and effort for high-recall retrieval, we conducted a 50-person controlled user study. We designed a high-recall retrieval system using continuous active learning (CAL) that could display either full documents or short document excerpts for relevance assessment. In addition, we tested the value of integrating a search engine with CAL. In the experiment, we asked participants to find as many relevant documents as possible within one hour. We observed that participants found significantly more relevant documents when they used the system with document excerpts rather than full documents. We also found that allowing participants to compose and execute their own search queries did not improve their ability to find relevant documents and, by some measures, impaired performance. These results suggest that for high-recall systems to maximize performance, system designers should think carefully about the amount and nature of user interaction incorporated into the system.
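The CAL protocol described above can be summarized as a loop: present the highest-scoring unjudged document to the assessor, record the judgment, and update the model before selecting the next document. The following is a minimal, self-contained sketch of that loop; the function names and the toy term-overlap scorer are illustrative assumptions, not the authors' implementation (production CAL systems typically train a classifier such as logistic regression over tf-idf features).

```python
# Minimal sketch of a continuous active learning (CAL) loop.
# The scoring function below is a toy stand-in for a trained classifier.

def cal_loop(documents, seed_query, assess, budget):
    """Repeatedly present the highest-scoring unjudged document for
    assessment, then fold the new judgment back into the model."""
    judged = {}  # doc_id -> True/False relevance judgment
    rel_terms, nonrel_terms = set(seed_query.split()), set()
    for _ in range(budget):
        unjudged = [d for d in documents if d not in judged]
        if not unjudged:
            break

        # Toy "model": score a document by its term overlap with
        # relevant judgments minus overlap with non-relevant ones.
        def score(doc_id):
            terms = set(documents[doc_id].split())
            return len(terms & rel_terms) - len(terms & nonrel_terms)

        best = max(unjudged, key=score)
        judged[best] = assess(best)  # human judgment (full doc or excerpt)
        target = rel_terms if judged[best] else nonrel_terms
        target |= set(documents[best].split())  # in-place model update
    return judged


# Hypothetical usage: a three-document corpus with a simulated assessor.
docs = {
    "d1": "jet fuel prices rise",
    "d2": "airline jet fuel costs increase",
    "d3": "recipe for chocolate cake",
}
judgments = cal_loop(docs, "jet fuel", lambda d: "fuel" in docs[d], budget=3)
```

In the study, the `assess` step is where the experimental manipulation lives: the assessor sees either the full document or a short excerpt, while the surrounding loop is unchanged.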
Effective User Interaction for High-Recall Retrieval: Less is More