Smooth Operators for Effective Systematic Review Queries

Authors: Harrisen Scells, Ferdinand Schlatt, Martin Potthast

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 580 - 590

https://doi.org/10.1145/3539618.3591768

Published: 18 July 2023

Abstract

Effective queries are crucial to minimising the time and cost of medical systematic reviews, as all retrieved documents must be judged for relevance. Boolean queries, developed by expert librarians, are the standard for systematic reviews. They guarantee reproducible and verifiable retrieval and more control than free-text queries. However, the result sets of Boolean queries are unranked and difficult to control due to the strict Boolean operators. We address these problems in a single unified retrieval model by formulating a class of smooth operators that are compatible with and extend existing Boolean operators. Our smooth operators overcome several shortcomings of previous extensions of the Boolean retrieval model. In particular, our operators are independent of the underlying ranking function, so that exact-match and large language model rankers can be combined in the same query. We found that replacing Boolean operators with equivalent or similar smooth operators often improves the effectiveness of queries. Their properties make tuning a query to precision or recall intuitive and allow greater control over how documents are retrieved. This additional control leads to more effective queries and reduces the cost of systematic reviews.
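The abstract does not give the definition of the smooth operators, so the following is only a minimal sketch of the general idea and not the authors' formulation. It assumes every clause yields a score in [0, 1] and substitutes classic fuzzy-set operators (minimum for AND, maximum for OR), which coincide with strict Boolean operators when scores are exactly 0 or 1. All function names, the toy `overlap` scorer, and the example documents are hypothetical.

```python
from typing import Callable, List, Tuple

# A clause maps a document to a score in [0, 1]. All names here are
# hypothetical and chosen for illustration only.
Clause = Callable[[str], float]


def term(t: str) -> Clause:
    """Exact-match clause: 1.0 if the token occurs in the document, else 0.0."""
    return lambda doc: 1.0 if t.lower() in doc.lower().split() else 0.0


def overlap(text: str) -> Clause:
    """Graded clause: fraction of the words in `text` found in the document.
    A toy stand-in for any ranker that yields scores in [0, 1]."""
    words = set(text.lower().split())
    if not words:
        return lambda doc: 0.0
    return lambda doc: len(words & set(doc.lower().split())) / len(words)


def soft_and(*clauses: Clause) -> Clause:
    """Fuzzy conjunction (minimum); equals Boolean AND on 0/1 scores."""
    return lambda doc: min(c(doc) for c in clauses)


def soft_or(*clauses: Clause) -> Clause:
    """Fuzzy disjunction (maximum); equals Boolean OR on 0/1 scores."""
    return lambda doc: max(c(doc) for c in clauses)


def retrieve(query: Clause, docs: List[str],
             threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Keep documents scoring above `threshold`, ranked by descending score."""
    scored = [(d, query(d)) for d in docs]
    kept = [pair for pair in scored if pair[1] > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


docs = [
    "statin therapy for cardiac risk reduction",
    "statin dosage and cardiac output",
    "cardiac risk reduction after surgery",
]

# One query mixing exact-match clauses with a graded clause.
query = soft_and(
    soft_or(term("statin"), term("statins")),
    overlap("cardiac risk reduction"),
)

ranked = retrieve(query, docs)                  # recall-oriented: any positive score
strict = retrieve(query, docs, threshold=0.5)   # precision-oriented: higher cut-off
```

In this sketch the result set is ranked rather than unordered, and raising `threshold` trades recall for precision, which is the kind of control the abstract describes. The operators never inspect how a clause computes its score, so an exact-match clause and a graded clause can share one query.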

Index Terms

Smooth Operators for Effective Systematic Review Queries
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
      1. Query reformulation
    2. Specialized information retrieval

Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2023

3567 pages

ISBN: 9781450394086

DOI: 10.1145/3539618

General Chairs: Hsin-Hsi Chen (National Taiwan University), Wei-Jou (Edward) Duh (National Taiwan University), Hen-Hsen Huang (Academia Sinica)

Program Chairs: Makoto P. Kato (Spotify), Josiane Mothe (Universite de Toulouse), Barbara Poblete (University of Chile and Amazon Visiting Academic)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

European Commission
Alexander von Humboldt Stiftung

Conference

SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 23 - 27, 2023

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
