DOI: 10.1145/3539618.3591768
research-article

Smooth Operators for Effective Systematic Review Queries

Published: 18 July 2023

Abstract

Effective queries are crucial to minimising the time and cost of medical systematic reviews, as all retrieved documents must be judged for relevance. Boolean queries, developed by expert librarians, are the standard for systematic reviews. They guarantee reproducible and verifiable retrieval and more control than free-text queries. However, the result sets of Boolean queries are unranked and difficult to control due to the strict Boolean operators. We address these problems in a single unified retrieval model by formulating a class of smooth operators that are compatible with and extend existing Boolean operators. Our smooth operators overcome several shortcomings of previous extensions of the Boolean retrieval model. In particular, our operators are independent of the underlying ranking function, so that exact-match and large language model rankers can be combined in the same query. We found that replacing Boolean operators with equivalent or similar smooth operators often improves the effectiveness of queries. Their properties make tuning a query to precision or recall intuitive and allow greater control over how documents are retrieved. This additional control leads to more effective queries and reduces the cost of systematic reviews.
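
To illustrate what relaxing a strict Boolean operator into a scoring operator can look like, the sketch below uses the classical p-norm extended Boolean model, one of the earlier extensions of Boolean retrieval. It is not the smooth operators proposed in this paper, whose definitions the abstract does not give. The function names, the example terms, and the per-term scores are assumptions made for illustration only.

```python
from typing import Sequence


def pnorm_or(scores: Sequence[float], p: float = 2.0) -> float:
    """Soft OR over non-empty scores in [0, 1].

    Equals the mean when p == 1 and approaches max(scores) as p grows.
    """
    n = len(scores)
    return (sum(s ** p for s in scores) / n) ** (1.0 / p)


def pnorm_and(scores: Sequence[float], p: float = 2.0) -> float:
    """Soft AND over non-empty scores in [0, 1].

    Equals the mean when p == 1 and approaches min(scores) as p grows.
    """
    n = len(scores)
    return 1.0 - (sum((1.0 - s) ** p for s in scores) / n) ** (1.0 / p)


def soft_not(score: float) -> float:
    """Soft NOT: complement of a score in [0, 1]."""
    return 1.0 - score


# Hypothetical per-term match scores for a single document.
# Any ranker that produces scores in [0, 1] could supply these values.
doc_scores = {"diabetes": 1.0, "insulin": 0.0, "metformin": 1.0}

# Query: diabetes AND (insulin OR metformin)
inner = pnorm_or([doc_scores["insulin"], doc_scores["metformin"]], p=2.0)
score = pnorm_and([doc_scores["diabetes"], inner], p=2.0)
print(f"document score: {score:.4f}")
```

In this sketch a large p makes the operators behave like strict Boolean AND and OR on binary scores, while a small p lets partially matching documents receive intermediate scores, so the result set can be ranked instead of only filtered.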

Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023
3567 pages
ISBN:9781450394086
DOI:10.1145/3539618
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. boolean queries
  2. retrieval models
  3. systematic reviews

Qualifiers

  • Research-article

Conference

SIGIR '23

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Article Metrics

  • Total Citations: 0
  • Total Downloads: 193
  • Downloads (Last 12 months): 62
  • Downloads (Last 6 weeks): 6

Reflects downloads up to 05 Mar 2025
