Elsevier

Information Sciences

Volume 214, 10 December 2012, Pages 76-90

Combining relevancy and methodological quality into a single ranking for evidence-based medicine

https://doi.org/10.1016/j.ins.2012.05.027

Abstract

Evidence-based medicine has recently received considerable attention in medical research. For clinical practice to make use of evidence-based medicine, it should be easy to find the best current evidence, meaning evidence that is relevant to the clinical question and of high methodological quality. However, searching for relevant articles and appraising their validity is demanding work for most clinicians. We hypothesize that a search engine designed around these two aspects, relevance and quality, and equipped with a suitable ranking algorithm can automatically retrieve articles that are relevant to clinical questions and based on valid evidence. This study makes two contributions. First, we combine two methodologies: after designing a query-relevance score and a methodological quality score for each document, we merged them using various fusion methods. The result was a twofold increase in mean average precision. Second, to support sound evaluation, we built a test collection from a reliable preexisting database, the Cochrane Reviews, which allowed robust and comprehensive evaluation.

Introduction

Since its release in November 2004, Google Scholar [6] has become popular among researchers and students in many fields, including medicine. Of its many features, convenience and efficiency are the greatest advantages. By typing only keywords, we can quickly obtain relevance-ranked search results. Unlike with a Boolean search engine such as PubMed [9], we do not have to agonize over constructing appropriate Boolean query combinations or spend time picking relevant articles out of the retrieved documents. Although its searches might not be as thorough as Boolean query strategies manually crafted by an expert, Google Scholar returns the most important and highly cited articles even for a non-professional searcher. Google Scholar describes its aim as ‘Ranking documents the way researchers do’ [41]. This kind of intelligent ranking algorithm can be useful, especially for clinicians who attempt to apply evidence-based medicine (EBM) in their daily practice.

EBM is widely recognized as an important concept in medical research. Evidence-based health care is the conscientious use of the best current evidence to make decisions about patient care or the delivery of health services. The best current evidence is up-to-date information from relevant, valid research about the effects of different forms of health care [21]. In [14], Ghosh et al. state that the future competence of a physician will be measured not by his or her ability to recall facts but by his or her ability to integrate the best current evidence with the patient’s personal values.

However, practicing EBM in daily clinical care may be challenging, considering a physician’s limited time and possibly inadequate searching skills [16]. EBM includes an appraisal step: critically evaluating an article’s evidence to decide whether it is reliable and robust [14]. Searching for relevant articles and assessing their validity are both demanding tasks.

We approached this problem by regarding it as an information retrieval task with two distinct priorities: finding sufficient research articles relevant to the clinician’s question and finding valid articles based on EBM methodological criteria and principles. We hypothesize that a search engine designed to consider those two aspects together can retrieve articles that are both relevant and valid. Using various fusion algorithms, we combined the relevance and methodological quality scores into a single ranking.

In this paper, we first built a test collection (Section 3.2) using a preexisting source, the Cochrane Reviews. Second, we used a probabilistic retrieval model and a machine-learning classifier to determine each document’s query-relevance and quality scores, respectively (Sections 3.3.1 and 3.3.2). Third, we applied various fusion techniques to re-rank the retrieved documents (Section 3.3.3).

PubMed and Google Scholar

PubMed is a free search service that provides access to the MEDLINE database of citations, abstracts, and some full-text articles on life science and biomedical topics [9]. PubMed currently contains over 21 million publications and offers a comprehensive search over the biomedical literature with advanced search features.

However, some people find PubMed difficult to use. The effectiveness of a Boolean search engine depends entirely on the user, because constructing complex Boolean queries that narrow the

Overall ranking strategy

We designed the ranking strategy as a three-step process (Fig. 2). First, we measured a relevance score for each document using a probabilistic retrieval model (Okapi BM25). Second, we used a machine-learning classifier to compute a quality score; we experimented with Naive Bayes, SVMlight, and SVMperf. Finally, we combined the relevance and quality scores, using various fusion methods, to derive the final ranking scores.
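The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy documents, the hard-coded `quality` scores (standing in for classifier output), the BM25 parameters `k1` and `b`, the smoothed IDF variant, and the weighted linear fusion with min-max normalization are all assumptions chosen for illustration. The fusion methods actually evaluated in the paper are not specified in this excerpt.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative (an assumption).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def min_max(values):
    """Rescale scores to [0, 1] so the two score types are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse(relevance, quality, weight=0.5):
    """Weighted linear combination of normalized relevance and quality."""
    rel, qual = min_max(relevance), min_max(quality)
    return [weight * r + (1 - weight) * q for r, q in zip(rel, qual)]


def rank(scores):
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy collection (hypothetical): three tokenized abstracts.
docs = [
    ["aspirin", "stroke", "case", "report"],
    ["aspirin", "stroke", "randomized", "controlled", "trial"],
    ["diet", "exercise", "cohort"],
]
query = ["aspirin", "stroke"]

# Step 1: relevance scores from BM25.
relevance = bm25_scores(query, docs)
# Step 2: quality scores; hard-coded here in place of a trained classifier.
quality = [0.1, 0.9, 0.6]
# Step 3: fuse both scores into a single ranking score.
fused = fuse(relevance, quality)

print("relevance-only ranking:", rank(relevance))
print("fused ranking:", rank(fused))
```

In this toy run, the shorter case report ranks first on relevance alone, while the fused score moves the randomized controlled trial to the top because of its higher assumed quality score. The `weight` parameter controls the trade-off between the two signals.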

Test collections

We used two different text collections in

Results

We applied the aforementioned methodology to the held-out test set. At the relevance ranking stage, we obtained relevance-ranked retrieval results using the Okapi BM25 weighting model implemented in the Terrier search engine. The MAP was 7.4%, the macro-averaged precision was 0.4%, and the macro-averaged recall was 56.0%. At the quality ranking stage, the SVMperf classifier trained on CHD was applied to the relevance-ranked results, producing a quality score for each document. The

Discussion

Ranking and Re-ranking. Compared to relevance or quality ranking alone, our re-ranking methodologies substantially improved performance, which suggests that the approach has considerable potential. We can summarize our future directions as follows.

Conclusions

In this paper, we attempted to design an effective EBM ranking algorithm. We combined relevance and quality ranking using various fusion methods, yielding significant improvements in the final ranking performance. We built our test collection, which meets both relevancy and quality standards, from Cochrane Reviews, using 17 million MEDLINE documents as the corpus.

We are indebted to the prior studies that gave helpful insights, but we can derive inspiration from more studies. To

Acknowledgments

Use of the Clinical Hedges database was made possible through a collaboration agreement with R.B. Haynes and N.L. Wilczynski at McMaster University, Hamilton, Ontario, Canada. This work was supported by the National Research Foundation of Korea (NRF) with a grant funded by the Korean Government (MEST) (No. 2009-0075089).

References (84)

  • J. Serrano-Guerrero, A google wave-based fuzzy recommender system to disseminate information in University Digital Libraries 2.0, Information Sciences (2011)
  • About ACP Journal Club (cited 2011 July 15),...
  • ACP Journal Club (cited 2011 July 15),...
  • The Cochrane Library (cited 2011 July 15),...
  • Cochrane Reviews (cited 2011 July 15),...
  • Evidence-Based Medicine (cited 2011 July 15),...
  • Google Scholar (cited 2011 July 14),...
  • Leasing Journal Citations (cited 2011 July 15),...
  • MeSH (cited 2011 July 15),...
  • PubMed (cited 2011 July 14),...
  • PubMed Clinical Queries (cited 2011 July 15),...
  • S. Alonso, hg-Index: a new index to characterize the scientific output of researchers based on the h- and g-indices, Scientometrics (2010)
  • G. Amati et al., Probabilistic models of information retrieval based on measuring the divergence from randomness, ACM Transactions on Information Systems (TOIS) (2002)
  • Amit Ghosh, D. Stengel, Nancy Spector, Narayana Murali, Franz Porzsolt (Eds.), Evidence-Based Health Care Seen From...
  • P.B. Andrew, The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognition (1997)
  • C. Apté et al., Automated learning of decision rules for text categorization, ACM Transactions on Information Systems (TOIS) (1994)
  • J.A. Aslam et al., Models for metasearch
  • V. Bewick et al., Statistics review 13: receiver operating characteristic curves, Crit Care (2004)
  • G.V. Cormack et al., Validity and power of t-test for comparing MAP and GMAP
  • D. Demner-Fushman et al., Answering clinical questions with knowledge-based and statistical techniques, Computational Linguistics (2007)
  • H. Drucker et al., Support vector machines for spam categorization, IEEE Transactions on Neural Networks (1999)
  • A. Edward, J.A.S. Fox, Combination of multiple searches, in: Proceedings of the 2nd Text REtrieval Conference (TREC-2),...
  • S. Eyheramendy, D.D. Lewis, D. Madigan, On the naive bayes model for text categorization, in: 9th International...
  • H. Fang et al., A formal study of information retrieval heuristics
  • H. Fang et al., Semantic term matching in axiomatic approaches to information retrieval
  • S. Gerani et al., Investigating learning approaches for blog post opinion retrieval
  • C.H. Goulden, Methods of Statistical Analysis (1952)
  • H. Zaragoza, N. Craswell, M. Taylor, S. Saria, S. Robertson, Microsoft cambridge at TREC 13: Web and hard tracks, in:...
  • M. Hall, The WEKA data mining software: an update, SIGKDD Exploration Newsletter (2009)
  • R.B. Haynes, Optimal search strategies for retrieving scientifically strong studies of treatment from Medline: analytical survey, BMJ (2005)