Introduction

Opioids, a class of medications used to treat acute and chronic pain, are associated with persistent use, adverse events, unintentional overdoses, and deaths1. In 2020, nearly 70,000 opioid-related overdose deaths were reported in the United States2,3. Although opioid mortality rates in other countries have not reached these levels, the adverse consequences of prescription opioid use are increasing in countries such as Canada, Australia, and the United Kingdom, in parallel with increasing prescription use4,5,6,7. In response to the global impact of harmful opioid use on public health, international efforts have intensified to combat the opioid crisis through the development of effective prevention and treatment strategies8.

There has been growing interest in using machine learning (ML) to improve diagnosis, prognosis, and clinical decision-making, a trend largely driven by the widespread availability of large-volume data, such as electronic health records (EHRs), and by advances in technology9. ML techniques have shown promise in handling large, nonlinear, and high-dimensional datasets and modelling complex clinical scenarios, offering flexibility over traditional statistical models10. However, despite their potential, ML applications in healthcare have not consistently led to improved patient outcomes10 and often fail to achieve notable clinical impact11,12. The increasing complexity of ML models such as gradient-boosted machines, random forests, and neural networks can reduce their transparency and interpretability, limiting their application in clinical practice13,14,15. Model development is often driven more by data availability than by clinical relevance16, with performance gains not always justifying the trade-off between accuracy and interpretability17. Although several ML models have been developed recently to predict opioid-related harms, their clinical utility remains uncertain. We conducted a systematic literature review to identify and summarise available ML prediction models pertaining to opioid-related harm outcomes and to assess the strengths, limitations, and risk of bias in these studies, providing an overview of current state-of-the-art ML methods in opioid drug-safety research.

Results

Study selection

The search yielded 1315 studies (Fig. 1); 347 were identified as duplicates, and 894 were excluded after title/abstract screening. Seventy-four full-text articles were retrieved for full review. Among the excluded full-text articles, 11 did not use ML methods, six did not have the development of a predictive model as their main objective, and three relied on data sources outside the scope of the review. The flow diagram outlining the study selection process and displaying detailed results of our literature search, in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), is shown in Fig. 1.

Fig. 1: PRISMA flowchart.
figure 1

This flowchart illustrates the inclusion and exclusion of studies at each stage of the systematic review process.

Overview of included studies

All eligible studies were published between 2017 and 2023, with 2021 being the most common year of publication (n = 11, 25%; Supplementary Table 1). Studies were mainly conducted with data from the United States (n = 39, 88.6%), with some originating in Canada18,19,20 (n = 3, 6.8%), Iran (n = 1, 2.3%)21, and Switzerland (n = 1, 2.3%)22. The majority used EHRs as the main data source (n = 31, 70%), as opposed to administrative data, and most applied a retrospective cohort study design. External validation, vital for evaluating the generalisability of the models, was rare and identified for only five studies23,24,25,26,27. For two studies, external validation was conducted within the model-development study itself using a separate dataset26,27, and for three models we identified independent studies that aimed to validate the models’ performance in a different setting23,25. The performance of the machine learning models assessed in the reviewed studies varied widely, with area under the receiver operating characteristic curve (AUC) values ranging from 0.6828 to 0.9629 (Table 1). We caution against direct comparison due to differences in training and testing settings across studies, including the datasets used, outcomes measured, data partition sizes, and the specific models employed.

Table 1 Description of included studies that focussed on machine learning in opioids-associated adverse outcomes

Calibration metrics, which assess how well predicted probabilities align with observed outcomes and hence the reliability of risk predictions, were not reported for a notable proportion of the reviewed studies (n = 18, 40.9%) (Table 1).

Outcomes of interest of the reviewed studies

Within clinical prediction, diagnostic prediction models are used to estimate the probability of a disease that is already present, while prognostic models aim to assess the risk of future health conditions30. The largest category of studies in this review focused on prognostic models for postoperative opioid use, with 15 studies (34%)23,31,32,33,34,35,36,37,38,39,40,41,42,43,44. The majority of these studies examined hip or knee arthroplasty31,45,46 and spine surgery32,33,34. Other primary outcomes of prediction models included opioid overdose prediction (n = 8, 18%)26,35,36,37,38,39,40,41, opioid use disorder (n = 8, 18%)20,29,42,43,44,47,48,49, and prolonged opioid use (with varying definitions, detailed in subsequent sections) (n = 5, 11%)22,25,50,51,52. Additionally, four studies (7%) utilised a composite outcome that included hospitalisations, emergency department visits, substance abuse, and mortality19,27,53,54. A smaller subset of studies concentrated on other opioid-related harms as their main outcomes: two studies (5%) focused on opioid dependence55,56, one (2%) on mortality57, and one (2%) on seizure after tramadol overdose21.

In the following sections, we provide a detailed explanation of each identified category and an overall summary of the methods, data collection procedures, and statistical analyses used in these studies. A full summary of the included studies, their predictor variables (Supplementary Table 2), and their numbers of participants and outcomes (Supplementary Table 3) is provided in the online supplementary information.

Prognostic models to predict opioid use after surgery

Fifteen studies (34%)23,24,28,31,32,33,34,45,46,58,59,60,61,62,63 were identified with the primary objective of developing a prognostic predictive model using machine learning to address postoperative opioid use.

All eligible studies were conducted within the last four years, with most published in 2022 (n = 4/15, 27%)31,33,34,46 and the earliest from 2019 (n = 3/15, 20%)24,59,60. The studies included in this systematic review used various terms to describe their outcomes related to postoperative opioid use, such as “prolonged,” “chronic use,” “persistent,” “sustained,” and “extended” opioid use. In the absence of a universally agreed definition, which can yield differing prevalence estimates64, these definitions varied and encompassed different metrics at various time points. Examples include any opioid prescription filled between 90 and 365 days after surgery, continued opioid use beyond a 3-month postoperative period, use extending up to 6 months, continued postoperative opioid use at specific intervals (14 to 20 days, 6 weeks, 3 months, 6 months), filling at least one opioid prescription more than 90 days after surgery, uninterrupted filling of opioid prescriptions for at least 90 to 180 days, and opioid consumption continuing for at least 150 days following surgery.

Datasets and Sampling: Thirteen studies (n = 13/15, 87%) used EHRs as the main data source (two from a military data repository), and the remaining two used insurance claims (n = 2/15, 13%). All the prediction models in these studies were developed with data from patients in the United States, and external validation could be identified for only two of the developed prediction models23,24 (both in Taiwanese cohorts)65,66. External validation in non-US and non-Taiwanese patient groups remains to be performed for all of these models.

Sample sizes across the included studies varied substantially, ranging from 38131 to 112,898 patients28 (mean = 13,209.6, median = 5,507). The number of outcome events was considerably smaller for most of the studies, with outcome incidence ranging from 4%23 to 41%46 (mean = 13%, median = 10%). Although the data used to develop the prediction models were imbalanced (an outcome frequency below 20%) for thirteen studies in this group, only eight studies explicitly acknowledged and addressed this, using techniques such as oversampling (n = 2/15, 13%)31,33 or by reporting the area under the precision-recall curve (AUPRC) (n = 5, 33%)23,60. AUPRC provides a more informative measure of classifier performance on imbalanced datasets than more common classification metrics such as AUC and accuracy, which can be misleading in such scenarios. Many of these studies used only demographic and preoperative predictors to build their models (n = 8/15, 53%).
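As an illustration of these two practices, the sketch below (assuming the scikit-learn and imbalanced-learn Python libraries; the synthetic data and model choice are ours, not drawn from any reviewed study) oversamples the minority class only within the training data and reports AUPRC alongside AUC.

```python
# Minimal sketch: evaluating a classifier on an imbalanced outcome, applying
# oversampling only when fitting and reporting AUPRC alongside AUC.
# The synthetic data stand in for a real cohort; all settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # applies SMOTE only during fit

# Synthetic cohort with roughly 10% outcome prevalence, mimicking the
# imbalance reported in most postoperative opioid-use studies.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0)

model = Pipeline([
    ("oversample", SMOTE(random_state=0)),                    # training data only
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(X_train, y_train)
p = model.predict_proba(X_test)[:, 1]

print(f"AUC   = {roc_auc_score(y_test, p):.3f}")
print(f"AUPRC = {average_precision_score(y_test, p):.3f}")    # more informative here
```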

When addressing missing data, only eight studies explained how missing values were handled. Most used imputation with ‘missForest’ (n = 7/15, 46%), a popular Random Forest (RF)-based missing-data imputation package for R, and one used the Multivariate Imputation by Chained Equations (MICE) package in R. While only two studies explicitly mentioned how continuous variables were handled46,58, eight suggested that variables were kept as continuous in downstream analysis; for five studies this was left unclear (33%).
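For readers more familiar with Python, a rough analogue of these approaches is sketched below (the reviewed studies used the R packages named above, not this code): scikit-learn’s IterativeImputer mimics chained-equations imputation and, with a random-forest estimator, approximates missForest. The toy data frame and column names are hypothetical, and this is a single imputation rather than full multiple imputation with pooling.

```python
# Illustrative Python analogue of MICE-style and missForest-style imputation.
# Data and column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

df = pd.DataFrame({
    "age":           [67, 54, np.nan, 71, 49],
    "bmi":           [31.2, np.nan, 27.5, 29.8, 24.1],
    "preop_mme_day": [30.0, 0.0, 45.0, np.nan, 15.0],
})

# Chained-equations imputation (default BayesianRidge estimator), MICE-like
mice_like = IterativeImputer(max_iter=10, random_state=0).fit_transform(df)

# Same chained scheme with a random-forest estimator, missForest-like
rf_like = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=10, random_state=0).fit_transform(df)

print(pd.DataFrame(rf_like, columns=df.columns))
```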

Most studies (n = 8/15, 53%) that developed prognostic models to predict opioid use after surgery used a group of five algorithms for predictive modelling, chosen for their ability to handle complex data and produce accurate predictions: Stochastic Gradient Boosting, RF, Support Vector Machine, Neural Network, and Elastic-net Penalised Logistic Regression. Other studies also incorporated XGBoost (n = 4/15, 27%) and LASSO (n = 4/15, 27%) algorithms in their analyses. The most common algorithms were RF (n = 12/15, 80%) and Elastic-net penalised logistic regression (n = 9/15, 60%). Reported AUC values ranged from 0.6828 to 0.9433. Notably, logistic regression and its regularised form, Elastic Net, were consistently reported to perform on par with ensemble methods such as random forests and gradient boosting machines (n = 7/15, 46%). Calibration metrics, including calibration plots, intercept, slope, and Brier score (ranging from 0.037 to 0.136), were reported for most of the studies (n = 13/15, 87%).
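For reference, the sketch below shows how the calibration metrics named above are typically computed. It is a minimal illustration using simulated predicted probabilities and the scikit-learn and statsmodels Python libraries, not code from any reviewed study.

```python
# Minimal sketch of common calibration metrics: Brier score, a calibration
# curve, and calibration intercept/slope obtained by regressing the observed
# outcome on the logit of the predicted probability (logistic recalibration).
import numpy as np
import statsmodels.api as sm
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.6, size=2000)   # hypothetical predicted risks (held-out set)
y = rng.binomial(1, p)                  # outcomes consistent with those risks

print("Brier score:", round(brier_score_loss(y, p), 3))

# Calibration curve: observed vs mean predicted risk per decile of predictions
obs, pred = calibration_curve(y, p, n_bins=10, strategy="quantile")
for o, q in zip(obs, pred):
    print(f"predicted {q:.2f}  observed {o:.2f}")

# Calibration intercept and slope (ideal values: 0 and 1)
logit_p = np.log(p / (1 - p))
fit = sm.Logit(y, sm.add_constant(logit_p)).fit(disp=0)
intercept, slope = fit.params
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```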

Patient subtypes: Most of the research on opioid use after orthopaedic surgery included patients who underwent a specific single surgical procedure, mainly arthroplasty (n = 3/15, 20%)31,45,46 and spine surgery (n = 3/15, 20%)23,24,61.

Prognostic models to predict opioid use disorder

Eight studies were identified20,29,42,43,44,47,48,49. All except one Canadian study20 used data from the United States. The sample size varied considerably, ranging from 130,12029 to 5,183,56644 (mean = 1,116,761, median = 361,527), with the percentage of outcome events ranging from 1% to 4% (mean = 2%, median = 2%). Data in this category of studies were highly imbalanced. Two studies used oversampling techniques to handle class imbalance20,42, and three more reported AUPRC44,47,48. No external validation was identified for any of these models. Data-driven variable selection was mentioned in only two studies, which used the Andersen Behavioural Model42 and LASSO logistic regression alongside supervised machine learning algorithms44. In terms of performance, gradient boosting and neural network classifiers performed best, with AUCs ranging from 0.8849 to 0.95929, but with no overall benefit over logistic regression. In contrast, Support Vector Machine (SVM) was the worst-performing algorithm47. Very few studies (n = 2/8, 25%) reported performing calibration as part of their analysis, and only one reported it appropriately, including calibration plots, intercept, slope, and Brier score49.
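To illustrate the kind of data-driven variable selection mentioned above, the sketch below (our own minimal example with synthetic data and hypothetical settings, not taken from the cited study) uses L1-penalised logistic regression to retain a sparse subset of candidate predictors before downstream modelling; in practice such selection should be nested inside cross-validation to avoid optimistic bias.

```python
# Sketch of LASSO-based predictor selection on an imbalanced synthetic outcome.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=3000, n_features=50, n_informative=8,
                           weights=[0.98, 0.02], random_state=0)

# L1-penalised logistic regression shrinks uninformative coefficients to zero
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
selector = make_pipeline(StandardScaler(), SelectFromModel(lasso))
selector.fit(X, y)

kept = selector.named_steps["selectfrommodel"].get_support()
print(f"{kept.sum()} of {X.shape[1]} candidate predictors retained")
```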

Prognostic models to predict opioid overdose

Eight studies were identified26,35,36,37,38,39,40,41. All the studies in this subgroup were conducted in the United States; four (n = 4/8, 50%) used EHRs and four used administrative and insurance claims data from sources such as Medicaid or Medicare (n = 4/8, 50%), with 2021 the most common year of development (n = 3/8, 38%). The population sample size in these studies ranged from 237,259 to 7,284,389, with the percentage of outcome events ranging from 0.04%37 to 1.36%26. Although data imbalance was directly addressed with imbalanced-data techniques in only three studies36,37,40, it was acknowledged by all studies, and additional performance metrics were reported, including AUPRC26,38,39. Only one study was externally validated26. The models developed in this category had high c-statistics, ranging from 0.8526 to 0.9535, and good prediction performance.

Prediction models for persistent opioid use

Five studies were identified22,25,50,51,52. These included three with an explicit chronic opioid use outcome, one focusing on long-term opioid use prediction, and one predicting progression from acute to chronic opioid use51. All were retrospective studies, with the oldest dating back to 201825. The population sample size ranged from 27,705 to 418,564 patients, with the percentage of outcome events ranging from 5%25 to 30%51. Only one study addressed class imbalance, using a down-sampling approach25, which risks discarding valuable information and potentially reduces model accuracy. All studies had internal validation, but only one was externally validated, using data from two additional healthcare organisations25 (Supplementary Table 4). The top-performing algorithms in terms of AUC were logistic regression (n = 3/5, 60%)25, the RF classifier (n = 1/5, 20%)52, and XGBoost (n = 1/5, 20%)51, with SVM the worst performing (AUC = 0.72)51. One study also found that class balancing did not have a significant impact on performance for most models, despite the relatively rare outcome50. Only one study in this category (n = 1/5, 20%) reported and provided details of calibration for the analysed models52. Calibration details reported across studies are presented in Supplementary Table 5.

Prediction models for opioid dependence

Two studies were identified55,56. The sample size ranged from 102,166 to 199,273, and the percentage of outcome events ranged from 0.7%55 to 3.9%56. While both studies used large samples from US EHRs, Che et al. benefited from data from multiple hospitals. Additionally, Che et al. had access to a clear diagnosis of “opioid dependence” in the patient records55, whilst Ellis et al. had to rely on various forms of substance dependence for the definition of their outcome (not exclusive to opioids)56. Both studies addressed the small number of outcomes with class imbalance techniques and showed good discrimination performance of 0.855 and 0.8756. Che et al. reported that deep learning solutions achieved superior classification results and outperformed other baseline methods. Neither study reported calibration measures, and neither was found to be externally validated.

Prognostic models to predict other opioid-related harms

Only a limited number of studies investigated opioid-related harms beyond those addressed above. Among these, four developed prognostic models using administrative health datasets to predict hospitalisation, emergency department visits, and mortality19,27,53,54. In one of these studies, simpler linear models achieved higher discrimination19; in another, although the final XGBoost model had high discrimination, the calibration plot showed a consistent overestimation of risk53. Vunikili et al. used a retrospective cohort of patients to build a model predicting opioid abuse and mortality; the XGBoost algorithm outperformed logistic regression for classifying patients susceptible to “opioid risks”54. Fouladvand et al. focused on predicting whether a person experienced opioid abuse, dependence, or an overdose event (opioid-related adverse outcomes) during the 6-month period after surgery. The best predictive performance was achieved by the RF model, with an AUC of 0.87; the model was well calibrated, had good discrimination, and was externally validated using data from other states within the United States27. Other studies in this category developed models to predict mortality risk after nonfatal opioid overdose (using gradient boosting machines)57 and seizure due to tramadol poisoning (using machine learning models that did not show significant performance improvements over logistic regression)21.

Risks of bias

In accordance with the Prediction Model Risk of Bias Assessment Tool (PROBAST) guidance, we systematically evaluated the risk of bias in the 44 identified studies across four key domains: participants, predictors, outcomes, and analysis (Table 2). We found that 16/44 studies had a high risk of bias in at least one domain and 19/44 had an unclear risk in at least one domain due to lack of information. Only 9/44 studies had a low risk of bias across all domains. A summary of the risk of bias assessment for machine learning algorithms grouped by type of outcome is presented in Supplementary Figs. 1–4.

Table 2 PROBAST results for each domain considered for each paper included in the systematic review

Participants: Ten studies had a high (9/44) or unclear (1/44) risk of bias for their participants. This was primarily due to the following issues: (1) the model was built using data from a single health centre; (2) the inclusion and exclusion criteria were not described in enough detail to be reproducible; and/or (3) there were large differences in demographics between the target population and the population on which the model was built.

Predictors: Among the studies, 5/44 lacked sufficient information on predictors, rendering bias assessment challenging. For 36 studies, the risk of bias was deemed low due to the comprehensive description of predictor definition, selection, assessment, and handling during analysis. Conversely, 3/44 studies exhibited a high risk of bias, primarily due to insufficient details for the reproducibility of the analysis.

Outcomes: In terms of outcome variables, 36/44 studies demonstrated low risk, 3/44 were unclear, and 5/44 exhibited a high risk of bias. The key contributing factor was the inconsistency between the studies’ stated objective of predicting opioid-related harms and the inclusion of outcomes not specific to opioid-related harms.

Analysis: Only 11/44 studies were evaluated to have a low risk of bias in their analysis, either by adhering to TRIPOD guidelines, following the Guidelines for Developing and Reporting Machine Learning Models in Biomedical research, or providing sufficient detail on the analysis methodology. Conversely, 28 studies lacked adequate information, leading to classification as unclear risk, while 5 studies exhibited a high risk of bias. Common shortcomings included inadequate information for reproducibility, such as missing data handling, class imbalance handling, hyperparameter description, tool/library versions, and non-availability of code (Table 3).

Table 3 Description of studies describing class imbalance considerations, hyperparameters, and data/code availability

Public availability of the algorithms and models: Only 5/44 studies published the code for reproducing their results, and 4/44 indicated that it was available on request (Table 3).

Discussion

This systematic review summarises the extensive efforts of the international community to address the opioid crisis by developing ML prediction models. Comparing complex models with more interpretable ones is necessary to assess whether the trade-off between performance and interpretability is justified17. Studies in this area have primarily been published in recent years and show promise in identifying patients at risk of opioid-related harms in North America (n = 36/44, including studies from the US n = 33/44 and Canada n = 3/44). However, our findings reveal specific methodological limitations, including poor transparency, inadequate disclosure of machine learning methodology, limited reproducibility, and biases in study design. The existing literature often relies on C-statistics alone as a performance metric, which may lead to overestimating the advantages of the prediction tools for rare outcomes. In a highly imbalanced dataset, as can be the case with opioid adverse events, the ROC curve can be overly optimistic. The ROC-AUC metric is calculated from the true positive rate and false positive rate67. The true positive rate (also known as sensitivity or recall) is the proportion of actual positives correctly identified as such; the false positive rate is the proportion of actual negatives incorrectly identified as positives68. When negatives vastly outnumber positives, even a large absolute number of false positives can yield a low false positive rate (i.e. 1 − specificity), artificially inflating the AUC and giving a false impression of high performance (AUC close to 1) even when the model fails to identify the few positive cases. Considering additional metrics, such as the precision-recall curve or the F1 score, could better reflect model performance in imbalanced scenarios69. The absence of calibration reporting in a substantial proportion of the examined studies (41%) raises concerns about the reliability of the reviewed prediction models. Details about internal and external validation for each of the studies are included in Supplementary Table 4.
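A small worked example (with hypothetical counts, not drawn from any reviewed study) makes this concrete: with 100,000 negatives and 100 positives, a model that incorrectly flags 500 negatives still has a false positive rate of only 0.5%, even though most of its alerts are false alarms.

```python
# Worked example of why a low false positive rate can mask poor performance
# when negatives vastly outnumber positives. All counts are hypothetical.
tp, fn = 60, 40          # 100 true positive cases, 60 detected
fp, tn = 500, 99_500     # 100,000 negatives, 500 flagged incorrectly

tpr = tp / (tp + fn)            # sensitivity / recall
fpr = fp / (fp + tn)            # 1 - specificity
precision = tp / (tp + fp)      # proportion of flagged patients truly at risk

print(f"TPR (recall) = {tpr:.2%}")        # 60.00%
print(f"FPR          = {fpr:.2%}")        # 0.50%  -> looks excellent on a ROC curve
print(f"Precision    = {precision:.2%}")  # ~10.7% -> most alerts are false alarms
```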

Our review suggests that the ML models developed to predict opioid-related harms primarily showcase ML’s potential rather than being created for clinical application, restricting them to research use. Key barriers to the practical application of these prediction models include the lack of external validation and accessibility, and the absence of infrastructure to process risk scores and to ensure the algorithms’ usability and effectiveness26. Given the importance of model interpretability for government policymaking, we suggest using explanatory algorithms such as Local Interpretable Model-Agnostic Explanations (LIME)70 and Shapley Additive Explanations (SHAP)71. Explanatory algorithms could allow clinicians and patients to understand the relationships between the variables in the model, making models more transparent and interpretable; in the reviewed studies, only 36% (n = 16/44) included them in their analysis. However, these methods have limitations. Both LIME and SHAP provide insights based on the correlations captured by the model, but they do not offer causal explanations. Additionally, these methods may struggle with collinearity; for example, in the presence of highly correlated variables, SHAP values might be high for one variable and zero or very low for another72. Furthermore, calculating SHAP values can be computationally expensive, especially for large datasets or complex models. Lastly, the SHAP method treats the model as a “black box”, meaning it does not incorporate information about the model’s internal structure73.
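As a minimal illustration of how such post-hoc explanation is typically applied (assuming the shap and scikit-learn Python libraries; the model and synthetic data are hypothetical, not from any reviewed study):

```python
# Hedged sketch of post-hoc explanation with SHAP for a tree-based risk model.
# LIME follows a similar fit-then-explain pattern.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)     # efficient for tree ensembles
shap_values = explainer.shap_values(X)    # per-patient, per-feature contributions

# Global view: which predictors drive the model's risk estimates overall
shap.summary_plot(shap_values, X)
```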

Only five models reached the threshold of methodological quality, reproducibility, and external validation required to support use in clinical practice. For most studies (n = 39, 89%) there was no reporting on how results could be implemented clinically or externally validated. This limitation underscores why deployment of ML models in real-world clinical settings remains uncommon11. The generalisability of the reviewed models is also limited by the use of data from restricted sources, such as a single centre21,62,63, a single geographic area26, or sources that do not fully reflect the population they aim to study58.

Sampling bias in data collection was a common problem in studies using insurance claims datasets (which do not fully capture the demographics most at risk of opioid-related harms) and in research based on military personnel data (which may focus on a younger, predominantly male population58). Some studies (n = 4, 9%) also revealed a risk of racial bias, with differences in the racial composition of the groups used to develop the model versus those used to test it62; such differences could particularly affect under-represented subgroups of patients and are important considerations in the development of ML algorithms74.

Predictor variables varied considerably across the studies that developed clinical prediction models, and several studies did not provide a complete list of the variables used, which raises concerns about reproducibility. While demographic factors such as age and sex were commonly reported, only 15 of 44 studies (34%) explicitly included socioeconomic status as a predictor in their models. The lack of consistent reporting and inclusion of variables such as socioeconomic status may weaken the robustness of the ML models and limit their generalisability.

For the identified models aiming to predict mortality, accuracy is questionable because the cause of death was not always available and may not necessarily be related to opioid exposure57.

For the selection of model predictors, authors generally chose them based on theory or previous literature. There is an opportunity to explore the benefits of a pure data-mining approach and compare its utility against models developed with user input43. The addition of opioid dosage (e.g. as morphine milligram equivalents per day) in future prediction models intended for clinical implementation is crucial. Even though opioid dose has been found to be associated with a higher likelihood of opioid-related harms in several studies43,75, fewer than half of the reviewed studies (n = 21, 44%) considered opioid dosage, often owing to lack of data availability. Additionally, given the time-varying nature of opioid use, with discrete periods of being on or off the drug, transparent preparation of this type of data is especially important to avoid misclassification of opioid exposure, which can considerably affect the point estimates associated with adverse outcomes76. None of the reviewed models that predicted adverse opioid outcomes employed ML for time-to-event analysis, indicating a potential area for future development.
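For illustration, morphine milligram equivalents per day can be derived from prescription records with a simple conversion. The sketch below is a hypothetical minimal example; the conversion factors shown are commonly cited approximations rather than a clinical reference, and a real implementation should use the current CDC conversion table and handle formulations such as transdermal patches separately.

```python
# Illustrative computation of morphine milligram equivalents (MME) per day as a
# candidate predictor. Factors below are assumed approximations for oral routes.
APPROX_MME_FACTOR = {
    "morphine": 1.0,
    "oxycodone": 1.5,
    "hydrocodone": 1.0,
    "codeine": 0.15,
    "tramadol": 0.1,
}

def daily_mme(drug: str, dose_mg: float, doses_per_day: float) -> float:
    """Return approximate MME/day for a single oral opioid prescription."""
    return dose_mg * doses_per_day * APPROX_MME_FACTOR[drug]

# Example: oxycodone 10 mg taken three times daily -> 45 MME/day
print(daily_mme("oxycodone", 10, 3))
```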

Recent advancements in deep learning have shown the strong predictive capabilities of transformer-based models, which could improve the performance of existing ML models in fields such as healthcare77. Originally developed for natural language processing, their ability to capture structure in human language could generalise to life-sequences, such as socio-economic and health data, for classification tasks. However, despite their ability to provide highly accurate predictions and often outperform state-of-the-art algorithms, transformer-based models are still nascent in the context of clinical prediction modelling and present several challenges. These include their highly complex architectures with multiple layers of attention mechanisms, a vast number of parameters, high computational demands, the need for domain-specific adaptation, and the need for large, high-quality, well-curated datasets78.
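To make the idea concrete, the sketch below (a toy PyTorch example with assumed dimensions and a made-up vocabulary of event codes, not a model from the reviewed literature) shows how a transformer encoder could classify a patient's coded event sequence into a binary risk label.

```python
# Minimal sketch: a transformer encoder over a patient's sequence of coded
# events, pooled into a single binary risk logit. All sizes are illustrative.
import torch
import torch.nn as nn

class SequenceRiskClassifier(nn.Module):
    def __init__(self, vocab_size=5000, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # one event code -> one vector
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)                # binary risk logit

    def forward(self, codes):                            # codes: (batch, seq_len)
        h = self.encoder(self.embed(codes))              # (batch, seq_len, d_model)
        return self.head(h.mean(dim=1)).squeeze(-1)      # mean-pooled sequence logit

model = SequenceRiskClassifier()
toy_batch = torch.randint(0, 5000, (8, 30))              # 8 patients, 30 events each
print(torch.sigmoid(model(toy_batch)).shape)             # torch.Size([8])
```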

We would like to acknowledge the strengths and limitations of the current study. This review systematically evaluated and summarised the various applications of machine learning models in addressing prescription opioid-related harms in adults. We acknowledge that some authors may have performed model calibration and handled class imbalance and missing data without reporting it; we therefore labelled the risk of bias as “unclear” to avoid overestimating problems. Where there is insufficient information on model development, performance, and calibration, it is difficult to judge the validity of the results. This study complements the works by Garbin et al. and Emam et al. by focusing specifically on machine learning-based prognostic models that predict opioid-related harms and assessing risk of bias using PROBAST. Systematic reviews addressing parts of the opioid epidemic have been conducted this year79,80, mainly focusing on postoperative opioid misuse, while other opioid-related harms have not undergone similar assessment. In addition, we report on specific areas of development that could be useful for future research building on the work done in this field thus far.

The application of machine learning to predicting opioid-related harms and identifying at-risk patients has shown promising results, offering valuable insights into the complex landscape of opioid use. Currently, most of these models have not been implemented in clinical practice; instead, they serve as research tools, illustrating the potential power of machine learning in enhancing our understanding of opioid-related harms. Key limitations of the current literature are the lack of transparent reporting and the limited external validation of the developed ML prediction models. In navigating the future integration of ML into opioid-related harm prediction, researchers should prioritise comprehensive validation efforts, ensuring that these models are robust, generalisable, and reported in sufficient detail to be reproducible.

Methods

Identification of studies

To identify relevant articles that were published from the inception of records until the 12th of October 2023, an all-time search was conducted using Ovid MEDLINE, PubMed, and SCOPUS databases. The search was performed without any restrictions on publication date, language, or study design to ensure all relevant studies were included. To improve transparency, we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist81. Using the Population, Intervention, Comparison, and Outcome (PICO) framework, we defined our relevant search: different types of prescription opioids (population), a broad definition of ML (intervention) and opioid-related harms (outcome). We used a combination of MeSH terms, keywords, and Boolean operators to construct our search string (full search details are provided in Supplementary Table 6).

Inclusion criteria

We used a three-stage process to identify the studies to be included in this review (Fig. 1). Articles were eligible for full-text review if they detailed the construction of one or more machine learning-based prediction models focusing on a primary outcome directly associated with prescription opioid-related harms. We specifically sought studies addressing harms such as dependence, misuse, abuse, overdose, hospitalisations, and death. It is important to note that the terminology used to describe these harms varied across studies; terms such as chronic opioid use, long-term opioid use, and persistent opioid use may have been used interchangeably or with different definitions64. Opioid dependence, whilst sometimes used synonymously in the literature, is the adaptation to repeated exposure to some drugs and medicines, usually characterised by tolerance and/or withdrawal, and may be inevitable for those on long-term opioids82. Addiction relates to dependence with a compulsive preoccupation with seeking and taking an opioid despite consequences82. To ensure comprehensive coverage of opioid-related harms, we included studies that used a range of these terms and their synonyms (Supplementary Table 6). Regarding methodology, we considered a study to use supervised machine learning if it reported any statistical learning technique to predict outcomes of interest or categorise cases based on a known ground truth, regardless of the terminology used by the authors83,84. We excluded studies that developed models based solely on regression techniques. In addition, eligible studies had to predominantly utilise data from EHRs or patient administrative records.

Exclusion criteria

We excluded studies from our review if they met any of the following criteria: 1) studies that primarily focused on risk factor identification without a clear emphasis on predictive modelling; 2) studies utilising ML to process and analyse human language, enhance the reading of images, or understand user-generated text; 3) studies employing genetic traits or molecular markers as predictive factors; 4) studies that were systematic reviews, letters, conference abstracts, or commentaries; 5) studies relying on data sources other than EHRs or administrative patient data (e.g. public-domain data such as social media, internet searches, and surveys); 6) studies not concentrating on human subjects or those specifically targeting paediatric populations (patients younger than 18 years old). Full inclusion and exclusion criteria can be found in Table 4.

Table 4 Eligibility criteria of included studies

Screening process used to select studies for inclusion

All abstracts were screened by C.R. with uncertainties being resolved by consensus with M.J. The full text of selected abstracts was assessed for eligibility by C.R., with the supervision of M.J.

Data extraction, quality assessment and analysis

Data extraction was performed using the “CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies” (CHARMS) checklist, and the bias assessment process was performed using the Prediction Model Risk of Bias Assessment Tool (PROBAST). We followed the standardised checklist designed by B.M. Fernandez-Felix, et al.85. The data collected for each study included study design, sample size, number of outcome events, main data sources, machine learning algorithms used, the best- and worst-performing algorithms based on the C-statistic, outcomes measured and their definitions, mention of class imbalance (imbalance between the frequency of outcome events and nonevents)86 and missingness, calibration measures reported, and details of internal and external validation. The full list of extraction items can be found in Supplementary Table 6. To identify external validation studies, an all-time search was conducted on October 12, 2023, using Ovid MEDLINE, PubMed, and SCOPUS databases (full search details are provided in Supplementary Table 6).